date:20110323

[jira] [Created] (LUCENE-2988) trunk 'ant test' hangs

2011-03-23 Thread Doron Cohen (JIRA)

trunk 'ant test' hangs
--

 Key: LUCENE-2988
 URL: https://issues.apache.org/jira/browse/LUCENE-2988
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
 Environment: inspected so far on XP within Cygwin using IBM JDK 6
Reporter: Doron Cohen
Assignee: Doron Cohen
 Fix For: 4.0


Running 'ant test' from trunk on XP in a Cygwin shell hangs, taking 100% CPU.
There was no progress in the console for a long time, so I stopped the program.
Before stopping it, I created 5 consecutive thread dumps to see where the code is.
It is not clear what is going on - it does not look like Lucene code, I think, but 
I am not sure.
Opening this issue to keep an eye on this - I will try with other JDKs to see 
if this is persistent.
Also, when first seeing this I had local changes from two issues: LUCENE-2986 and 
LUCENE-2977 - I think the changes in these issues are related but will repeat 
the tests without these changes.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-2061) Generate jar containing test classes.

2011-03-23 Thread Steven Rowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2061:
--

Attachment: SOLR-2061.patch

This version of the patch includes all of Robert's, and adds in Maven and 
IntelliJ support.

The Solr test-framework binary, source, and javadoc jars are produced by {{ant 
generate-maven-artifacts}} and signed, along with their {{.pom}} file, by {{ant 
sign-artifacts}}.

The Maven build works through the {{install}} phase, including the {{test}} 
phase, switching all modules to depend on the new Solr test framework jar 
instead of the jar produced from all Solr test sources.

The IntelliJ build works, and all modules' test suites run and succeed.


> Generate jar containing test classes.
> -
>
> Key: SOLR-2061
> URL: https://issues.apache.org/jira/browse/SOLR-2061
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1
>Reporter: Drew Farris
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, 
> SOLR-2061.patch
>
>
> Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate 
> and deploy a jar containing the test classes so other projects could write 
> unit tests using the framework in Solr. 
> This may take care of SOLR-717 as well.


[jira] [Created] (LUCENE-2987) QueryParser throwing null pointer exception if input is invalid

2011-03-23 Thread Ramesh (JIRA)

QueryParser throwing null pointer exception if input is invalid
---

 Key: LUCENE-2987
 URL: https://issues.apache.org/jira/browse/LUCENE-2987
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 3.0.2
Reporter: Ramesh


I was using org.apache.lucene.queryParser.QueryParser for parsing the input.
My input:
Input query string:  "category:(4 or 6 or 8)"
Analyzer: StandardAnalyzer
QueryParser's parse() method is resulting in Null Pointer Exception.

If I give input query string as "category:(4 OR 6 OR 8)" which is uppercase 
'OR', it works fine and I get the desired results.
I'm seeing the problem only with lower case 'or'
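
For context: the classic Lucene query syntax only treats upper-case AND, OR and NOT as operators, so a lower-case 'or' is parsed as an ordinary term. Until the exception itself is fixed, one possible workaround is to upper-case bare keywords before calling parse(). The sketch below is only an illustration: OperatorNormalizer is a hypothetical helper, not part of Lucene, and it will also rewrite 'or' inside quoted phrases, which may not be wanted.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: upper-cases bare boolean keywords
// so that a query parser sees them as operators instead of as terms.
final class OperatorNormalizer {
    // Matches the whole words "and", "or", "not" in lower case only.
    private static final Pattern KEYWORD = Pattern.compile("\\b(and|or|not)\\b");

    static String normalize(String query) {
        // Replace each lower-case keyword with its upper-case form.
        return KEYWORD.matcher(query)
                      .replaceAll(m -> m.group(1).toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // prints category:(4 OR 6 OR 8)
        System.out.println(normalize("category:(4 or 6 or 8)"));
    }
}
```

The normalized string would then be passed to the parser in place of the raw user input.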


[jira] [Updated] (LUCENE-2987) QueryParser throwing null pointer exception if input is invalid

2011-03-23 Thread Ramesh (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh updated LUCENE-2987:
---

Description: 
I was using org.apache.lucene.queryParser.QueryParser for parsing the input.
My input:
Input query string:  "category: (4 or 6 or 8)"
Analyzer: StandardAnalyzer
QueryParser's parse() method is resulting in Null Pointer Exception.

If I give input query string as "category: (4 OR 6 OR 8)" which is uppercase 
'OR', it works fine and I get the desired results.
I'm seeing the problem only with lower case 'or'

  was:
I was using org.apache.lucene.queryParser.QueryParser for parsing the input.
My input:
Input query string:  "category:(4 or 6 or 8)"
Analyzer: StandardAnalyzer
QueryParser's parse() method is resulting in Null Pointer Exception.

If I give input query string as "category:(4 OR 6 OR 8)" which is uppercase 
'OR', it works fine and I get the desired results.
I'm seeing the problem only with lower case 'or'


> QueryParser throwing null pointer exception if input is invalid
> ---
>
> Key: LUCENE-2987
> URL: https://issues.apache.org/jira/browse/LUCENE-2987
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Affects Versions: 3.0.2
>Reporter: Ramesh
>
> I was using org.apache.lucene.queryParser.QueryParser for parsing the input.
> My input:
> Input query string:  "category: (4 or 6 or 8)"
> Analyzer: StandardAnalyzer
> QueryParser's parse() method is resulting in Null Pointer Exception.
> If I give input query string as "category: (4 OR 6 OR 8)" which is uppercase 
> 'OR', it works fine and I get the desired results.
> I'm seeing the problem only with lower case 'or'


[jira] [Commented] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity

2011-03-23 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010570#comment-13010570
 ] 

Doron Cohen commented on LUCENE-2986:
-

+1 for this change (I did not remember discussing this, but other than 
remembering I am consistent :))

Patch looks very clean.

Minor technical comments - concerning just some tests:

- some of the DSP implementations are still named xyzSimilarity - I think it 
would be more clear to name them xyzSimilarityProvider:
-- o.a.l.search.payloads.TestPayloadNearQuery.BoostingSimilarity
-- o.a.l.search.payloads.TestPayloadTermQuery.BoostingSimilarity
-- o.a.solr.schema.MockConfigurableSimilarity
-- o.a.l.index.TestIndexWriterConfig.MySimilarity
-- o.a.l.index.TestIndexReaderCloneNorms.SimilarityOne
-- o.a.l.index.TestNorms.SimilarityOne
-- o.a.l.index.TestOmitTf.SimpleSimilarity
-- o.a.l.search.TestSimilarity.SimpleSimilarity

- for a few of the above it is not only the name - they are still doing both 
roles: {code}extends DefaultSimilarity implements SimilarityProvider{code}:
-- o.a.l.search.payloads.TestPayloadNearQuery.BoostingSimilarity
-- o.a.l.search.payloads.TestPayloadTermQuery.BoostingSimilarity
-- o.a.l.index.TestOmitTf.SimpleSimilarity
-- o.a.l.search.TestSimilarity.SimpleSimilarity

Other than that I think it is good to go in.

Also, tests from trunk/lucene and trunk/solr passed.
(I am seeing problems in running all trunk tests, at least on Windows, but I'll 
send a separate mail to the list on that)
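
A rough sketch of the separation discussed in this comment, with one class that only scores and another that only hands out a scorer per field. Sim, SimProvider and the classes below are simplified stand-ins invented for illustration; they are not Lucene's real Similarity or SimilarityProvider, and the method set is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT Lucene's classes.
abstract class Sim {
    abstract float tf(int freq);
}

interface SimProvider {
    Sim get(String field);                     // per-field lookup
    float coord(int overlap, int maxOverlap);  // non-field-specific factor
}

// Scoring only: no factory role.
final class SimpleSim extends Sim {
    @Override float tf(int freq) { return freq; }
}

final class SqrtSim extends Sim {
    @Override float tf(int freq) { return (float) Math.sqrt(freq); }
}

// Factory only: maps field names to similarities, does no per-field scoring.
final class PerFieldSimProvider implements SimProvider {
    private final Map<String, Sim> perField = new HashMap<>();
    private final Sim fallback;

    PerFieldSimProvider(Sim fallback) { this.fallback = fallback; }

    PerFieldSimProvider with(String field, Sim sim) {
        perField.put(field, sim);
        return this;
    }

    @Override public Sim get(String field) {
        return perField.getOrDefault(field, fallback);
    }

    @Override public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}

final class SplitDemo {
    public static void main(String[] args) {
        SimProvider provider =
            new PerFieldSimProvider(new SqrtSim()).with("title", new SimpleSim());
        System.out.println(provider.get("title").tf(4)); // prints 4.0
        System.out.println(provider.get("body").tf(4));  // prints 2.0
    }
}
```

With the two roles in separate types, a test class named xyzSimilarityProvider would implement only the provider side, which is the naming point raised above.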

> divorce defaultsimilarityprovider from defaultsimilarity
> 
>
> Key: LUCENE-2986
> URL: https://issues.apache.org/jira/browse/LUCENE-2986
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2986.patch
>
>
> In LUCENE-2236 as a start, we made DefaultSimilarity which implements the 
> factory interface (SimilarityProvider), and also extends Similarity.
> Its factory interface just returns itself always by default.
> Doron mentioned it would be cleaner to split the two, and I thought it would 
> be good to revisit it later.
> Today as I was looking at SOLR-2338, it became pretty clear that we should do 
> this, it makes things a lot cleaner. I think currently it's confusing to users 
> to see the two APIs mixed if they are trying to subclass.


Re: [VOTE] Release Lucene/Solr 3.1

2011-03-23 Thread Ryan McKinley

>
> I don't think someone should have to deal with maven to get the lucene
> source release... I think lucene should have its own artifacts as in
> the past (the source code being the most important).
>

sorry, did not mean to muddy the water with maven discussion...
ignore my comment

when you say "lucene should have its own artifacts" do you mean lucene
w/o solr?  or could a single source artifact include everything?
(making the release process easier and apparently cleaner)


GSoC 2011

2011-03-23 Thread Phillipe Ramalho

Hello,

I am planning to submit a project proposal to GSoC 2011 and Lucene seems to
have a lot of GSoC projects this year. Last year I did a GSoC project using
Lucene for PhotArk project. This year, instead of just using Lucene, I am
planning to contribute code to it.

My experience with Lucene is just as a regular user, the only code I have
changed/extended so far was token streams/analyzers and query parser, so I
have more knowledge on this part of the code. Based on that, I'm planning to
focus on query parser and analyzer/token stream projects. Does that sound
reasonable?

I will be studying the code and planning the proposal(s), so you should
start seeing more posts from me in the next few days.

--
Phillipe Ramalho

Re: [VOTE] Release Lucene/Solr 3.1

2011-03-23 Thread Robert Muir

On Thu, Mar 24, 2011 at 12:18 AM, Ryan McKinley  wrote:
>
> I don't want to suggest anything to slow down the release... but if
> the problems are with the source release, what about just doing a
> single source release for lucene+solr?
>
> We currently have:
>
> lucene-solr-3.1RC2/lucene/
> lucene-solr-3.1RC2/lucene/lucene-3.1.0-src.tar.gz
> lucene-solr-3.1RC2/lucene/...
> lucene-solr-3.1RC2/solr/
> lucene-solr-3.1RC2/solr/apache-solr-3.1.0-src.tgz
> lucene-solr-3.1RC2/solr/...
>
> Why not:
> lucene-solr-3.1RC2/lucene-3.1.0-src.tar.gz
> lucene-solr-3.1RC2/lucene/...
> lucene-solr-3.1RC2/solr/...
>
> and let the src release be as close to svn export as possible?  This
> will make sure the result builds just as it does when we actually
> build it!
>
> With the maven artifacts, we have source for each jar:
> http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/solr/maven/org/apache/solr/solr-core/3.1.0/solr-core-3.1.0-sources.jar
>
> http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/lucene/maven/org/apache/lucene/lucene-queries/3.1.0/lucene-queries-3.1.0-sources.jar
>
> I'm not sure the exact ASF source requirements, but maybe the maven
> source.jar files are good enough?
>

I don't think someone should have to deal with maven to get the lucene
source release... I think lucene should have its own artifacts as in
the past (the source code being the most important).


[jira] [Updated] (SOLR-2338) improved per-field similarity integration into schema.xml

2011-03-23 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2338:
--

Attachment: SOLR-2338.patch

Here's a first stab: I included LUCENE-2986's cleanup work for easy testing 
(this issue depends upon it).

Here is the syntax:
{noformat}
  
  

  


  


  

  


  is there an echo?

  
{noformat}

Additionally, it's necessary to allow customization of the SimilarityProvider 
too, in order to customize the non-field specific stuff like coord()... this is 
done via:
{noformat}
 
 
   is there an echo?
 
{noformat}


> improved per-field similarity integration into schema.xml
> -
>
> Key: SOLR-2338
> URL: https://issues.apache.org/jira/browse/SOLR-2338
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: SOLR-2338.patch
>
>
> Currently since LUCENE-2236, we can enable Similarity per-field, but in 
> schema.xml there is only a 'global' factory
> for the SimilarityProvider.
> In my opinion this is too low-level because to customize Similarity on a 
> per-field basis, you have to set your own
> CustomSimilarityProvider with  and manage the 
> per-field mapping yourself in java code.
> Instead I think it would be better if you just specify the Similarity in the 
> FieldType, like after .
> As far as the example, one idea from LUCENE-1360 was to make a "short_text" 
> or "metadata_text" used by the
> various metadata fields in the example that has better norm quantization for 
> its shortness...


Re: [VOTE] Release Lucene/Solr 3.1

2011-03-23 Thread Ryan McKinley

>
> : Please vote to release the artifacts at
> : http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2
>
> -0
>
> I can't in good conscience vote for these artifacts.
>

I don't want to suggest anything to slow down the release... but if
the problems are with the source release, what about just doing a
single source release for lucene+solr?

We currently have:

lucene-solr-3.1RC2/lucene/
lucene-solr-3.1RC2/lucene/lucene-3.1.0-src.tar.gz
lucene-solr-3.1RC2/lucene/...
lucene-solr-3.1RC2/solr/
lucene-solr-3.1RC2/solr/apache-solr-3.1.0-src.tgz
lucene-solr-3.1RC2/solr/...

Why not:
lucene-solr-3.1RC2/lucene-3.1.0-src.tar.gz
lucene-solr-3.1RC2/lucene/...
lucene-solr-3.1RC2/solr/...

and let the src release be as close to svn export as possible?  This
will make sure the result builds just as it does when we actually
build it!

With the maven artifacts, we have source for each jar:
http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/solr/maven/org/apache/solr/solr-core/3.1.0/solr-core-3.1.0-sources.jar

http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/lucene/maven/org/apache/lucene/lucene-queries/3.1.0/lucene-queries-3.1.0-sources.jar

I'm not sure the exact ASF source requirements, but maybe the maven
source.jar files are good enough?

Again, I don't think this should be a blocker, but it would be nice to
have things simplified for the next release -- gasp.

ryan


[jira] [Commented] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-23 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010547#comment-13010547
 ] 

Shai Erera commented on LUCENE-2977:


Looks good to me.

> WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
> specified output file name
> -
>
> Key: LUCENE-2977
> URL: https://issues.apache.org/jira/browse/LUCENE-2977
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2977.patch, LUCENE-2977.patch
>
>
> Since the readers behave this way it would be nice and handy if also this 
> line writer would.


Re: [VOTE] Release Lucene/Solr 3.1

2011-03-23 Thread Chris Hostetter


: Please vote to release the artifacts at
: http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2

-0

I can't in good conscience vote for these artifacts.

For the most part, there are only a few minor hiccups -- but the big 
blocker (in my opinion) is that since RC1, dev-tools has been removed from 
the solr src packages and this causes the top level build.xml (and 
instructions for IDE users in the top level README.txt file) to be broken.

My detailed notes below...

##
### apache-solr-3.1.0-src.tgz

dev-tools isn't in here -- this totally boggles my mind, particularly 
since there was a deliberate and conscious switch to make the source 
releases match what you get when doing an "svn export"

because dev-tools is missing, 3 of the top level ant targets advertised 
using "ant -p" don't work; including 'ant idea' and 'ant eclipse' which 
are also explicitly mentioned in the top level README.txt as how people 
using those IDEs should get started developing the code.

This seems like a major issue to me.   

we're setting ourselves up to make the release look completely broken 
right out of the gate for anyone using one of those IDEs.

Asked about this on IRC.  yonik & ryan indicated that a couple of folks had 
said they would veto any release with dev-tools in it because that stuff 
is supposed to be "unsupported" ... this makes no sense to me as we have 
lots of places in the code base where things are documented as being 
experimental, subject to change, and/or for developer use only.  i don't 
really see how dev-tools should be any different.

if there is really such violent opposition to including dev-tools in src 
releases, then the top level build.xml should not depend on it, and the 
top level README.txt should not refer to it (except maybe with something 
like "people interested in hacking on the src should use svn which 
includes some unofficial 'dev-tools'")
---

Now that the src packages are driven by svn exports, more files exist than 
were in RC1 and some of the changes we made to the solr/README.txt based 
on the earlier release candidates are misleading.  

In particular a lot of things are listed as being in the "docs" directory 
of a binary distribution, but those files *do* exist in the src packages 
-- if you look in the "site" directory.  This seems silly, but at no point 
is the README.txt factually incorrect, so I guess it's not a big enough 
deal to worry about.

---

running all tests, running the example, and building the javadocs all 
worked fine.

##
### apache-solr-3.1.0.tgz

docs look good, basic example usage works fine.

##
### apache-solr-3.1.0.zip

Diffing the contents of apache-solr-3.1.0.tgz with apache-solr-3.1.0.zip 
(using "diff --ignore-all-space --strip-trailing-cr -r") turned up quite 
a few instances where the CRLF fixing in build.xml seems to have 
corrupted some non-ascii characters in a few files:

 contrib/dataimporthandler/lib/activation-LICENSE.txt 
 contrib/dataimporthandler/lib/mail-LICENSE.txt
 docs/skin/CommonMessages_de.xml
 docs/skin/CommonMessages_es.xml
 docs/skin/CommonMessages_fr.xml
 example/solr/conf/velocity/facet_dates.vm

...but these changes don't seem to have substantively harmed the files.
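
One plausible way such corruption can happen, offered here as an assumption and not as a diagnosis of the actual build.xml, is a line-ending conversion that decodes each file with a charset that does not match its real encoding. The sketch below contrasts that with a byte-level conversion; CrlfSketch and both method names are made up for illustration, and the byte-level version assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class CrlfSketch {
    // Lossy: decoding as US-ASCII turns every non-ASCII byte into U+FFFD,
    // which is written back out as '?'.
    static byte[] toCrlfViaWrongCharset(byte[] in) {
        String text = new String(in, StandardCharsets.US_ASCII);
        return text.replace("\n", "\r\n").getBytes(StandardCharsets.US_ASCII);
    }

    // Lossless: insert a CR before each bare LF without decoding anything.
    static byte[] toCrlfBytes(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 16);
        int prev = -1;
        for (byte b : in) {
            if (b == '\n' && prev != '\r') {
                out.write('\r');
            }
            out.write(b);
            prev = b;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = "Gr\u00f6\u00dfe\n".getBytes(StandardCharsets.UTF_8);
        // Non-ASCII characters survive the byte-level conversion.
        System.out.println(new String(toCrlfBytes(utf8), StandardCharsets.UTF_8));
        // They are replaced by '?' after the mismatched decode.
        System.out.println(new String(toCrlfViaWrongCharset(utf8), StandardCharsets.US_ASCII));
    }
}
```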

##
### lucene-3.1.0-src.tar.gz

tests and javadocs worked fine.

##
### lucene-3.1.0.tar.gz

docs look good, demo runs fine.

##
### lucene-3.1.0.zip

no differences found with lucene-3.1.0.tar.gz





-Hoss


[jira] [Created] (SOLR-2439) change solr javadocs to link to local lucene javadocs w/relative links

2011-03-23 Thread Hoss Man (JIRA)

change solr javadocs to link to local lucene javadocs w/relative links
--

 Key: SOLR-2439
 URL: https://issues.apache.org/jira/browse/SOLR-2439
 Project: Solr
  Issue Type: Task
  Components: documentation
Reporter: Hoss Man
 Fix For: 3.2


Now that solr/lucene are in lock step development, and solr releases include 
the entire lucene-java release, the solr ant targets for building javadocs 
should depend on the lucene (and module) targets for building javadocs and link 
directly to the local copies of those docs (using relative paths)

(currently, the links point to 
https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/)


[jira] [Issue Comment Edited] (SOLR-2399) Solr Admin Interface, reworked

2011-03-23 Thread Stefan Matheis (steffkes) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010318#comment-13010318
 ] 

Stefan Matheis (steffkes) edited comment on SOLR-2399 at 3/23/11 8:15 PM:
--

Ryan: ty, will take your points on my list - pretty sure, that it should be 
possible to integrate them
Mark: ty! :)

For today, it's about *Logging*. Talked about that with Hoss on #solr the last 
days, so already changed a few things .. on the way, but not finished: 
http://files.mathe.is/solr-admin/07_logging.png

Actually thinking about the following points:
* Tree Structure good way to solve it?
* Do we need the possibility to collapse/expand the tree/the children? The 
List could be longer (the screenshot is cropped, just for layout reasons) 
especially while using SolrCloud which adds about 30 Loggers
* In the current er .. "Interface" you are able to see that the row you're 
looking at has a level set and in the end (at the right) which is the effective 
level - for me, that does not matter. if a row/logger, has level-x - that's 
enough to know. don't need to see if this level is set or inherited.
* just a quick idea: if you change f.e. {{org.apache.solr}} then the interface 
will automatically update all children in realtime, affecting all nested/sub 
loggers w/o an assigned level.
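
The set-versus-inherited distinction in the points above can be sketched with java.util.logging, assuming those are the semantics the admin page would mirror. EffectiveLevelSketch and the logger names are illustrative only: a logger with no level of its own reports null from getLevel() and takes its effective level from the nearest ancestor that has one.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class EffectiveLevelSketch {
    // Walk up the logger hierarchy until a logger with an explicit level is found.
    static Level effectiveLevel(Logger logger) {
        for (Logger l = logger; l != null; l = l.getParent()) {
            if (l.getLevel() != null) {
                return l.getLevel();
            }
        }
        return null; // only reached if no ancestor has a level set
    }

    public static void main(String[] args) {
        Logger parent = Logger.getLogger("org.apache.solr");
        Logger child = Logger.getLogger("org.apache.solr.core");
        parent.setLevel(Level.FINE);
        System.out.println(child.getLevel());      // prints null (inherited)
        System.out.println(effectiveLevel(child)); // prints FINE
    }
}
```

Changing the parent's level therefore changes what every child without its own level reports, which is the behaviour the last bullet proposes to reflect in the interface.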

Thoughts on these points? anyone? :>

Short Note: i moved Logging to a global level, because it's not configurable on 
a per-core basis.

# Edit

What i forgot to mention .. actually it's based on a [static 
logging.json-file|https://github.com/steffkes/solr-admin/blob/master/logging.json]
 but will try to change the {{LogLevelSection}} Servlet so that it outputs the 
needed json-structure

  was (Author: steffkes):
Ryan: ty, will take your points on my list - pretty sure, that it should be 
possible to integrate them
Mark: ty! :)

For today, it's about *Logging*. Talked about that with Hoss on #solr the last 
days, so already changed a few things .. on the way, but not finished: 
http://files.mathe.is/solr-admin/07_logging.png

Actually thinking about the following points:
* Tree Structure good way to solve it?
* Do we need the possibility to collapse/expand the tree/the children? The 
List could be longer (the screenshot is cropped, just for layout reasons) 
especially while using SolrCloud which adds about 30 Loggers
* In the current er .. "Interface" you are able to see that the row you're 
looking at has a level set and in the end (at the right) which is the effective 
level - for me, that does not matter. if a row/logger, has level-x - that's 
enough to know. don't need to see if this level is set or inherited.
* just a quick idea: if you change f.e. {{org.apache.solr}} then the interface 
will automatically update all children in realtime, affecting all nested/sub 
loggers w/o an assigned level.

Thoughts on these points? anyone? :>

Short Note: i moved Logging to a global level, because it's not configurable on 
a per-core basis.
  
> Solr Admin Interface, reworked
> --
>
> Key: SOLR-2399
> URL: https://issues.apache.org/jira/browse/SOLR-2399
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Priority: Minor
> Fix For: 4.0
>
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
> Interface.* [Based on this 
> [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
> I've quickly created a Github-Repository (Just for me, to keep track of the 
> changes)
> » https://github.com/steffkes/solr-admin 
> [This commit shows the 
> differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d]
>  between old/existing index.jsp and my new one (which is could 
> copy-cut/paste'd from the existing one).
> Main Action takes place in 
> [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js]
>  which is actually neither clean nor pretty .. just work-in-progress.
> Actually it's Work in Progress, so ... give it a try. It's developed with 
> Firefox as Browser, so, for a first impression .. please don't use _things_ 
> like Internet Explorer or so ;o
> Jan already suggested a bunch of good things, i'm sure there are more ideas 
> over there :)


[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-23 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-2977:


Attachment: LUCENE-2977.patch

Thanks for reviewing Shai!

bq. In StreamUtils you have ".bz" -- it should be ".bz2"

Good catch!
Fixed.

bq. +1 (you mean the bzip.compression property in WLDT right?). 

Yes.

bq. I think that it's reasonable to request the user to specify an output file 
with .bz2 extension if he wants bzip compression. 

Great, I removed it.

bq. I don't see how it will simplify StreamUtils though, but I trust you :) 
(perhaps you meant it will simplify WLDT?)

It allowed keeping just one of the two variations of 
StreamUtils.outputStream(). WLDT and the tests became simpler as well.

Attaching updated patch.
(again first apply that svn mv...)
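
The extension-driven choice discussed in this thread could look roughly like the sketch below. It is not the code in the patch: OutputFormatSketch, its Format enum and the case-insensitive matching are assumptions made for illustration, and only the selection step is shown because bzip2 streams are not part of the JDK.

```java
import java.util.Locale;

final class OutputFormatSketch {
    enum Format { GZIP, BZIP2, PLAIN }

    // Pick the output format from the file name's extension alone.
    static Format forFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz")) {
            return Format.GZIP;
        }
        if (lower.endsWith(".bz2")) {
            return Format.BZIP2;
        }
        return Format.PLAIN; // anything else is written as plain text
    }

    public static void main(String[] args) {
        System.out.println(forFileName("lines.txt.bz2")); // prints BZIP2
        System.out.println(forFileName("lines.txt.gz"));  // prints GZIP
        System.out.println(forFileName("lines.txt"));     // prints PLAIN
    }
}
```

With the format derived from the name, a separate compression property becomes redundant, which matches the simplification described above.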

> WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
> specified output file name
> -
>
> Key: LUCENE-2977
> URL: https://issues.apache.org/jira/browse/LUCENE-2977
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2977.patch, LUCENE-2977.patch
>
>
> Since the readers behave this way it would be nice and handy if also this 
> line writer would.


[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-03-23 Thread Stefan Matheis (steffkes) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010318#comment-13010318
 ] 

Stefan Matheis (steffkes) commented on SOLR-2399:
-

Ryan: ty, will take your points on my list - pretty sure, that it should be 
possible to integrate them
Mark: ty! :)

For today, it's about *Logging*. Talked about that with Hoss on #solr the last 
days, so already changed a few things .. on the way, but not finished: 
http://files.mathe.is/solr-admin/07_logging.png

Actually thinking about the following points:
* Tree Structure good way to solve it?
* Do we need the possibility to collapse/expand the tree/the children? The 
List could be longer (the screenshot is cropped, just for layout reasons) 
especially while using SolrCloud which adds about 30 Loggers
* In the current er .. "Interface" you are able to see that the row you're 
looking at has a level set and in the end (at the right) which is the effective 
level - for me, that does not matter. if a row/logger, has level-x - that's 
enough to know. don't need to see if this level is set or inherited.
* just a quick idea: if you change f.e. {{org.apache.solr}} then the interface 
will automatically update all children in realtime, affecting all nested/sub 
loggers w/o an assigned level.

Thoughts on these points? anyone? :>

Short Note: i moved Logging to a global level, because it's not configurable on 
a per-core basis.

> Solr Admin Interface, reworked
> --
>
> Key: SOLR-2399
> URL: https://issues.apache.org/jira/browse/SOLR-2399
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Priority: Minor
> Fix For: 4.0
>
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
> Interface.* [Based on this 
> [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
> I've quickly created a Github-Repository (Just for me, to keep track of the 
> changes)
> » https://github.com/steffkes/solr-admin 
> [This commit shows the 
> differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d]
>  between old/existing index.jsp and my new one (which is could 
> copy-cut/paste'd from the existing one).
> Main Action takes place in 
> [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js]
>  which is actually neither clean nor pretty .. just work-in-progress.
> Actually it's Work in Progress, so ... give it a try. It's developed with 
> Firefox as Browser, so, for a first impression .. please don't use _things_ 
> like Internet Explorer or so ;o
> Jan already suggested a bunch of good things, i'm sure there are more ideas 
> over there :)


[jira] [Commented] (SOLR-2415) Change XMLWriter version parameter to "wt.xml.version"

2011-03-23 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010308#comment-13010308
 ] 

Hoss Man commented on SOLR-2415:


bq. how should we handle the desire to change the faceting format (to make it 
easier to add metadata like total number of constraints, etc)? "version" would 
be one way. "facet.format" would be another way.

i don't think the *structure* of the response (ie: the facet response section) 
should be driven by the same param as the *format* of the response, which is 
what "version" currently is.  Something like facet.format seems more 
appropriate when dealing with a specific component like that ... but i don't 
think it should be a numeric "version" esque property, i think it should be 
descriptive (ie: "flat", vs "nested" or something)


bq. perhaps we should add a getVersion() parameter on SolrQueryRequest and have 
that used across all components.

when i suggested we have a common wt.version param that all of the response 
writers could use, i didn't mean to suggest that it should have a singular id 
space. my suggestion was that the specific values specified for "version" or 
"wt.version" or whatever would only be meaningful to the specific response 
writer used -- just as the current values of the version param that the 
XMLResponseWriter uses are meaningless to the JSONResponseWriter.  the overlap 
would only be in reusing the param name (in the same way that "q" is the common 
param name for the main query, regardless of what query parser is specified by 
"defType")


bq. Look at how long the existing response writers have hung around in their 
current format, independent of the version # changes (1.2, 1.3, 1.4, and now 
3.1)

the version param of the XML response writer has never been in sync with the 
solr version, it was never intended to be.  it's always been the version number 
of the xml format.
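
A minimal sketch of the parameter lookup discussed above, not Solr code. It assumes a writer-specific name such as "wt.xml.version" takes precedence over the shared "version" name, and that the value is passed through untouched so each response writer can interpret it in its own value space. The class name, the method name and the precedence order are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names exist in Solr. */
class VersionParamSketch {

    /**
     * Looks up the format version for one response writer.
     * Assumed precedence: "wt.<writer>.version", then "version", then the default.
     * The returned string is opaque here; only the writer knows what it means.
     */
    static String resolveVersion(Map<String, String> params, String writerType, String defaultVersion) {
        String specific = params.get("wt." + writerType + ".version");
        if (specific != null) {
            return specific;
        }
        String shared = params.get("version");
        return shared != null ? shared : defaultVersion;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("version", "2.2");        // shared name, illustrative value
        params.put("wt.xml.version", "2.1"); // writer-specific name, illustrative value
        System.out.println(resolveVersion(params, "xml", "1.0"));
        System.out.println(resolveVersion(params, "json", "1.0"));
    }
}
```

The point of the sketch is only that sharing a parameter name does not require sharing a value space, in the same way that "q" is shared across query parsers.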

> Change XMLWriter version parameter to "wt.xml.version"
> --
>
> Key: SOLR-2415
> URL: https://issues.apache.org/jira/browse/SOLR-2415
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan McKinley
>Priority: Trivial
> Fix For: 4.0
>
>
> The XMLWriter has a parameter called 'version'.  This controls some specifics 
> about how the XMLWriter works.  Using the parameter name 'version' made sense 
> back when the XMLWriter was the only option, but with all the various writers 
> and different places where 'version' makes sense, I think we should change 
> this parameter name to "wt.xml.version" so that it specifically refers to the 
> XMLWriter.


Re: write byte[] directly to TokenStream

2011-03-23 Thread Ryan McKinley

works great - thanks!


On Wed, Mar 23, 2011 at 1:04 AM, Robert Muir  wrote:
>
> On Mar 22, 2011 11:38 PM, "Ryan McKinley"  wrote:
>>
>> I'm messing with putting binary data directly in the index.  I have a
>> field class with:
>>
>>  @Override
>>  public TokenStream tokenStreamValue() {
>>    byte[] value = (byte[])fieldsData;
>>
>>    Token token = new Token( 0, value.length, "geo" );
>>    token.resizeBuffer( value.length );
>>    BytesRef ref = token.getBytesRef();
>>    ref.bytes = value;
>>    ref.length = value.length;
>>    ref.offset = 0;
>>    token.setLength( ref.length );
>>    return new SingleTokenTokenStream( token );
>>  }
>>
>> but that is just writing an empty token.  Is it possible to set the
>> Token value without converting to char[]?
>>
>
> check out Test2BTerms for an example...
>


[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-03-23 Thread Peter Sturge (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010268#comment-13010268
 ] 

Peter Sturge commented on SOLR-2438:


If you're like me, you may have often wondered why MyTerm, myterm, myter* and 
MyTer* can return different, and sometimes empty results.
This patch addresses this for wildcard queries by adding an attribute to 
relevant solr.TextField entries in schema.xml.
The new attribute is called:  {{ignoreCaseForWildcards}}

Example entry in schema.xml:
{code:title=schema.xml [excerpt]|borderStyle=solid}

  


  
  
  
  
  
  

{code}

It's worth noting that this will lower-case text for ALL terms that match the 
field type - including synonyms and stemmers.

For backward compatibility, the default behaviour is as before - i.e. a case 
sensitive wildcard search ({{ignoreCaseForWildcards=false}}).
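
To illustrate the behaviour described above (this is not the code from the attached patch): a rough sketch that lower-cases a query term only when it contains a wildcard and the flag is set. It assumes the indexed terms were already lower-cased at index time. The class and method names are made up; only the ignoreCaseForWildcards name comes from this comment.

```java
import java.util.Locale;

/** Hypothetical sketch; not taken from SOLR-2438.patch. */
class WildcardCaseSketch {

    /**
     * Returns the term to search for. Wildcard terms normally bypass analysis,
     * so "MyTer*" would never match an index that only holds "myterm".
     * Lower-casing leaves the '*' and '?' wildcard characters untouched.
     */
    static String normalize(String term, boolean ignoreCaseForWildcards) {
        boolean isWildcard = term.indexOf('*') >= 0 || term.indexOf('?') >= 0;
        if (ignoreCaseForWildcards && isWildcard) {
            return term.toLowerCase(Locale.ROOT);
        }
        return term; // default: unchanged, i.e. case-sensitive as before
    }

    public static void main(String[] args) {
        System.out.println(normalize("MyTer*", true));  // prints myter*
        System.out.println(normalize("MyTer*", false)); // prints MyTer*
    }
}
```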

The patch was created against the lucene_solr_3_1 branch. I've not applied it 
yet on trunk.

[caveat emptor] I freely admit I'm no schema expert, so committers and community 
members may see use cases where this approach could pose problems. I'm all for 
feedback to enhance the functionality...

The hope here is to re-ignite enthusiasm for case-insensitive wildcard searches 
in Solr - in line with the 'it just works' Solr philosophy.

Enjoy!


> Case Insensitive Search for Wildcard Queries
> 
>
> Key: SOLR-2438
> URL: https://issues.apache.org/jira/browse/SOLR-2438
> Project: Solr
>  Issue Type: Improvement
>Reporter: Peter Sturge
> Attachments: SOLR-2438.patch
>
>
> This patch adds support to allow case-insensitive queries on wildcard 
> searches for configured TextField field types.
> This patch extends the excellent work done by Yonik and Michael in SOLR-219.
> The approach here is different enough (imho) to warrant a separate JIRA issue.


[jira] [Commented] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-23 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010263#comment-13010263
 ] 

Shai Erera commented on LUCENE-2977:


Patch looks good !

In StreamUtils you have ".bz" -- it should be ".bz2"

bq. Any opinions on removing this "force-bzip" option?

+1 (you mean the bzip.compression property in WLDT right?). I think that it's 
reasonable to request the user to specify an output file with .bz2 extension if 
he wants bzip compression. I don't see how it will simplify StreamUtils though, 
but I trust you :) (perhaps you meant it will simplify WLDT?)

> WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
> specified output file name
> -
>
> Key: LUCENE-2977
> URL: https://issues.apache.org/jira/browse/LUCENE-2977
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2977.patch
>
>
> Since the readers behave this way it would be nice and handy if also this 
> line writer would.


[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-03-23 Thread Peter Sturge (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-2438:
---

Attachment: SOLR-2438.patch

Attached patch file

> Case Insensitive Search for Wildcard Queries
> 
>
> Key: SOLR-2438
> URL: https://issues.apache.org/jira/browse/SOLR-2438
> Project: Solr
>  Issue Type: Improvement
>Reporter: Peter Sturge
> Attachments: SOLR-2438.patch
>
>
> This patch adds support to allow case-insensitive queries on wildcard 
> searches for configured TextField field types.
> This patch extends the excellent work done by Yonik and Michael in SOLR-219.
> The approach here is different enough (imho) to warrant a separate JIRA issue.


[jira] [Created] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2011-03-23 Thread Peter Sturge (JIRA)

Case Insensitive Search for Wildcard Queries


 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge


This patch adds support to allow case-insensitive queries on wildcard searches 
for configured TextField field types.

This patch extends the excellent work done by Yonik and Michael in SOLR-219.
The approach here is different enough (imho) to warrant a separate JIRA issue.




[jira] [Issue Comment Edited] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-23 Thread Paul Elschot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010218#comment-13010218
 ] 

Paul Elschot edited comment on LUCENE-2945 at 3/23/11 5:01 PM:
---

New -2945d patch that also has the changes to SpanNearClauseFactory.

  was (Author: paul.elsc...@xs4all.nl):
Also has the changes to SpanNearClauseFactory.
  
> Surround Query doesn't properly handle equals/hashcode
> --
>
> Key: LUCENE-2945
> URL: https://issues.apache.org/jira/browse/LUCENE-2945
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1.1, 4.0
>
> Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
> LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, 
> LUCENE-2945d.patch
>
>
> In looking at using the surround queries with Solr, I am hitting issues 
> caused by collisions due to equals/hashcode not being implemented on the 
> anonymous inner classes that are created by things like DistanceQuery (branch 
> 3.x, near line 76)
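
One way such collisions can arise, sketched without any Lucene classes: if a base class compares only the runtime class and the boost (an assumption about the base behaviour, used here as a stand-in), then two anonymous subclasses created at the same place but capturing different values compare equal. All names below are hypothetical.

```java
import java.util.Objects;

/** Hypothetical sketch; BaseQuery only imitates a base class that compares type and boost. */
class EqualsHashCodeSketch {

    abstract static class BaseQuery {
        float boost = 1.0f;

        @Override
        public boolean equals(Object o) {
            return o != null && getClass() == o.getClass() && boost == ((BaseQuery) o).boost;
        }

        @Override
        public int hashCode() {
            return Float.floatToIntBits(boost) ^ getClass().hashCode();
        }
    }

    /** Anonymous subclass: the captured field and slop take no part in equals/hashCode. */
    static BaseQuery anonymous(final String field, final int slop) {
        return new BaseQuery() {
            @Override
            public String toString() {
                return field + "~" + slop;
            }
        };
    }

    /** Named subclass that includes its own state in equals/hashCode. */
    static final class NamedQuery extends BaseQuery {
        final String field;
        final int slop;

        NamedQuery(String field, int slop) {
            this.field = field;
            this.slop = slop;
        }

        @Override
        public boolean equals(Object o) {
            if (!super.equals(o)) {
                return false; // also guarantees o is a NamedQuery
            }
            NamedQuery other = (NamedQuery) o;
            return slop == other.slop && field.equals(other.field);
        }

        @Override
        public int hashCode() {
            return 31 * super.hashCode() + Objects.hash(field, slop);
        }
    }

    public static void main(String[] args) {
        System.out.println(anonymous("a", 1).equals(anonymous("b", 2)));            // prints true
        System.out.println(new NamedQuery("a", 1).equals(new NamedQuery("b", 2))); // prints false
    }
}
```

With the anonymous variant, a cache keyed on the query object would hand back the entry for a different query.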


[jira] [Updated] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-23 Thread Paul Elschot (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-2945:
-

Attachment: LUCENE-2945d.patch

Also has the changes to SpanNearClauseFactory.

> Surround Query doesn't properly handle equals/hashcode
> --
>
> Key: LUCENE-2945
> URL: https://issues.apache.org/jira/browse/LUCENE-2945
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1.1, 4.0
>
> Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
> LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, 
> LUCENE-2945d.patch
>
>
> In looking at using the surround queries with Solr, I am hitting issues 
> caused by collisions due to equals/hashcode not being implemented on the 
> anonymous inner classes that are created by things like DistanceQuery (branch 
> 3.x, near line 76)


[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-23 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-2977:


Attachment: LUCENE-2977.patch

Patch for auto-detecting output compression mode of result line file:

- getInputStream() moved from ContentSource to a new class StreamUtils under 
util. It is now named inputStream(File).
- outputStream() method added to StreamUtils.
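
For readers who don't have the patch at hand, here is a rough sketch of suffix-based detection. It is not the StreamUtils code from the patch; the class, the enum and the method name are made up. It uses ".bz2" as the bzip2 suffix, following the review comment on this issue.

```java
import java.util.Locale;

/** Hypothetical sketch; not the StreamUtils class from LUCENE-2977.patch. */
class CompressionBySuffixSketch {

    enum Type { GZIP, BZIP2, PLAIN }

    /** Chooses the compression type from the file name suffix, ignoring case. */
    static Type detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz")) {
            return Type.GZIP;
        }
        if (lower.endsWith(".bz2")) {
            return Type.BZIP2;
        }
        return Type.PLAIN; // anything else is written as plain text
    }

    public static void main(String[] args) {
        System.out.println(detect("lines.txt.gz"));  // prints GZIP
        System.out.println(detect("lines.txt.bz2")); // prints BZIP2
        System.out.println(detect("lines.txt"));     // prints PLAIN
    }
}
```

A writer and a reader that both go through the same detection agree on the format without any extra property.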

Before applying this patch *svn mv 
modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/ContentSourceTest.java
 
modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/utils/StreamUtilsTest.java*

I kept for now the "force-bzip" logic in WriteLineDocTask but I would like to 
remove it - it is strange, and in any case LineDocSource would only auto-detect 
bzip input format if WriteLineDocTask was able to auto-detect bzip output 
format. Removing it will also simplify StreamUtils. Any opinions on removing 
this "force-bzip" option?


> WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
> specified output file name
> -
>
> Key: LUCENE-2977
> URL: https://issues.apache.org/jira/browse/LUCENE-2977
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2977.patch
>
>
> Since the readers behave this way it would be nice and handy if also this 
> line writer would.


[jira] [Updated] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity

2011-03-23 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2986:


Attachment: LUCENE-2986.patch

Attached is a patch: adds DefaultSimilarityProvider, which has our default 
implementations of the non-field-specific methods (coord/queryNorm/etc), and 
always returns DefaultSimilarity.
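
A simplified sketch of the split, not the patch itself. The interfaces below are cut down to one method each; the real classes have more methods and these signatures are assumptions. The provider owns the non-field-specific scoring (coord here) and hands out one per-field implementation for every field.

```java
/** Hypothetical, cut-down sketch of the provider/similarity split. */
class SimilaritySplitSketch {

    /** Per-field scoring only. */
    interface Similarity {
        float tf(int freq);
    }

    /** Factory plus the methods that are not specific to a field. */
    interface SimilarityProvider {
        float coord(int overlap, int maxOverlap);

        Similarity get(String field);
    }

    static final class DefaultSimilarity implements Similarity {
        @Override
        public float tf(int freq) {
            return (float) Math.sqrt(freq);
        }
    }

    static final class DefaultSimilarityProvider implements SimilarityProvider {
        private final Similarity impl = new DefaultSimilarity();

        @Override
        public float coord(int overlap, int maxOverlap) {
            return overlap / (float) maxOverlap;
        }

        @Override
        public Similarity get(String field) {
            return impl; // same instance for every field
        }
    }

    public static void main(String[] args) {
        SimilarityProvider provider = new DefaultSimilarityProvider();
        System.out.println(provider.get("title") == provider.get("body")); // prints true
    }
}
```

A user who only wants different per-field scoring subclasses one type, and a user who wants a different coord subclasses the other, without seeing both APIs mixed in a single class.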

> divorce defaultsimilarityprovider from defaultsimilarity
> 
>
> Key: LUCENE-2986
> URL: https://issues.apache.org/jira/browse/LUCENE-2986
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2986.patch
>
>
> In LUCENE-2236 as a start, we made DefaultSimilarity which implements the 
> factory interface (SimilarityProvider), and also extends Similarity.
> Its factory interface just returns itself always by default.
> Doron mentioned it would be cleaner to split the two, and I thought it would 
> be good to revisit it later.
> Today as I was looking at SOLR-2338, it became pretty clear that we should do 
> this, it makes things a lot cleaner. I think currently it's confusing to users 
> to see the two apis mixed if they are trying to subclass.


[jira] [Created] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity

2011-03-23 Thread Robert Muir (JIRA)

divorce defaultsimilarityprovider from defaultsimilarity


 Key: LUCENE-2986
 URL: https://issues.apache.org/jira/browse/LUCENE-2986
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: 4.0


In LUCENE-2236 as a start, we made DefaultSimilarity which implements the 
factory interface (SimilarityProvider), and also extends Similarity.

Its factory interface just returns itself always by default.

Doron mentioned it would be cleaner to split the two, and I thought it would be 
good to revisit it later.

Today as I was looking at SOLR-2338, it became pretty clear that we should do 
this, it makes things a lot cleaner. I think currently it's confusing to users 
to see the two apis mixed if they are trying to subclass.



multifield search using dismax

2011-03-23 Thread Gastone Penzo

Hi,
is it possible, USING DISMAX SEARCH HANDLER, to make a search like:

search value1 in field1 & value 2 in field 2 &??

it's like q=field1:value1 field2:value2 in standard search, but i want to do
this in dismax

Thanx



-- 
Gastone Penzo

*www.solr-italia.it*
*The first italian blog about Apache Solr *

[jira] [Updated] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-23 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2573:


Attachment: LUCENE-2573.patch

Here is my current state on this issue. I didn't add all the jdocs needed (by 
far), and I will wait until we have settled on the API for FlushPolicy.

* I removed the complex TieredFlushPolicy entirely and added one 
DefaultFlushPolicy that flushes at IWC.getRAMBufferSizeMB() / sets biggest DWPT 
pending.
* DW will stall threads if we reach 2 x maxNetRam which is retrieved from 
FlushPolicy so folks can lower that depending on their env.

* DWFlushControl checks if a single DWPT grows too large and sets it forcefully 
pending once its ram consumption is > 1.9 GB. That should be enough buffer to 
not reach the 2048MB limit. We should consider making this configurable.

* FlushPolicy has now three methods onInsert, onUpdate and onDelete while 
DefaultFlushPolicy only implements onInsert and onDelete, the Abstract base 
class just calls those on an update.

* I removed FlushControl from IW
* added documentation on IWC for FlushPolicy and removed the jdocs for the RAM 
limit. I think we should add some lines about how RAM is now used and that 
users should balance the RAM with the number of threads they are using. Will do 
that later on though.

* For testing I added a ThrottledIndexOutput that makes flushing slow so I can 
test whether we are stalled and/or blocked. This is passed to 
MockDirectoryWrapper. It's currently under util, but it should rather go under 
store, no?

* byte consumption is now committed before FlushPolicy is called since we don't 
have the multitier flush which required that to reliably proceed across tier 
boundaries (not required but it was easier to test really). So FP doesn't need 
to take care of the delta

* FlushPolicy now also flushes on maxBufferedDeleteTerms while the buffered 
delete terms is not yet connected to the DW#getNumBufferedDeleteTerms() which 
causes some failures though. I added //nocommit & @Ignore to those tests.

* this patch also contains an @Ignore on TestPersistentSnapshotDeletionPolicy. 
I couldn't figure out why it is failing, but it could be due to an old 
version of LUCENE-2881 on this branch. I will see if it still fails once we 
have merged.

* Healthiness now doesn't stall if we are not flushing on RAM consumption to 
ensure we don't lock in threads. 


Overall this seems much closer now. I will start writing jdocs. Flush on 
buffered delete terms might need some tests, and I should also write a more 
reliable test for Healthiness: currently it relies on the 
ThrottledIndexOutput slowing down indexing enough to block, which might not 
be true all the time. It hasn't failed yet. 
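
A rough sketch of the two decisions described in the list above, with no Lucene classes involved: mark the biggest writer as pending once the total reaches the RAM buffer size, and stall indexing threads once the total reaches twice that limit. All names are hypothetical, and the real FlushPolicy and DWFlushControl track far more state than a plain array of byte counts.

```java
/** Hypothetical sketch of the flush and stall decisions; not the patch code. */
class FlushPolicySketch {

    private static long total(long[] bytesPerWriter) {
        long sum = 0;
        for (long b : bytesPerWriter) {
            sum += b;
        }
        return sum;
    }

    /**
     * Returns the index of the writer to mark as flush-pending (the biggest one),
     * or -1 while the total stays below the RAM buffer size.
     */
    static int pickFlushCandidate(long[] bytesPerWriter, long ramBufferBytes) {
        if (total(bytesPerWriter) < ramBufferBytes) {
            return -1;
        }
        int biggest = 0;
        for (int i = 1; i < bytesPerWriter.length; i++) {
            if (bytesPerWriter[i] > bytesPerWriter[biggest]) {
                biggest = i;
            }
        }
        return biggest;
    }

    /** Indexing threads are stalled once the total reaches twice the limit. */
    static boolean shouldStall(long[] bytesPerWriter, long maxNetBytes) {
        return total(bytesPerWriter) >= 2 * maxNetBytes;
    }

    public static void main(String[] args) {
        long[] usage = {10, 30, 20};
        System.out.println(pickFlushCandidate(usage, 50)); // prints 1
        System.out.println(shouldStall(usage, 50));        // prints false
    }
}
```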



> Tiered flushing of DWPTs by RAM with low/high water marks
> -
>
> Key: LUCENE-2573
> URL: https://issues.apache.org/jira/browse/LUCENE-2573
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values 
> explicitly using total values (e.g. low water mark at 120MB, high water mark 
> at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
> config method and use something like 90% and 110% for the water marks?
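
The linear steps in the description work out as follows. This is a small sketch with made-up names, with thresholds expressed as a percentage of the allowed RAM.

```java
import java.util.Arrays;

/** Hypothetical sketch of the tiered water marks from the issue description. */
class WaterMarkSketch {

    /**
     * Returns one flush threshold per writer, spaced linearly between the low
     * and the high water mark. With a single writer only the low mark is used.
     */
    static double[] flushThresholds(int numWriters, double lowPercent, double highPercent) {
        if (numWriters < 1) {
            throw new IllegalArgumentException("numWriters must be >= 1");
        }
        double[] thresholds = new double[numWriters];
        if (numWriters == 1) {
            thresholds[0] = lowPercent;
            return thresholds;
        }
        double step = (highPercent - lowPercent) / (numWriters - 1);
        for (int i = 0; i < numWriters; i++) {
            thresholds[i] = lowPercent + i * step;
        }
        return thresholds;
    }

    public static void main(String[] args) {
        // 5 writers between 90% and 110%: step = (110 - 90) / 4 = 5
        System.out.println(Arrays.toString(flushThresholds(5, 90, 110)));
        // prints [90.0, 95.0, 100.0, 105.0, 110.0]
    }
}
```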


Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

2011-03-23 Thread Grant Ingersoll



On Mar 23, 2011, at 9:20 AM, Dawid Weiss wrote:

> Sure, I'll change it. Can I alter branch_3x too?

That's fine to change 3_x, the 3.1 release is on lucene_solr_3_1 (or something 
similar).  This way it will be on in 3.2.

-Grant

> Don't know what the
> policy is after the RCs have been published.
> 
> Dawid
> 
> On Wed, Mar 23, 2011 at 2:07 PM, Grant Ingersoll  wrote:
>> Hey Dawid,
>> 
>> Thanks for doing this.  It would be good, too, if we no longer had to pass 
>> in -Dsolr.clustering.enabled=true as there is no reason why we can't just 
>> have it on like the other components.
>> 
>> -Grant
>> 
>> On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote:
>> 
>>> Author: dweiss
>>> Date: Tue Mar 22 20:44:21 2011
>>> New Revision: 1084345
>>> 
>>> URL: http://svn.apache.org/viewvc?rev=1084345&view=rev
>>> Log:
>>> Removing the note about excluded JARs (everything is included).
>>> 
>>> Modified:
>>>lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
>>> 
>>> Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
>>> URL: 
>>> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff
>>> ==
>>> --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original)
>>> +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 
>>> 20:44:21 2011
>>> @@ -1183,12 +1183,10 @@
>>> 
>>>http://wiki.apache.org/solr/ClusteringComponent
>>> 
>>> -   This relies on third party jars which are notincluded in the
>>> -   release.  To use this component (and the "/clustering" handler)
>>> -   Those jars will need to be downloaded, and you'll need to set
>>> -   the solr.cluster.enabled system property when running solr...
>>> +   You'll need to set the solr.cluster.enabled system property
>>> +   when running solr to run with clustering enabled:
>>> 
>>> -  java -Dsolr.clustering.enabled=true -jar start.jar
>>> +   java -Dsolr.clustering.enabled=true -jar start.jar
>>> -->
>>>   >>enable="${solr.clustering.enabled:false}"
>>> 
>>> 
>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
>> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search



[jira] [Issue Comment Edited] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-03-23 Thread Tommaso Teofili (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010112#comment-13010112
 ] 

Tommaso Teofili edited comment on SOLR-2436 at 3/23/11 1:26 PM:


Hello Koji,
I've tested your patch, I needed to align it to latest patch applied (see 
SOLR-2387) to make tests work (see attached patch). 

In my opinion the solution you're proposing is better than the current one as 
it reflects the Solr way of specifying parameters in Handlers.

However, I think it would be good if it were also possible to get rid 
of the uimaConfig file by defining each parameter inside the Processor with 
Solr elements (str/lst/int etc.).



  was (Author: teofili):
Hello Koji,
I've tested your patch, I needed to align it to latest patch applied (see 
SOLR-2387) to make tests work (see attached patch). 

In my opinion this solution is better than the current one as it reflects the 
Solr way of specifying parameters in Handlers.

However, I think it would be good if it were also possible to get rid 
of the uimaConfig file by defining each parameter inside the Processor with 
Solr elements (str/lst/int etc.).


  
> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath . I think it should 
> move to uima's update processor tag.


[jira] [Updated] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml

2011-03-23 Thread Tommaso Teofili (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated SOLR-2436:
--

Attachment: SOLR-2436_2.patch

Hello Koji,
I've tested your patch, I needed to align it to latest patch applied (see 
SOLR-2387) to make tests work (see attached patch). 

In my opinion this solution is better than the current one as it reflects the 
Solr way of specifying parameters in Handlers.

However, I think it would be good if it were also possible to get rid 
of the uimaConfig file by defining each parameter inside the Processor with 
Solr elements (str/lst/int etc.).



> move uimaConfig to under the uima's update processor in solrconfig.xml
> --
>
> Key: SOLR-2436
> URL: https://issues.apache.org/jira/browse/SOLR-2436
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 3.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch
>
>
> Solr contrib UIMA has its config just beneath . I think it should 
> move to uima's update processor tag.


[jira] [Commented] (LUCENE-2454) Nested Document query support

2011-03-23 Thread Mark Harwood (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010110#comment-13010110
 ] 

Mark Harwood commented on LUCENE-2454:
--

bq. I have not looked this patch so this comment may be off base.

The slideshare deck gives a good overview: 
http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

As a simple Lucene-focused addition I'd prefer not to explore all the possible 
implications for Solr adoption here. The affected areas in Solr are extensive 
and would include schema definitions, query syntax, facets/filter caching, 
result-fetching, DIH etc etc. Probably best discussed elsewhere.



> Nested Document query support
> -
>
> Key: LUCENE-2454
> URL: https://issues.apache.org/jira/browse/LUCENE-2454
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: 3.0.2
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Attachments: LuceneNestedDocumentSupport.zip
>
>
> A facility for querying nested documents in a Lucene index as outlined in 
> http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene


Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

2011-03-23 Thread Dawid Weiss

Sure, I'll change it. Can I alter branch_3x too? Don't know what the
policy is after the RCs have been published.

Dawid

On Wed, Mar 23, 2011 at 2:07 PM, Grant Ingersoll  wrote:
> Hey Dawid,
>
> Thanks for doing this.  It would be good, too, if we no longer had to pass in 
> -Dsolr.clustering.enabled=true as there is no reason why we can't just have 
> it on like the other components.
>
> -Grant
>
> On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote:
>
>> Author: dweiss
>> Date: Tue Mar 22 20:44:21 2011
>> New Revision: 1084345
>>
>> URL: http://svn.apache.org/viewvc?rev=1084345&view=rev
>> Log:
>> Removing the note about excluded JARs (everything is included).
>>
>> Modified:
>>    lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
>>
>> Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
>> URL: 
>> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff
>> ==
>> --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original)
>> +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 
>> 20:44:21 2011
>> @@ -1183,12 +1183,10 @@
>>
>>        http://wiki.apache.org/solr/ClusteringComponent
>>
>> -       This relies on third party jars which are notincluded in the
>> -       release.  To use this component (and the "/clustering" handler)
>> -       Those jars will need to be downloaded, and you'll need to set
>> -       the solr.cluster.enabled system property when running solr...
>> +       You'll need to set the solr.cluster.enabled system property
>> +       when running solr to run with clustering enabled:
>>
>> -          java -Dsolr.clustering.enabled=true -jar start.jar
>> +       java -Dsolr.clustering.enabled=true -jar start.jar
>>     -->
>>   >                    enable="${solr.clustering.enabled:false}"
>>
>>
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: [VOTE] Release Lucene/Solr 3.1

2011-03-23 Thread Erik Hatcher

+1

  * Ran Solr example
  * Perused entire structure of both binary and source distros

Noticed the minor issues others have reported, to echo Ryan, none seem like 
blockers to me.

And also to echo Ryan's thanks: huge thanks to everyone for their hard work on 
the 3.1 Lucene/Solr release(s).  This is a big milestone for the technology and 
community.

Erik

On Mar 22, 2011, at 23:42 , Ryan McKinley wrote:

> +1
> 
> * Walked through the solr example
> * Tested a simple maven project, worked well
> 
> I don't think the minor issues listed so far are blockers
> 
> Thanks to everyone who worked on this!
> 
> ryan
> 
> 
> On Tue, Mar 22, 2011 at 10:21 AM, Yonik Seeley
>  wrote:
>> Please vote to release the artifacts at
>> http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2
>> as Lucene 3.1 and Solr 3.1
>> 
>> Thanks for everyone's help pulling all this together!
>> 
>> -Yonik
>> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
>> 25-26, San Francisco
>> 

[jira] [Resolved] (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.

2011-03-23 Thread Dawid Weiss (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-2967.
-

   Resolution: Won't Fix
Lucene Fields:   (was: [New])

I spent some time on this. It's quite fascinating: the number of collisions for 
the default probing is smaller than:

a) linear probing with murmurhash mix of the original hash
b) linear probing without murmurhash mix (start from raw hash only).

Curiously, the number of collisions for (b) is smaller than for (a) -- this
could be explained if we assume bits are spread evenly throughout the entire
32-bit range after murmurhash, so after masking to table size there should be
more collisions on lower bits compared to a raw hash (the raw hash would have
more collisions on upper bits and fewer on lower bits because it is
multiplicative... or at least I think so).

Anyway, I tried many different versions and I don't see any significant 
difference in favor of linear probing here. Measured the GC overhead during my 
tests too, but it is not the primary factor contributing to the total cost of 
constructing the FST (about 3-5% of the total time, running in parallel, 
typically).
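
For readers who want to repeat this kind of comparison, below is a minimal sketch. It is not the NodeHash code: the class LinearProbingIntSet and its probe counter are made up for illustration, keys are plain non-negative ints, and the load factor is fixed at 0.5. The mix() function is the MurmurHash3 32-bit finalizer; whether it matches the exact mixing step used in the patch is an assumption.

```java
import java.util.Arrays;

/** Illustrative open-addressing set of non-negative ints using linear probing. */
final class LinearProbingIntSet {
    private static final int EMPTY = -1;  // sentinel, so only keys >= 0 are supported
    private int[] table = new int[16];    // capacity is always a power of two
    private int size;
    private long probes;                  // extra slots visited because a slot was taken
    private final boolean mixHash;

    LinearProbingIntSet(boolean mixHash) {
        this.mixHash = mixHash;
        Arrays.fill(table, EMPTY);
    }

    /** MurmurHash3 32-bit finalizer: spreads the input bits over the whole int. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    private int firstSlot(int key) {
        return (mixHash ? mix(key) : key) & (table.length - 1);
    }

    boolean add(int key) {
        if (key < 0) throw new IllegalArgumentException("negative key");
        if (2 * (size + 1) > table.length) grow();  // keep the load factor <= 0.5
        return insert(key);
    }

    boolean contains(int key) {
        int slot = firstSlot(key);
        while (table[slot] != EMPTY) {
            if (table[slot] == key) return true;
            slot = (slot + 1) & (table.length - 1);
        }
        return false;
    }

    private boolean insert(int key) {
        int slot = firstSlot(key);
        while (table[slot] != EMPTY) {
            if (table[slot] == key) return false;    // already present
            slot = (slot + 1) & (table.length - 1);  // linear probing: try the next slot
            probes++;
        }
        table[slot] = key;
        size++;
        return true;
    }

    private void grow() {
        int[] old = table;
        table = new int[old.length * 2];
        Arrays.fill(table, EMPTY);
        size = 0;
        for (int k : old) if (k != EMPTY) insert(k);  // rehashing also counts probes
    }

    int size() { return size; }
    long probes() { return probes; }

    public static void main(String[] args) {
        LinearProbingIntSet raw = new LinearProbingIntSet(false);
        LinearProbingIntSet mixed = new LinearProbingIntSet(true);
        for (int i = 0; i < 100_000; i++) {
            int key = i * 31;  // arbitrary, regularly spaced keys
            raw.add(key);
            mixed.add(key);
        }
        System.out.println("extra probes, raw hash:   " + raw.probes());
        System.out.println("extra probes, mixed hash: " + mixed.probes());
    }
}
```

The probe counts depend heavily on the key distribution, so a run like this only shows how to measure the effect; it does not reproduce the numbers behind the comment above.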

> Use linear probing with an additional good bit avalanching function in FST's 
> NodeHash.
> --
>
> Key: LUCENE-2967
> URL: https://issues.apache.org/jira/browse/LUCENE-2967
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 4.0
>
> Attachments: LUCENE-2967.patch
>
>
> I recently had an interesting discussion with Sebastiano Vigna (fastutil), 
> who suggested that linear probing, given a hash mixing function with good 
> avalanche properties, is a way better method of constructing lookups in 
> associative arrays compared to quadratic probing. Indeed, with linear probing 
> you can implement removals from a hash map without removed slot markers and 
> linear probing has nice properties with respect to modern CPUs (caches). I've 
> reimplemented HPPC's hash maps to use linear probing and we observed a nice 
> speedup (the same applies for fastutils of course).
> This patch changes NodeHash's implementation to use linear probing. The code 
> is a bit simpler (I think :). I also moved the load factor to a constant -- 
> 0.5 seems like a generous load factor, especially if we allow large FSTs to 
> be built. I don't see any significant speedup in constructing large automata, 
> but there is no slowdown either (I checked on one machine only for now, but 
> will verify on other machines too).


Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

2011-03-23 Thread Grant Ingersoll

Hey Dawid,

Thanks for doing this.  It would be good, too, if we no longer had to pass in 
-Dsolr.clustering.enabled=true as there is no reason why we can't just have it 
on like the other components.
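
For context, the `${solr.clustering.enabled:false}` expression in the quoted diff reads as "use the system property if it is set, otherwise the default after the colon". A minimal sketch of that lookup follows; PlaceholderDemo and resolve() are hypothetical names for illustration and are not Solr's actual implementation.

```java
import java.util.Properties;

/** Hypothetical illustration of "${name:default}" resolution. */
final class PlaceholderDemo {
    /** Resolves a single "${name:default}" placeholder against the given properties. */
    static String resolve(String placeholder, Properties props) {
        if (!placeholder.startsWith("${") || !placeholder.endsWith("}")) {
            return placeholder;  // not a placeholder: return it unchanged
        }
        String body = placeholder.substring(2, placeholder.length() - 1);
        int colon = body.indexOf(':');
        String name = colon < 0 ? body : body.substring(0, colon);
        String fallback = colon < 0 ? null : body.substring(colon + 1);
        return props.getProperty(name, fallback);
    }

    public static void main(String[] args) {
        // Started with -Dsolr.clustering.enabled=true this prints "true";
        // without the flag it prints the default, "false".
        System.out.println(resolve("${solr.clustering.enabled:false}", System.getProperties()));
    }
}
```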

-Grant

On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote:

> Author: dweiss
> Date: Tue Mar 22 20:44:21 2011
> New Revision: 1084345
> 
> URL: http://svn.apache.org/viewvc?rev=1084345&view=rev
> Log:
> Removing the note about excluded JARs (everything is included).
> 
> Modified:
>lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
> 
> Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff
> ==
> --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original)
> +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 
> 20:44:21 2011
> @@ -1183,12 +1183,10 @@
> 
>http://wiki.apache.org/solr/ClusteringComponent
> 
> -   This relies on third party jars which are notincluded in the
> -   release.  To use this component (and the "/clustering" handler)
> -   Those jars will need to be downloaded, and you'll need to set
> -   the solr.cluster.enabled system property when running solr...
> +   You'll need to set the solr.cluster.enabled system property 
> +   when running solr to run with clustering enabled:
> 
> -  java -Dsolr.clustering.enabled=true -jar start.jar
> +   java -Dsolr.clustering.enabled=true -jar start.jar
> -->
>   enable="${solr.clustering.enabled:false}"
> 
> 




[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #70: POMs out of sync

2011-03-23 Thread Apache Hudson Server

Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-3.x/70/

No tests ran.

Build Log (for compile errors):
[...truncated 22 lines...]




[jira] [Updated] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing

2011-03-23 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2985:


Attachment: LUCENE-2985.patch

Here is an initial patch that uses a SegmentCodecBuilder to assign codec IDs
during indexing in DocFieldProcessorPerThread.

> Build SegmentCodecs incrementally for consistent codecIDs during indexing
> -
>
> Key: LUCENE-2985
> URL: https://issues.apache.org/jira/browse/LUCENE-2985
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Codecs, Index
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: CSF branch, 4.0
>
> Attachments: LUCENE-2985.patch
>
>
> Currently we build the SegmentCodecs during flush, which is fine as long as
> no codec needs to know which fields it should handle. This will change with
> DocValues or when we expose StoredFields / TermVectors via Codec (see
> LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a
> consistent view of which codec belongs to which field during indexing, and all
> FieldInfo instances are unassigned (set to -1). Instead we should build the
> SegmentCodecs incrementally as fields come in, so no matter when a codec needs
> to be selected to process a document / field we have the right codec ID
> assigned.


Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]

2011-03-23 Thread Simon Willnauer

On Wed, Mar 23, 2011 at 9:37 AM, David Nemeskey
 wrote:
> Hey Simon and all,
>
> May we get an update on this? I understand that Google has published the list
> of accepted organizations, which -- not surprisingly -- includes the ASF. Is
> there any information on how many slots Apache got, and which issues will be
> selected?
>
> The student application period opens on the 28th, so I'm just wondering if I
> should go ahead and apply or wait for the decision.

David,

You should go ahead and apply via the GSoC website and reference the
issue there; this is how I understand it works.
We will later rate the proposals from the GSoC website and decide
which we choose. This is also when slots get assigned.

simon
>
> Thanks,
> David
>
> On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote:
>> Hey folks,
>>
>> Google Summer of Code 2011 is very close and the Project Applications
>> Period has started recently. Now it's time to get some excited students
>> on board for this year's GSoC.
>>
>> I encourage students to submit an application to the Google Summer of Code
>> web-application. Lucene & Solr are amazing projects and GSoC is an
>> incredible opportunity to join the community and push the project
>> forward.
>>
>> If you are a student and you are interested spending some time on a
>> great open source project while getting paid for it, you should submit
>> your application from March 28 - April 8, 2011. There are only 3
>> weeks until this process starts!
>>
>> Quote from the GSoC website: "We hear almost universally from our
>> mentoring organizations that the best applications they receive are
>> from students who took the time to interact and discuss their ideas
>> before submitting an application, so make sure to check out each
>> organization's Ideas list to get to know a particular open source
>> organization better."
>>
>> So if you have any ideas what Lucene & Solr should have, or if you
>> find any of the GSoC pre-selected projects [1] interesting, please
>> join us on dev@lucene.apache.org [2].  Since you as a student must
>> apply for a certain project via the GSoC website [3], it's a good idea
>> to work on it ahead of time and include the community and possible
>> mentors as soon as possible.
>>
>> Open source development here at the Apache Software
>> Foundation happens almost exclusively in the public and I encourage you to
>> follow this. Don't mail folks privately; please use the mailing list to
>> get the best possible visibility and attract interested community
>> members and push your idea forward. As always, it's the idea that
>> counts not the person!
>>
>> That said, please do not underestimate the complexity of even small
>> "GSoC - Projects". Don't try to rewrite Lucene or Solr!  A project
>> usually gains more from a smaller, well discussed and carefully
>> crafted & tested feature than from a half baked monster change that's
>> too large to work with.
>>
>> Once your proposal has been accepted and you begin work, you should
>> give the community the opportunity to iterate with you.  We prefer
>> "progress over perfection" so don't hesitate to describe your overall
>> vision, but when the rubber meets the road let's take it in small
>> steps.  A code patch of 20 KB is likely to be reviewed very quickly, so you
>> get fast feedback, while a patch of even 60 KB can take very
>> long. So try to break up your vision and the community will work with
>> you to get things done!
>>
>> On behalf of the Lucene & Solr community,
>>
>> Go! join the mailing list and apply for GSoC 2011,
>>
>> Simon
>>
>> [1] https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+lucene-gsoc-11
>> [2] http://lucene.apache.org/java/docs/mailinglists.html
>> [3] http://www.google-melange.com

[jira] [Created] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing

2011-03-23 Thread Simon Willnauer (JIRA)

Build SegmentCodecs incrementally for consistent codecIDs during indexing
-

 Key: LUCENE-2985
 URL: https://issues.apache.org/jira/browse/LUCENE-2985
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Codecs, Index
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: CSF branch, 4.0


Currently we build the SegmentCodecs during flush, which is fine as long as no
codec needs to know which fields it should handle. This will change with
DocValues or when we expose StoredFields / TermVectors via Codec (see
LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a
consistent view of which codec belongs to which field during indexing, and all
FieldInfo instances are unassigned (set to -1). Instead we should build the
SegmentCodecs incrementally as fields come in, so no matter when a codec needs
to be selected to process a document / field we have the right codec ID
assigned.
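
The incremental idea can be sketched as follows. This is only an illustration under assumptions: CodecIdAssigner is a made-up class, not the SegmentCodecs implementation or any patch attached to this issue, and the codec names in main() are example strings. It also assumes, for simplicity, that the first codec seen for a field stays bound to that field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that hands out codec IDs as fields are first seen. */
final class CodecIdAssigner {
    private final Map<String, Integer> codecToId = new LinkedHashMap<>();
    private final Map<String, Integer> fieldToCodecId = new LinkedHashMap<>();

    /** Called when a field comes in; returns a codec ID that stays stable for that field. */
    int assign(String fieldName, String codecName) {
        Integer existing = fieldToCodecId.get(fieldName);
        if (existing != null) {
            return existing;  // field already bound: keep its ID
        }
        Integer id = codecToId.get(codecName);
        if (id == null) {
            id = codecToId.size();  // next free ID, in order of first use
            codecToId.put(codecName, id);
        }
        fieldToCodecId.put(fieldName, id);
        return id;
    }

    /** Returns -1 for a field that has not been seen, mirroring the "unassigned" marker. */
    int codecIdOf(String fieldName) {
        Integer id = fieldToCodecId.get(fieldName);
        return id == null ? -1 : id;
    }

    public static void main(String[] args) {
        CodecIdAssigner ids = new CodecIdAssigner();
        System.out.println("title -> " + ids.assign("title", "Standard"));
        System.out.println("id    -> " + ids.assign("id", "Pulsing"));
        System.out.println("body  -> " + ids.assign("body", "Standard"));
    }
}
```

With something along these lines, a codec ID is available as soon as a field has been seen once, rather than only at flush time.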




[jira] [Commented] (LUCENE-2982) Get rid of ContenSource's workaround for closing b/gzip input stream once this is fixed in CommonCompress

2011-03-23 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010086#comment-13010086
 ] 

Doron Cohen commented on LUCENE-2982:
-

COMPRESS-127 was fixed, so whenever a new Commons Compress release is available
we should be able to complete this one.
I subscribed to annou...@apache.org to be notified when that happens...

> Get rid of ContenSource's workaround for closing b/gzip input stream once 
> this is fixed in CommonCompress
> -
>
> Key: LUCENE-2982
> URL: https://issues.apache.org/jira/browse/LUCENE-2982
> Project: Lucene - Java
>  Issue Type: Task
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Priority: Minor
>
> Once COMPRESS-127 is fixed get rid of the entire workaround method 
> ContentSource.closableCompressorInputStream(). It would simplify the code and 
> would perform better without that delegation.


[jira] [Resolved] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-2980.
-

   Resolution: Fixed
Lucene Fields:   (was: [New])

Committed:
- trunk: r1084544, r1084549
- 3x: r1084552
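
As an illustration of the behaviour this issue asks for, here is a minimal sketch of suffix detection that ignores case. FileTypeByExtension is a hypothetical class, not the committed ContentSource code, and the two suffixes it recognises (.gz and .bz2) are an assumption made for the example.

```java
import java.util.Locale;

/** Hypothetical illustration of case-insensitive file type detection. */
final class FileTypeByExtension {
    enum Type { GZIP, BZIP2, TEXT }

    static Type detect(String fileName) {
        // Lower-case with a fixed locale so the result does not depend on the
        // platform default (for example Turkish dotless-i rules).
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz")) return Type.GZIP;
        if (lower.endsWith(".bz2")) return Type.BZIP2;
        return Type.TEXT;  // anything else is treated as plain text
    }

    public static void main(String[] args) {
        System.out.println("file.gz -> " + detect("file.gz"));
        System.out.println("file.GZ -> " + detect("file.GZ"));
    }
}
```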

> Benchmark's ContentSource should not rely on file suffixes to be lower cased 
> when detecting file type (gzip/bzip2/text)
> ---
>
> Key: LUCENE-2980
> URL: https://issues.apache.org/jira/browse/LUCENE-2980
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2980.patch, LUCENE-2980.patch, LUCENE-2980.patch
>
>
> file.gz is correctly handled as gzip, but file.GZ is handled as text, which is
> wrong.


[jira] [Updated] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-2980:


Attachment: LUCENE-2980.patch

Updated patch applies workaround only for GZIP format, as other types do close 
their wrapped stream (COMPRESS-127).

> Benchmark's ContentSource should not rely on file suffixes to be lower cased 
> when detecting file type (gzip/bzip2/text)
> ---
>
> Key: LUCENE-2980
> URL: https://issues.apache.org/jira/browse/LUCENE-2980
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2980.patch, LUCENE-2980.patch, LUCENE-2980.patch
>
>
> file.gz is correctly handled as gzip, but file.GZ is handled as text, which is
> wrong.


[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010064#comment-13010064
 ] 

Shai Erera commented on LUCENE-2980:


Agreed.

> Benchmark's ContentSource should not rely on file suffixes to be lower cased 
> when detecting file type (gzip/bzip2/text)
> ---
>
> Key: LUCENE-2980
> URL: https://issues.apache.org/jira/browse/LUCENE-2980
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2980.patch, LUCENE-2980.patch
>
>
> file.gz is correctly handled as gzip, but file.GZ is handled as text, which is
> wrong.


[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-23 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-2977:


Summary: WriteLineDocTask should write gzip/bzip2/txt according to the 
extension of specified output file name  (was: WriteLineDocTask should write 
gzip/bzip2/txt according to the extension of specifie output file name)

> WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
> specified output file name
> -
>
> Key: LUCENE-2977
> URL: https://issues.apache.org/jira/browse/LUCENE-2977
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
>
> Since the readers behave this way, it would be handy if this
> line writer did too.


[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010043#comment-13010043
 ] 

Doron Cohen commented on LUCENE-2980:
-

bq. Perhaps we should add a specific test in CSTest for this problem? I
wouldn't use file.delete() as an indicator because on Linux it will pass

I changed my mind about adding this test to ContentSourceTest - I think such a
test fits better in the Commons Compress project, because it should directly call
CompressorStreamFactory.createCompressorInputStream(in). In our test we invoke
ContentSource.getInputStream(File), so we cannot pass in such a close-sensing
stream.

But this is a valid point; in particular, the test case I provided to COMPRESS-127
will fail on Windows but will likely pass on Linux. I'll add a reference to
your comment in COMPRESS-127.
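
A "close-sensing" stream of the kind mentioned above could look roughly like the sketch below. The class names are invented for illustration, and the demo wraps the sensor in the JDK's java.util.zip.GZIPInputStream rather than a Commons Compress stream, so it shows only the testing technique and does not reproduce COMPRESS-127.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical demo: does closing a decompressing wrapper close the wrapped stream? */
final class CloseSensingDemo {
    static boolean gzipWrapperClosesUnderlying() {
        try {
            // Build a tiny gzip payload in memory so no files are involved.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                gz.write("hello".getBytes(StandardCharsets.UTF_8));
            }
            CloseSensingInputStream sensor =
                new CloseSensingInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            InputStream wrapper = new GZIPInputStream(sensor);
            wrapper.close();
            return sensor.wasClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("underlying stream closed: " + gzipWrapperClosesUnderlying());
    }
}

/** Wraps a stream and remembers whether close() reached it. */
final class CloseSensingInputStream extends FilterInputStream {
    private boolean closed;

    CloseSensingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }

    boolean wasClosed() {
        return closed;
    }
}
```

Because the check is made on the stream object itself, it behaves the same on Windows and Linux, unlike a test that relies on file.delete() failing while a handle is still open.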

> Benchmark's ContentSource should not rely on file suffixes to be lower cased 
> when detecting file type (gzip/bzip2/text)
> ---
>
> Key: LUCENE-2980
> URL: https://issues.apache.org/jira/browse/LUCENE-2980
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2980.patch, LUCENE-2980.patch
>
>
> file.gz is correctly handled as gzip, but file.GZ is handled as text, which is
> wrong.


[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010039#comment-13010039
 ] 

Doron Cohen commented on LUCENE-2980:
-

bq. Perhaps we should add a specific test in CSTest for this problem? I
wouldn't use file.delete() as an indicator because on Linux it will pass

Agree, I'll add one.

> Benchmark's ContentSource should not rely on file suffixes to be lower cased 
> when detecting file type (gzip/bzip2/text)
> ---
>
> Key: LUCENE-2980
> URL: https://issues.apache.org/jira/browse/LUCENE-2980
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2980.patch, LUCENE-2980.patch
>
>
> file.gz is correctly handled as gzip, but file.GZ is handled as text, which is
> wrong.


[jira] [Updated] (LUCENE-2984) Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos

2011-03-23 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2984:


Description: Spin-off from LUCENE-2881 which had this change already but 
due to some random failures related to this change I remove this part of the 
patch to make it more isolated and easier to test.   (was: Spin-off from 
LUCENe-2881 which had this change already but due to some random failures 
related to this change I remove this part of the patch to make it more isolated 
and easier to test. )

> Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos 
> --
>
> Key: LUCENE-2984
> URL: https://issues.apache.org/jira/browse/LUCENE-2984
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 4.0
>Reporter: Simon Willnauer
> Fix For: 4.0
>
>
> Spin-off from LUCENE-2881 which had this change already but due to some 
> random failures related to this change I remove this part of the patch to 
> make it more isolated and easier to test. 


[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010036#comment-13010036
 ] 

Chris Male commented on LUCENE-2310:


bq. So, what is the reason for doing this in 3.x at all? Can't we simply drop
stuff in 4.0 and leave 3.x alone?

Very good question.  Certainly we are simplifying the codebase and I feel that 
Field is what most users use (not AbstractField).  But I know some expert users 
do use AbstractField.  But maybe they can handle the hard change?

> Reduce Fieldable, AbstractField and Field complexity
> 
>
> Key: LUCENE-2310
> URL: https://issues.apache.org/jira/browse/LUCENE-2310
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Index
>Reporter: Chris Male
> Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch
>
>
> In order to move field-type-like functionality into its own class, we really
> need to try to tackle the hierarchy of Fieldable, AbstractField and Field.
> Currently AbstractField depends on Field, and does not provide much more
> functionality than storing fields, most of which are being moved over to
> FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and
> possibly Fieldable), moving much of the functionality into Field and
> FieldType.


[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010035#comment-13010035
 ] 

Simon Willnauer commented on LUCENE-2310:
-

{quote}
Yeah but not in 3x unfortunately. As it stands people can retrieve the List of 
Fieldables via getFields() and add whatever implementation of Fieldable they 
like. Consequently we need to continue to support Fieldable in IW for example. 
Once this code has been committed I will create a new patch for trunk which 
moves all of Solr and Lucene over to the Field. I could do this in many places 
already of course, but core classes like IW would have to remain as they
are.
{quote}

So, what is the reason for doing this in 3.x at all? Can't we simply drop stuff
in 4.0 and leave 3.x alone?

Simon

> Reduce Fieldable, AbstractField and Field complexity
> 
>
> Key: LUCENE-2310
> URL: https://issues.apache.org/jira/browse/LUCENE-2310
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Index
>Reporter: Chris Male
> Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch
>
>
> In order to move field-type-like functionality into its own class, we really
> need to try to tackle the hierarchy of Fieldable, AbstractField and Field.
> Currently AbstractField depends on Field, and does not provide much more
> functionality than storing fields, most of which are being moved over to
> FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and
> possibly Fieldable), moving much of the functionality into Field and
> FieldType.


[jira] [Created] (LUCENE-2984) Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos

2011-03-23 Thread Simon Willnauer (JIRA)

Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos 
--

 Key: LUCENE-2984
 URL: https://issues.apache.org/jira/browse/LUCENE-2984
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


Spin-off from LUCENe-2881 which had this change already but due to some random 
failures related to this change I remove this part of the patch to make it more 
isolated and easier to test. 
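
As a rough illustration of what moving this responsibility means, the per-segment answers can be derived from the per-field flags instead of being stored separately. FieldFlags and FieldFlagsCollection below are hypothetical stand-ins, not Lucene's FieldInfo and FieldInfos, and the two boolean flags are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical collection of fields that answers segment-wide questions itself. */
final class FieldFlagsCollection {
    private final List<FieldFlags> fields = new ArrayList<>();

    void add(FieldFlags f) {
        fields.add(f);
    }

    /** True if at least one field stores term vectors. */
    boolean hasVectors() {
        for (FieldFlags f : fields) {
            if (f.storesTermVectors) return true;
        }
        return false;
    }

    /** True if at least one field keeps positions (proximity data). */
    boolean hasProx() {
        for (FieldFlags f : fields) {
            if (!f.omitsPositions) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FieldFlagsCollection infos = new FieldFlagsCollection();
        infos.add(new FieldFlags("id", false, true));
        infos.add(new FieldFlags("body", true, false));
        System.out.println("hasVectors=" + infos.hasVectors() + " hasProx=" + infos.hasProx());
    }
}

/** Hypothetical per-field flags; the names are illustrative only. */
final class FieldFlags {
    final String name;
    final boolean storesTermVectors;
    final boolean omitsPositions;

    FieldFlags(String name, boolean storesTermVectors, boolean omitsPositions) {
        this.name = name;
        this.storesTermVectors = storesTermVectors;
        this.omitsPositions = omitsPositions;
    }
}
```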


[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010032#comment-13010032
 ] 

Chris Male commented on LUCENE-2310:


Yes, Field would still compile if you removed the extends.  However, if we empty
AbstractField then any client code that also extends AbstractField would break.
That's why I deprecate the whole class but leave its code in.  We could empty
it and change it to extend Field; I think that would still work.
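
The compatibility concern can be illustrated with a small sketch. The classes below are invented for the example and are far simpler than Lucene's AbstractField and Field; they only show why a deprecated base class that keeps its code lets existing subclasses compile unchanged.

```java
/** Client code written against the old base class continues to compile and run. */
@SuppressWarnings("deprecation")
final class ClientField extends LegacyAbstractField {
    private final int value;

    ClientField(String name, int value) {
        super(name);
        this.value = value;
    }

    @Override
    String stringValue() {
        return Integer.toString(value);
    }

    public static void main(String[] args) {
        ClientField f = new ClientField("id", 42);
        System.out.println(f.name() + "=" + f.stringValue());
    }
}

/** Deprecated but left intact: subclasses still inherit name handling from here. */
@Deprecated
abstract class LegacyAbstractField {
    protected final String name;

    LegacyAbstractField(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    abstract String stringValue();
}

/** The concrete class still extends the deprecated base, as in the 3.x arrangement. */
@SuppressWarnings("deprecation")
class SimpleField extends LegacyAbstractField {
    private final String value;

    SimpleField(String name, String value) {
        super(name);
        this.value = value;
    }

    @Override
    String stringValue() {
        return value;
    }
}
```

If the body of LegacyAbstractField were emptied, ClientField would stop compiling because name() and the constructor it relies on would be gone.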

> Reduce Fieldable, AbstractField and Field complexity
> 
>
> Key: LUCENE-2310
> URL: https://issues.apache.org/jira/browse/LUCENE-2310
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Index
>Reporter: Chris Male
> Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch
>
>
> In order to move field type like functionality into its own class, we really 
> need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
> Currently AbstractField depends on Field, and does not provide much more 
> functionality that storing fields, most of which are being moved over to 
> FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
> possible Fieldable), moving much of the functionality into Field and 
> FieldType.


[jira] [Updated] (LUCENE-2983) FieldInfos should be read-only if loaded from disk

2011-03-23 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2983:


Attachment: LUCENE-2983.patch

Here is a patch with tests. All tests pass.
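
For illustration only, the read-only semantics described in the issue could be sketched like this. FieldNumbers is a hypothetical miniature, not the FieldInfos class or the attached patch; it assumes field numbers are plain ints and that a modifiable copy continues numbering after the highest number already in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical miniature of a field-name-to-number table with a read-only mode. */
final class FieldNumbers {
    private final Map<String, Integer> numbers = new LinkedHashMap<>();
    private final boolean readOnly;

    FieldNumbers() {
        this(false);
    }

    private FieldNumbers(boolean readOnly) {
        this.readOnly = readOnly;
    }

    /** Simulates loading from a directory: the result rejects modification. */
    static FieldNumbers loaded(Map<String, Integer> fromDisk) {
        FieldNumbers fn = new FieldNumbers(true);
        fn.numbers.putAll(fromDisk);
        return fn;
    }

    /** Adds a field if absent and returns its number; fails on a read-only instance. */
    int add(String field) {
        if (readOnly) {
            throw new IllegalStateException("loaded from disk: read-only");
        }
        Integer n = numbers.get(field);
        if (n == null) {
            int next = 0;
            for (int used : numbers.values()) {
                next = Math.max(next, used + 1);  // continue after the highest number in use
            }
            n = next;
            numbers.put(field, n);
        }
        return n;
    }

    int numberOf(String field) {
        Integer n = numbers.get(field);
        return n == null ? -1 : n;
    }

    /** A writable copy for callers that need to modify. */
    FieldNumbers modifiableCopy() {
        FieldNumbers copy = new FieldNumbers(false);
        copy.numbers.putAll(numbers);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Integer> disk = new LinkedHashMap<>();
        disk.put("title", 3);
        FieldNumbers copy = loaded(disk).modifiableCopy();
        System.out.println("body -> " + copy.add("body"));
    }
}
```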

> FieldInfos should be read-only if loaded from disk
> --
>
> Key: LUCENE-2983
> URL: https://issues.apache.org/jira/browse/LUCENE-2983
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2983.patch
>
>
> Currently FieldInfos creates a private FieldNumberBiMap when it is loaded
> from a directory, which is necessary due to a limitation we face
> with IW#addIndexes(Dir). If we add an index via a directory to an existing
> index, field numbers can conflict with the global field numbers in the IW
> receiving the directories. Those field number conflicts will remain until
> those segments are merged and we stabilize again based on the IW global field
> numbers. Yet we are unnecessarily creating a BiMap here, where we actually should
> enforce read-only semantics, since nobody should modify the FieldInfos
> instance we loaded from the directory. If somebody needs a modifiable
> copy, they should simply create a new one and add all FieldInfo instances to
> it.


[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010030#comment-13010030
 ] 

Simon Willnauer commented on LUCENE-2310:
-

bq. I don't really understand what you're suggesting here. In 3x where the 
deprecations will be occurring Field has to continue to extend AbstractField. 
Yes in 4.0 we can drop that extension but addressing the deprecations is not in 
the scope of 3x.

What I mean here is: if I simply removed the extends AbstractField from Field, 
would it still compile, or are there still dependencies on AbstractField? IMO 
AbstractField should just be empty now, right?

> Reduce Fieldable, AbstractField and Field complexity
> 
>
> Key: LUCENE-2310
> URL: https://issues.apache.org/jira/browse/LUCENE-2310
> Project: Lucene - Java
>  Issue Type: Sub-task
>  Components: Index
>Reporter: Chris Male
> Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-AbstractField.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields.patch, 
> LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch
>
>
> In order to move field-type-like functionality into its own class, we really 
> need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
> Currently AbstractField depends on Field, and does not provide much more 
> functionality than storing fields, most of which are being moved over to 
> FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
> possibly Fieldable), moving much of the functionality into Field and 
> FieldType.


[jira] [Created] (LUCENE-2983) FieldInfos should be read-only if loaded from disk

2011-03-23 Thread Simon Willnauer (JIRA)

FieldInfos should be read-only if loaded from disk
--

 Key: LUCENE-2983
 URL: https://issues.apache.org/jira/browse/LUCENE-2983
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0


Currently FieldInfos creates a private FieldNumberBiMap when it is loaded 
from a directory, which is necessary due to a limitation we face with 
IW#addIndexes(Dir). If we add an index via a directory to an existing index, 
field numbers can conflict with the global field numbers in the IW receiving the 
directories. Those field number conflicts will remain until those segments are 
merged and we stabilize again based on the IW global field numbers. Yet we are 
unnecessarily creating a BiMap here, where we should actually enforce read-only 
semantics, since nobody should modify the FieldInfos instance we loaded from 
the directory. If somebody needs a modifiable copy, they should simply 
create a new one and add all FieldInfo instances to it.
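
A minimal sketch of the read-only idea described above, not the actual patch. 
SketchFieldInfo, SketchFieldInfos, loadedFromDisk and modifiableCopy are 
hypothetical names invented for illustration; the real FieldInfos API may look 
quite different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a single field's metadata (immutable here).
final class SketchFieldInfo {
  final String name;
  final int number;

  SketchFieldInfo(String name, int number) {
    this.name = name;
    this.number = number;
  }
}

// Hypothetical stand-in for FieldInfos with an explicit read-only flag.
final class SketchFieldInfos {
  private final List<SketchFieldInfo> infos = new ArrayList<SketchFieldInfo>();
  private final boolean readOnly;

  SketchFieldInfos(boolean readOnly) {
    this.readOnly = readOnly;
  }

  // Simulates loading from a directory: populate once, then refuse changes.
  static SketchFieldInfos loadedFromDisk(List<SketchFieldInfo> onDisk) {
    SketchFieldInfos loaded = new SketchFieldInfos(true);
    loaded.infos.addAll(onDisk);
    return loaded;
  }

  void add(SketchFieldInfo info) {
    if (readOnly) {
      throw new UnsupportedOperationException("FieldInfos loaded from disk are read-only");
    }
    infos.add(info);
  }

  List<SketchFieldInfo> all() {
    return Collections.unmodifiableList(infos);
  }

  // "Create a new one and add all FieldInfo instances to it."
  SketchFieldInfos modifiableCopy() {
    SketchFieldInfos copy = new SketchFieldInfos(false);
    for (SketchFieldInfo info : infos) {
      copy.add(info);
    }
    return copy;
  }
}
```

With this shape, a caller that tries to add to an instance loaded from disk 
fails fast, while anyone who really needs to modify it goes through an 
explicit copy.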




Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]

2011-03-23 Thread David Nemeskey

Hey Simon and all,

May we get an update on this? I understand that Google has published the list 
of accepted organizations, which -- not surprisingly -- includes the ASF. Is 
there any information on how many slots Apache got, and which issues will be 
selected?

The student application period opens on the 28th, so I'm just wondering if I 
should go ahead and apply or wait for the decision.

Thanks,
David

On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote:
> Hey folks,
> 
> Google Summer of Code 2011 is very close and the Project Applications
> Period has started recently. Now it's time to get some excited students
> on board for this year's GSoC.
> 
> I encourage students to submit an application to the Google Summer of Code
> web-application. Lucene & Solr are amazing projects and GSoC is an
> incredible opportunity to join the community and push the project
> forward.
> 
> If you are a student and you are interested in spending some time on a
> great open source project while getting paid for it, you should submit
> your application from March 28 - April 8, 2011. There are only 3
> weeks until this process starts!
> 
> Quote from the GSoC website: "We hear almost universally from our
> mentoring organizations that the best applications they receive are
> from students who took the time to interact and discuss their ideas
> before submitting an application, so make sure to check out each
> organization's Ideas list to get to know a particular open source
> organization better."
> 
> So if you have any ideas what Lucene & Solr should have, or if you
> find any of the GSoC pre-selected projects [1] interesting, please
> join us on dev@lucene.apache.org [2].  Since you as a student must
> apply for a certain project via the GSoC website [3], it's a good idea
> to work on it ahead of time and include the community and possible
> mentors as soon as possible.
> 
> Open source development here at the Apache Software
> Foundation happens almost exclusively in the public and I encourage you to
> follow this. Don't mail folks privately; please use the mailing list to
> get the best possible visibility and attract interested community
> members and push your idea forward. As always, it's the idea that
> counts not the person!
> 
> That said, please do not underestimate the complexity of even small
> "GSoC - Projects". Don't try to rewrite Lucene or Solr!  A project
> usually gains more from a smaller, well discussed and carefully
> crafted & tested feature than from a half baked monster change that's
> too large to work with.
> 
> Once your proposal has been accepted and you begin work, you should
> give the community the opportunity to iterate with you.  We prefer
> "progress over perfection" so don't hesitate to describe your overall
> vision, but when the rubber meets the road let's take it in small
> steps.  A code patch of 20 KB is likely to be reviewed very quickly, so you
> get fast feedback, while a patch of even 60 KB can take a very long
> time. So try to break up your vision and the community will work with
> you to get things done!
> 
> On behalf of the Lucene & Solr community,
> 
> Go! join the mailing list and apply for GSoC 2011,
> 
> Simon
> 
> [1] https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+lucene-gsoc-11
> [2] http://lucene.apache.org/java/docs/mailinglists.html
> [3] http://www.google-melange.com
> 


[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Chris Male (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010027#comment-13010027
 ] 

Chris Male commented on LUCENE-2310:


Thanks for taking a look at this Simon.

bq. Why do you reformat all the stuff in Field, is that necessary here at all? 
I mean it's needed eventually but for the deprecation of things it only bloats 
the patch really, doesn't it?

Because for me this issue is about reducing the complexity of these classes, and 
Field is a mess.  Making it more readable reduces the complexity.  If need be 
I will do this in two patches, but I don't feel this issue is resolved until the 
code in Field is readable.

bq. When you deprecate AbstractField and Fieldable, Field should ideally be a 
standalone class. So I see that this still needs to subclass Fieldable / 
AbstractField but could it stand alone now so that we can simply remove the 
extends / implements on Field once we drop things in 4.0? I think it looks good 
from looking at the patch though

I don't really understand what you're suggesting here.  In 3x where the 
deprecations will be occurring Field has to continue to extend AbstractField.  
Yes in 4.0 we can drop that extension but addressing the deprecations is not in 
the scope of 3x.

bq. I don't like the name getAllFields on Document since it implies that we 
have a getPartialFields or something. I see that you can not use getFields 
since it only differs in return type which doesn't belong to the signature 
though. Maybe we should implement Iterable here and offer an additional 
method getFieldsAsList or maybe getFields(List fields)

Yeah good call.  I think implementing Iterable is best, but it will also 
require adding a count() method to Document since often people retrieve the 
List to get the number of fields.
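
A minimal sketch of what the Iterable-plus-count() idea could look like. 
DocField and IterableDocument are hypothetical stand-ins, not the real Field 
and Document classes, and the method names are only assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Field.
final class DocField {
  final String name;
  final String value;

  DocField(String name, String value) {
    this.name = name;
    this.value = value;
  }
}

// Hypothetical stand-in for Document: iterable, with a count() so callers
// no longer need to fetch the List just to ask for its size.
final class IterableDocument implements Iterable<DocField> {
  private final List<DocField> fields = new ArrayList<DocField>();

  void add(DocField field) {
    fields.add(field);
  }

  int count() {
    return fields.size();
  }

  // Iterate over a read-only view so callers cannot remove fields
  // through the iterator.
  public Iterator<DocField> iterator() {
    return Collections.unmodifiableList(fields).iterator();
  }
}
```

Callers could then write for (DocField f : doc) { ... } and doc.count() 
instead of going through the List returned by getFields().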

bq. once we have this in what are the next steps towards FieldType? Will we 
have only one class Field that is backed by a FieldType but still offers the 
methods it has now? Or do we have two totally new classes FieldType and 
FieldValue

Once FieldType is in, all the various metadata properties (isIndexed, isStored 
etc) will be moved to FieldType, leaving Field as what you suggest as 
FieldValue.  Field will contain its type, boost, name, value.  If we have 
Analyzers on FieldTypes, then we will be able to remove the TokenStream from 
Field.
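
A rough sketch of that split, under the assumption that the metadata flags 
simply move onto the type object. SketchFieldType and TypedField are 
hypothetical names, and only two of the flags are shown.

```java
// Hypothetical stand-in for FieldType: holds the metadata properties
// and can be shared by many fields.
final class SketchFieldType {
  final boolean indexed;
  final boolean stored;

  SketchFieldType(boolean indexed, boolean stored) {
    this.indexed = indexed;
    this.stored = stored;
  }
}

// Hypothetical stand-in for Field after the split: just type, boost,
// name and value.
final class TypedField {
  final SketchFieldType type;
  final String name;
  final Object value;
  float boost = 1.0f;

  TypedField(SketchFieldType type, String name, Object value) {
    this.type = type;
    this.name = name;
    this.value = value;
  }

  // Convenience accessors that delegate to the type.
  boolean isIndexed() {
    return type.indexed;
  }

  boolean isStored() {
    return type.stored;
  }
}
```

One consequence of this shape is that a single type instance can describe 
many fields, so the flags are declared once instead of on every Field.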

bq. I wonder if this patch raises tons of deprecation warnings all over Lucene 
where Fieldable was used? In IW we use it all over the place though. We must 
fix that in this issue too otherwise Uwe will go mad I guess

Yeah but not in 3x unfortunately.  As it stands people can retrieve the List of 
Fieldables via getFields() and add whatever implementation of Fieldable they 
like.  Consequently we need to continue to support Fieldable in IW for example.  
Once this code has been committed I will create a new patch for trunk which 
moves all of Solr and Lucene over to Field.  I could do this in many places 
already of course, but core classes like IW would have to remain as they 
are.

I will wait for your thoughts on the reformatting and then make a new patch.




[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2011-03-23 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010022#comment-13010022
 ] 

Simon Willnauer commented on LUCENE-2310:
-

Hey Chris,

Good that you reactivated this issue! I was looking into similar stuff while 
working on docvalues, since it really needs to add stuff to Field / Fieldable. 
With a cleanup and eventually FieldType this would be way less painful, I 
guess. I have a couple of questions and comments on the current patch. 
Btw. I like the fact that the previous patch was uploaded March 21 2010 and the 
latest took 1 year to come up, on March 23 2011 :)

* Why do you reformat all the stuff in Field, is that necessary here at all? I 
mean it's needed eventually but for the deprecation of things it only bloats the 
patch really, doesn't it?

* When you deprecate AbstractField and Fieldable, Field should ideally be a 
standalone class. So I see that this still needs to subclass Fieldable / 
AbstractField but could it stand alone now so that we can simply remove the 
extends / implements on Field once we drop things in 4.0? I think it looks good 
from looking at the patch though

* I don't like the name getAllFields on Document since it implies that we have 
a getPartialFields or something. I see that you can not use getFields since it 
only differs in return type which doesn't belong to the signature though. Maybe 
we should implement Iterable here and offer an additional method 
getFieldsAsList or maybe getFields(List fields)

* once we have this in what are the next steps towards FieldType? Will we have 
only one class Field that is backed by a FieldType but still offers the methods 
it has now? Or do we have two totally new classes FieldType and FieldValue, 
something like this:
{code} 
class FieldValue {
  FieldType type;
  float boost;
  String name;
  Object value;
}
{code}

* I wonder if this patch raises tons of deprecation warnings all over Lucene 
where Fieldable was used? In IW we use it all over the place though. We must 
fix that in this issue too otherwise Uwe will go mad I guess :)
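
To illustrate the getFields point above: Java does not allow two methods that 
differ only in return type, so a second getFields() is out, but an overload 
with a different parameter list is fine. This is only a sketch; 
LegacyFieldable, NewField and OverloadDocument are hypothetical stand-ins for 
Fieldable, Field and Document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the deprecated Fieldable interface.
interface LegacyFieldable {
  String name();
}

// Hypothetical stand-in for Field.
final class NewField implements LegacyFieldable {
  private final String name;

  NewField(String name) {
    this.name = name;
  }

  public String name() {
    return name;
  }
}

// Hypothetical stand-in for Document.
final class OverloadDocument {
  private final List<NewField> fields = new ArrayList<NewField>();

  void add(NewField field) {
    fields.add(field);
  }

  /** Old-style accessor, kept so existing callers still compile. */
  @Deprecated
  List<LegacyFieldable> getFields() {
    return new ArrayList<LegacyFieldable>(fields);
  }

  // List<NewField> getFields() { ... }
  // would not compile: same name and parameter list as the method above,
  // and the return type is not part of the signature.

  /** Legal overload: the parameter list differs. Fills and returns the given list. */
  List<NewField> getFields(List<NewField> into) {
    into.addAll(fields);
    return into;
  }
}
```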

thanks for bringing this up again!

