[jira] Commented: (LUCENE-2235) implement PerFieldAnalyzerWrapper.getOffsetGap

2010-12-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967121#action_12967121
 ] 

Uwe Schindler commented on LUCENE-2235:
---

To come back to the original issue:

bq. Should this be checking that a field is indeed analyzed before calling 
getOffsetGap?

In my opinion this should be done (and so this issue would disappear). Can you 
open another issue requesting this check and link it to this one?

One problem coming from not checking for analyzed is this:
You add a field indexed, and it gets analyzed by PFAW. After that you add the 
same field name stored-only (which is perfectly legal and often used, e.g. when 
the stored value is binary or in some other format and does not correspond to 
the indexed text), and the positionIncrement is increased. After that you again 
add another instance of the same field as indexed-only, which also increases 
posIncr. So you have two times the gap between the two indexed sub-fields. This 
is definitely wrong.
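
To make the sequence concrete, a sketch in 3.0 API terms (the field name and 
values are made up):
{code}
Document doc = new Document();
// 1) indexed instance - analyzed by PFAW
doc.add(new Field("f", "first value", Field.Store.NO, Field.Index.ANALYZED));
// 2) stored-only instance - never analyzed, but currently still triggers the gap
doc.add(new Field("f", new byte[] { 0x01, 0x02 }, Field.Store.YES));
// 3) another indexed instance
doc.add(new Field("f", "second value", Field.Store.NO, Field.Index.ANALYZED));
// without an isAnalyzed check, the gap is applied at steps 2 and 3, leaving the
// two indexed values two gaps apart instead of one
{code}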

 implement PerFieldAnalyzerWrapper.getOffsetGap
 --

 Key: LUCENE-2235
 URL: https://issues.apache.org/jira/browse/LUCENE-2235
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 3.0
 Environment: Any
Reporter: Javier Godoy
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9.4, 3.0.3, 3.1, 4.0

 Attachments: LUCENE-2235.patch, PerFieldAnalyzerWrapper.patch


 PerFieldAnalyzerWrapper does not delegate calls to getOffsetGap(Fieldable); 
 instead it returns the default values from the base Analyzer implementation. 
 (Similar to LUCENE-659, PerFieldAnalyzerWrapper fails to implement 
 getPositionIncrementGap.)
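
 A hedged sketch of the delegation this asks for, mirroring how 
 getPositionIncrementGap is already delegated (the analyzerMap/defaultAnalyzer 
 field names are assumptions about PFAW's internals, not the committed patch):
 {code}
 public int getOffsetGap(Fieldable field) {
   Analyzer analyzer = analyzerMap.get(field.name()); // per-field lookup
   if (analyzer == null)
     analyzer = defaultAnalyzer;                      // fall back to the default
   return analyzer.getOffsetGap(field);
 }
 {code}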

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1395) Integrate Katta

2010-12-06 Thread JohnWu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967125#action_12967125
 ] 

JohnWu commented on SOLR-1395:
--

Tomliu:
so in the proxy, not in the sub-proxy, does katta startNode need to add the 
class org.apache.solr.katta.DeployableSolrKattaServer?

in katta's lib there are too many version differences: solr, lucene, zookeeper, 
and the worst is lucene!

can you give me a mailbox? I can contact you directly (mine is 
pangla...@gmail.com).

   now, in katta's workqueue, at NodeInteraction line 135:
   T result = (T) _method.invoke(proxy, _args);

   proxy is a Hadoop IPC; it cannot find pc-slavo2:2.
   Am I missing some config in Hadoop, or do I need to patch Hadoop with your 
https://issues.apache.org/jira/browse/HADOOP-7017?

   please reply, thanks!

JohnWu

 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Next

 Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, 
 katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, 
 katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, 
 solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, 
 solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, 
 solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, 
 solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, 
 SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, 
 zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shards/SolrCores are distributed and managed
 * Failover is Zookeeper based
 * Indexes may be built using Hadoop

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2790) IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile

2010-12-06 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967127#action_12967127
 ] 

Shai Erera commented on LUCENE-2790:


bq. How Lucene manages the index files is under-the-hood so we are free to 
change it.

That's correct. However, sadly, the backwards tests do not agree with you :). 
Because the runtime behavior has changed, the tests fail. If you try to call 
LMP.setNoCFSRatio, you get a NoSuchMethodError because the tests are compiled 
against 3.0's source, where it indeed does not exist.

I'm trying to resolve it by fetching the method using reflection, but this 
shows another problem with how we maintain the backwards tests.

 IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
 ---

 Key: LUCENE-2790
 URL: https://issues.apache.org/jira/browse/LUCENE-2790
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, 
 LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch


 Spin off from here: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/112311.
 I will attach a patch shortly that addresses the issue on trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: API Semantics and Backwards

2010-12-06 Thread Shai Erera
I've hit another backwards tests problem.

Over in LUCENE-2790 we've changed LogMergePolicy.useCompoundFile's behavior
to factor in the newly added noCFSRatio. After some discussion we decided
that even though it breaks back-compat runtime behavior, it's ok in this
case, because how Lucene manages the internal representation of segments
(compound or not) is up to it. And you can override it by disabling the
noCFSRatio setting.

And indeed some tests failed (backwards as well as core) and the way
to fix them was to force CFS creation. However, for the backwards tests this
is not directly doable, because they are compiled against 3.0's source, where
setNoCFSRatio does not exist on LogMergePolicy, even though we agree that
this change is allowed back-compat wise.

I ended up fixing it by querying for the method using reflection and the
tests now pass.
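
For reference, the workaround looks roughly like this (a minimal sketch, not
the exact patch; mergePolicy stands for whatever LogMergePolicy instance the
test configures, and 1.0 is assumed to mean "always create a compound file"):

    try {
      // compiled against 3.0, so the 3.x-only setter must be invoked reflectively
      java.lang.reflect.Method m =
          mergePolicy.getClass().getMethod("setNoCFSRatio", double.class);
      m.invoke(mergePolicy, 1.0); // force CFS creation, the old behavior
    } catch (Exception e) {
      // NoSuchMethodException etc.: the jar predates the setter
    }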

Now, regardless of this change (whether it's ok or not), I think this shows
another problem with how we maintain the backwards tests. Internal changes like
this, especially to @experimental / @internal classes, are allowed, but we
need to resort to reflection hacks to keep the tests passing.

So either we delete the offending tests, because like Uwe says they
duplicate the test effort, or we maintain a source for backwards.

I personally am in favor of removing all non-backwards tests and keeping only
those that actually test backwards behavior. But I know opinions are
divided here.

Shai

On Wed, Dec 1, 2010 at 4:48 PM, Shai Erera ser...@gmail.com wrote:

 While I'm not against going back towards a backwards checkout that we can
 modify, I wonder whether all the tests there should be there and how much we
 actually duplicate.

 Lucene 3x should include all of 3.0's tests + new ones that test new
 functionality, or assert bug fixes etc. There shouldn't be a test in 3.0
 that does not exist in 3x, unless the missing test/feature was an agreed-upon
 backwards break.

 So I think it would be really nice if backwards tested exactly what it
 should. For example, testing index format backcompat is done twice today, in
 test-core and test-backwards, while it should only be run by backwards.
 There are a bunch of test classes I've created once that impl/extend
 'search' related classes, for back-compat compilation only. They should also
 be run in backwards only.

 The downside of this is that maintenance is going to be difficult - it's
 much easier to copy tests over to backwards than to decide which ones should
 go there and which shouldn't. Also, adding new API requires a matching
 backwards test, etc. Not undoable, but difficult - it requires discipline.

 Shai


 On Tue, Nov 30, 2010 at 2:02 PM, Robert Muir rcm...@gmail.com wrote:

 On Tue, Nov 30, 2010 at 4:47 AM, Shai Erera ser...@gmail.com wrote:
 
  Like you said, the rest of the tests just increase the test running
 time.
 

 I'm not completely sure about this: do we always switch over our tests
 to do the equivalent checks both against the new API and the old API
 when we make API changes? There could be bugs in our 'backwards
 handling' that are actually logic bugs that the new tests don't detect.

 So I'm a little concerned about only running pure simplistic API
 tests in backwards.

 On the other hand, I'm really worried about what Shai brings up here:
 we are doing some refactoring of the test system and there is more
 shared code at the moment: similar to MockRAMDirectory.
 Because we worry about preventing things like index corruption, it's my
 opinion we need things like MockRAMDirectory, and they should be able
 to break all the rules/etc (use pkg-private APIs) if we can prevent
 bugs.
 Just look at our trunk or 3.x tests and imagine them as backwards
 tests... these sorts of utilities, like RandomIndexWriter, will be more
 fragile to internal/experimental/pkg-private changes, but as mentioned
 above I think these are good to have in backwards tests.

 So, I think at the moment I'm leaning towards the idea of going back
 towards a checkout that we can modify, in combination with us all
 soliciting more reviews / longer time for any backports to stable
 branches that require backwards-test modifications?  I understand
 Uwe's point too - it's dangerous to modify the code and seems to defeat
 the purpose of backwards - but I think this is going to be a more
 serious problem after releasing 3.1!

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





SearchBlox is now FREE. No limitations!

2010-12-06 Thread Lucene
SearchBlox is pleased to announce the availability of SearchBlox Search 
Software as a completely FREE product. The product is now available with no 
limitations on the number of documents indexed and no restrictions on product 
functionality. SearchBlox will support the free product with a number of new 
paid support packages and free forum-based support. 

SearchBlox is an Enterprise Search Server built on top of Apache Lucene and 
includes:

- Integrated crawlers for HTTP/HTTPS, filesystems and feeds
- Web based Admin Console to configure and manage up to 250 indexes
- REST API 
- Multilingual support to index content in 37 languages
- Packaged for deployment to Linux/Unix, Windows, Mac OS X and Amazon Web 
Services (AWS)

Since 2003, SearchBlox has been continually enhanced, leveraging new features in 
Apache Lucene, and has been deployed by more than 300 customers in 30 countries.

The product can be downloaded from www.searchblox.com
  
Best regards,

The SearchBlox Team
www.searchblox.com
http://twitter.com/search_software
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-trunk - Build # 1384 - Failure

2010-12-06 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1384/

All tests passed

Build Log (for compile errors):
[...truncated 18318 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2801) getOffsetGap should not be called for non-analyzed fields

2010-12-06 Thread Nick Pellow (JIRA)
getOffsetGap should not be called for non-analyzed fields
--

 Key: LUCENE-2801
 URL: https://issues.apache.org/jira/browse/LUCENE-2801
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 3.0.3
Reporter: Nick Pellow


from: LUCENE-2235

Since Lucene 3.0.3, when a PerFieldAnalyzerWrapper is constructed with a null 
defaultAnalyzer, it will throw an NPE when DocInverterPerField calls:
{code}
 fieldState.offset += docState.analyzer.getOffsetGap(field);
{code}
This block should first check that the field is analyzed, or the javadoc on 
PerFieldAnalyzerWrapper could mention that a null defaultAnalyzer is disallowed.
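
A hedged sketch of the suggested guard, using Fieldable.isTokenized() as the 
"is analyzed" check (an illustration of the idea, not a committed fix):
{code}
if (field.isTokenized()) {
  // only analyzed fields should contribute an offset gap
  fieldState.offset += docState.analyzer.getOffsetGap(field);
}
{code}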

Also, the main reason for checking isAnalyzed, from Uwe Schindler in 
LUCENE-2235:
{quote}
One problem coming from not checking for analyzed is this:
You add a field indexed, and it gets analyzed by PFAW. After that you add the 
same field name stored-only (which is perfectly legal and often used, e.g. when 
the stored value is binary or in some other format and does not correspond to 
the indexed text), and the positionIncrement is increased. After that you again 
add another instance of the same field as indexed-only, which also increases 
posIncr. So you have two times the gap between the two indexed sub-fields. This 
is definitely wrong.

{quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2235) implement PerFieldAnalyzerWrapper.getOffsetGap

2010-12-06 Thread Nick Pellow (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967148#action_12967148
 ] 

Nick Pellow commented on LUCENE-2235:
-

Thanks for the clarification, Uwe. I wasn't sure if null Analyzers were meant 
to be accepted or not. I was upgrading some existing code from 3.0.2 to 3.0.3 
and stumbled across that, so it's good to know.

I've created LUCENE-2801 to track the real reason the check should be done too!

 implement PerFieldAnalyzerWrapper.getOffsetGap
 --

 Key: LUCENE-2235
 URL: https://issues.apache.org/jira/browse/LUCENE-2235
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 3.0
 Environment: Any
Reporter: Javier Godoy
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9.4, 3.0.3, 3.1, 4.0

 Attachments: LUCENE-2235.patch, PerFieldAnalyzerWrapper.patch


 PerFieldAnalyzerWrapper does not delegate calls to getOffsetGap(Fieldable); 
 instead it returns the default values from the base Analyzer implementation. 
 (Similar to LUCENE-659, PerFieldAnalyzerWrapper fails to implement 
 getPositionIncrementGap.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2235) implement PerFieldAnalyzerWrapper.getOffsetGap

2010-12-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967151#action_12967151
 ] 

Uwe Schindler commented on LUCENE-2235:
---

Thanks, Nick!

 implement PerFieldAnalyzerWrapper.getOffsetGap
 --

 Key: LUCENE-2235
 URL: https://issues.apache.org/jira/browse/LUCENE-2235
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 3.0
 Environment: Any
Reporter: Javier Godoy
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9.4, 3.0.3, 3.1, 4.0

 Attachments: LUCENE-2235.patch, PerFieldAnalyzerWrapper.patch


 PerFieldAnalyzerWrapper does not delegate calls to getOffsetGap(Fieldable); 
 instead it returns the default values from the base Analyzer implementation. 
 (Similar to LUCENE-659, PerFieldAnalyzerWrapper fails to implement 
 getPositionIncrementGap.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2790) IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile

2010-12-06 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2790:
---

Attachment: LUCENE-2790-3x.patch

Backport to 3x. Note the reflection hack I had to use to make the backwards 
tests run. I don't commit yet - waiting for some response about the backwards 
tests. If you're ok with it, I'll commit.

 IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
 ---

 Key: LUCENE-2790
 URL: https://issues.apache.org/jira/browse/LUCENE-2790
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2790-3x.patch, LUCENE-2790.patch, 
 LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, 
 LUCENE-2790.patch, LUCENE-2790.patch


 Spin off from here: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/112311.
 I will attach a patch shortly that addresses the issue on trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2790) IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile

2010-12-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967183#action_12967183
 ] 

Uwe Schindler commented on LUCENE-2790:
---

I would supply disable the tests. Reflection should only be used when mock 
classes are used that affect thousands of tests. There are already lots of 
tests disabled.

 IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
 ---

 Key: LUCENE-2790
 URL: https://issues.apache.org/jira/browse/LUCENE-2790
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2790-3x.patch, LUCENE-2790.patch, 
 LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, 
 LUCENE-2790.patch, LUCENE-2790.patch


 Spin off from here: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/112311.
 I will attach a patch shortly that addresses the issue on trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2267) Using query function in bf parameter in the DisMaxQParser forces the use of parameter dereferencing

2010-12-06 Thread Uri Boness (JIRA)
Using query function in bf parameter in the DisMaxQParser forces the use of 
parameter dereferencing
---

 Key: SOLR-2267
 URL: https://issues.apache.org/jira/browse/SOLR-2267
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Uri Boness
 Fix For: 3.1


The DisMaxQParser parses the bf parameter using the 
{{SolrPluginUtils.parseFieldBoosts(...)}} function. This function tokenizes the 
string on whitespace and then builds a map from fields to their boost values. 
Unfortunately, the *{!...}* form of a query contains whitespace, and therefore 
the parsing of the boost function fails. 

This should be considered a bug, as it effectively forces the use of parameter 
dereferencing, which in many cases is not ideal.
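
A hedged illustration (the field name and example query are made up):
{code}
# fails: parseFieldBoosts splits on the whitespace inside the local-params syntax
bf=query({!dismax qf=title v='ipod nano'})

# workaround: parameter dereferencing keeps the bf value itself whitespace-free
bf=query($qq)&qq={!dismax qf=title}ipod nano
{code}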

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1395) Integrate Katta

2010-12-06 Thread Eric Pugh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967184#action_12967184
 ] 

Eric Pugh commented on SOLR-1395:
-

Tom, John,

Just wanted to comment that having your conversation on this ticket in public 
has been great!  I am a couple of steps behind you, having started up Katta and 
started Solr with the patch, but not having success with searching.

My current error is that Solr can't find the katta.zk.properties file - where 
did you put it so that it would be found on the classpath?

Eric


 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Next

 Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, 
 katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, 
 katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, 
 solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, 
 solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, 
 solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, 
 solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, 
 SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, 
 zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shards/SolrCores are distributed and managed
 * Failover is Zookeeper based
 * Indexes may be built using Hadoop

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()

2010-12-06 Thread Simon Willnauer (JIRA)
DirectoryReader ignores NRT SegmentInfos in #isOptimized()
--

 Key: LUCENE-2802
 URL: https://issues.apache.org/jira/browse/LUCENE-2802
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer


DirectoryReader only takes SegmentInfos shared with the IW into account in 
DirectoryReader#isOptimized(). This can return true even if the actual realtime 
reader sees more than one segment. 

{code}
public boolean isOptimized() {
  ensureOpen();
  // if segmentInfos changes in IW this can return a false positive
  return segmentInfos.size() == 1 && !hasDeletions();
}
{code}

DirectoryReader should check if this reader has a non-null segmentInfosStart and 
use that instead.
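
A hedged sketch of the suggested direction (assuming segmentInfosStart is the 
reader-private SegmentInfos captured when the NRT reader was opened):
{code}
public boolean isOptimized() {
  ensureOpen();
  // in NRT mode, consult the reader's own infos rather than IW's live, shared copy
  SegmentInfos infos = segmentInfosStart != null ? segmentInfosStart : segmentInfos;
  return infos.size() == 1 && !hasDeletions();
}
{code}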

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2790) IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile

2010-12-06 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967208#action_12967208
 ] 

Shai Erera commented on LUCENE-2790:


I don't mind disabling the tests, but I think we should discuss the bigger 
issue (on that thread on the mailing list). If we decide to make it a 'policy' 
to disable backwards tests that break due to legal changes to the API and 
behavior, let's at least reach a consensus.

 IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
 ---

 Key: LUCENE-2790
 URL: https://issues.apache.org/jira/browse/LUCENE-2790
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2790-3x.patch, LUCENE-2790.patch, 
 LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, 
 LUCENE-2790.patch, LUCENE-2790.patch


 Spin off from here: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/112311.
 I will attach a patch shortly that addresses the issue on trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Changes Mess

2010-12-06 Thread Mattmann, Chris A (388J)

 CHANGES file:
 LUCENE-2658: Exceptions while processing term vectors enabled for
 multiple fields could lead to invalid ArrayIndexOutOfBoundsExceptions.
 
 JIRA description:
 LUCENE-2658: TestIndexWriterExceptions random failure: AIOOBE in
 ByteBlockPool.allocSlice
 
 So you see the story: I hit a random test failure and just opened an
 issue describing that the test randomly failed.
 Mike then went and fixed it and wrote up a CHANGES.txt entry that's
 significantly better for the users.
 
 In order for us to use JIRA here, we would have to do a lot of
 JIRA-editing and re-organizing, I think, and probably create a lot of
 unnecessary issues.

What's the difference between Mike going and writing up a more informative 
CHANGES.txt entry and, say, updating JIRA with the information from that entry 
to have a more descriptive title?

Also, besides issue titles, there is also a way to capture anyone who's been 
involved in the JIRA cycle (comment, issue created, etc.) as part of the 
contribution report, which is probably *even more* inclusive than what you guys 
are currently doing.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Changes Mess

2010-12-06 Thread Mattmann, Chris A (388J)

 Would you mind naming these Apache projects?  I'd like to take a look.

Tika, Nutch, OODT.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

2010-12-06 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967210#action_12967210
 ] 

Shai Erera commented on LUCENE-2471:


bq. I think the problem actually wasn't interrupting but some sort of race 
condition? 

Could be, I don't remember the exact details.

I totally agree with you, though it's a chicken-and-egg situation - we 
cannot develop anything safe until we have good threaded unit tests, and we can 
never know we have those until we have an implementation that might break. So 
I personally don't mind if we pursue an implementation of FileChannel copying, 
in NIOFSDirectory only, and then investigate the current threaded 
indexing/search tests and add some if we think something's missing. But 
currently we're in a sort of limbo :).

Anyway, I don't think it's related to this issue and can be handled in a 
separate one. If you agree, and assuming nothing more should be done here, we 
can close this one.
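
For concreteness, a FileChannel-based bulk copy would look roughly like this 
(a minimal sketch, independent of any Directory API decisions here):
{code}
import java.io.*;
import java.nio.channels.FileChannel;

static void copy(File src, File dst) throws IOException {
  FileChannel in = new FileInputStream(src).getChannel();
  FileChannel out = new FileOutputStream(dst).getChannel();
  try {
    long pos = 0;
    final long count = in.size();
    while (pos < count) {
      // transferTo may copy fewer bytes than requested, so loop until done
      pos += in.transferTo(pos, count - pos, out);
    }
  } finally {
    in.close();
    out.close();
  }
}
{code}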

 Supporting bulk copies in Directory
 ---

 Key: LUCENE-2471
 URL: https://issues.apache.org/jira/browse/LUCENE-2471
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Earwin Burrfoot
 Fix For: 3.1, 4.0


 A method can be added to IndexOutput that accepts IndexInput, and writes 
 bytes using it as a source.
 This should be used for bulk-merge cases (offhand - norms, docstores?). Some 
 Directories can then override the default impl and skip intermediate buffers 
 (NIO, MMap, RAM?).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967211#action_12967211
 ] 

Jan Høydahl commented on SOLR-1979:
---

bq. @Grant: I dropped the outputField setting and a number of other settings

There should be a way to output the language for the whole document to some 
field as some applications need to filter on language.

I like making most things configurable, but with good defaults that fit most 
needs. The default could be to detect a document-wide language from all input 
fields and output it to a language_s field, unless you specify params 
docLangInputFields=f1,f2.. and docLangOutputField=nn. Likewise, make it easy to 
disable field renaming.
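
A sketch of how those proposed defaults might be overridden (the param names 
are the proposal above, not an existing API):
{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="docLangInputFields">title,body</str>
  <str name="docLangOutputField">language_s</str>
</processor>
{code}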

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml}
 <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
   <str name="inputFields">name,subject</str>
   <str name="outputField">language_s</str>
   <str name="idField">id</str>
   <str name="fallback">en</str>
 </processor>
 {code}
 It will then read the text from the inputFields (name and subject), perform 
 language identification, and output the ISO code for the detected language in 
 the outputField. If no language was detected, the fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Changes Mess

2010-12-06 Thread Robert Muir
On Mon, Dec 6, 2010 at 9:56 AM, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:

 What's the difference between Mike going and writing up a more informative 
 CHANGES.txt entry than say updating JIRA with the information from that entry 
 to have a more descriptive title?


Well, you are right, but it's another modification to JIRA (an edit).

And then there are more examples like this:
CHANGES:
* LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent
accessing an unmapped buffer if the input is closed
JIRA:
* LUCENE-2650: improve windows defaults in FSDirectory

The JIRA is *CORRECT*. While working on the issue I discovered we
could trivially add some extra safety. So I backported the extra
safety to all branches.
In this case I would have to split my patch in half and create another
JIRA issue for this very trivial change?

Just saying, to do what you are saying (by the way, I'm not opposed to
the idea!), we would have to change the way we use JIRA and increase
noise to the mailing list.

There are quite a few examples like this: e.g. the JIRA release
notes say this: [LUCENE-2055] - Fix buggy stemmers and Remove
duplicate analysis functionality. But I certainly didn't do this in a
bugfix release!

what actually happened is in contrib/CHANGES.txt:
* LUCENE-2055: Add documentation noting that the Dutch and French
stemmers in contrib/analyzers do not implement the Snowball algorithm
correctly, and recommend to use the equivalents in contrib/snowball if
possible.

So I don't know how JIRA would handle this case. Because we merged
contrib/snowball with contrib/analyzers in 3.1, I would have to create
a separate JIRA issue just so that 3.1 has the correct
description/path name in its release notes? And in 4.0 I'd have to
create a third duplicate JIRA issue, because we merged all the
analyzers, so there it needs to refer to modules/analysis?

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()

2010-12-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967213#action_12967213
 ] 

Michael McCandless commented on LUCENE-2802:


Nice catch Simon!  This is also a thread safety issue since IR should not touch 
the writer's segmentInfos outside of sync(IW).

 DirectoryReader ignores NRT SegmentInfos in #isOptimized()
 --

 Key: LUCENE-2802
 URL: https://issues.apache.org/jira/browse/LUCENE-2802
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
 Attachments: LUCENE-2802.patch


 DirectoryReader only takes SegmentInfos shared with the IW into account in 
 DirectoryReader#isOptimized(). This can return true even if the actual 
 realtime reader sees more than one segment. 
 {code}
 public boolean isOptimized() {
   ensureOpen();
   // if segmentInfos changes in IW this can return a false positive
   return segmentInfos.size() == 1 && !hasDeletions();
 }
 {code}
 DirectoryReader should check if this reader has a non-null segmentInfosStart 
 and use that instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-06 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967214#action_12967214
 ] 

Grant Ingersoll commented on SOLR-1979:
---

bq. There should be a way to output the language for the whole document to some 
field as some applications need to filter on language.

There is.  It's the langField.

bq. Can't we validate the output mapping (and log it!) at initialization time?

To some extent, but users can also pass it in.  

bq. We should not be using 639-1 codes in any APIs!!!

I'll update the patch.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml}
 <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
   <str name="inputFields">name,subject</str>
   <str name="outputField">language_s</str>
   <str name="idField">id</str>
   <str name="fallback">en</str>
 </processor>
 {code}
 It will then read the text from the inputFields (name and subject), perform 
 language identification, and output the ISO code for the detected language in 
 the outputField. If no language was detected, the fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-06 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1979:
--

Attachment: SOLR-1979.patch

Removes mentions of ISO 639.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml}
 <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
   <str name="inputFields">name,subject</str>
   <str name="outputField">language_s</str>
   <str name="idField">id</str>
   <str name="fallback">en</str>
 </processor>
 {code}
 It will then read the text from the inputFields (name and subject), perform 
 language identification, and output the ISO code for the detected language in 
 the outputField. If no language was detected, the fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Changes Mess

2010-12-06 Thread Grant Ingersoll

On Dec 6, 2010, at 9:58 AM, Mattmann, Chris A (388J) wrote:

 
 Would you mind naming these Apache projects?  I'd like to take a look.
 
 Tika, Nutch, OODT.

Add in Mahout.  I believe Hadoop does too.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2803) FieldCache should not pay attention to deleted docs when creating entries

2010-12-06 Thread Yonik Seeley (JIRA)
FieldCache should not pay attention to deleted docs when creating entries
-

 Key: LUCENE-2803
 URL: https://issues.apache.org/jira/browse/LUCENE-2803
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley


The FieldCache uses a key that ignores deleted docs, so it's actually a bug to 
use deleted docs when creating an entry.  It can lead to incorrect values when 
the same entry is used with a different reader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2763) Swap URL+Email recognizing StandardTokenizer and UAX29Tokenizer

2010-12-06 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2763:


Attachment: LUCENE-2763.patch

Updated patch to fix {{solr/CHANGES.txt}}, {{lucene/CHANGES.txt}}, and 
{{analysis/standard/READ_BEFORE_REGENERATING.txt}}.

I will commit later today if there are no objections.

 Swap URL+Email recognizing StandardTokenizer and UAX29Tokenizer
 ---

 Key: LUCENE-2763
 URL: https://issues.apache.org/jira/browse/LUCENE-2763
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2763.patch, LUCENE-2763.patch


 Currently, in addition to implementing the UAX#29 word boundary rules, 
 StandardTokenizer recognizes email addresses and URLs, but doesn't provide a 
 way to turn this behavior off and/or provide overlapping tokens with the 
 components (username from email address, hostname from URL, etc.).
 UAX29Tokenizer should become StandardTokenizer, and the current StandardTokenizer 
 should be renamed to something like UAX29TokenizerPlusPlus (or something like 
 that).
 For rationale, see [the discussion at the reopened 
 LUCENE-2167|https://issues.apache.org/jira/browse/LUCENE-2167?focusedCommentId=12929325&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12929325].

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Changes Mess

2010-12-06 Thread Grant Ingersoll

On Dec 5, 2010, at 12:18 PM, Robert Muir wrote:

 On Sun, Dec 5, 2010 at 12:08 PM, Mattmann, Chris A (388J)
 chris.a.mattm...@jpl.nasa.gov wrote:
 Hi Mark,
 
 RE: the credit system. JIRA provides a contribution report here, like this 
 one that I generated for Lucene 3.1:
 
 
 My concern with this is that it leaves out important email contributors.

I think we probably miss these as is too.


Note, however, in my proposal, one can still call out specific things.  We 
could for instance have a Contributors section and just add names to it.  I 
just think we put too much minutiae in CHANGES and it is a real burden to deal 
with it across branches b/c there are always massive conflicts and it requires 
you to look up every last change to recall which version it is in.   IMO, JIRA 
should be the system of record for all bug discussions.  Discussions that 
happen on email can easily be pointed to using any one of our many mail archive 
systems.

Our new Changes could be structured like below.  The important thing about this 
approach is that it can all more or less be written at release time other than 
the contributor list and perhaps the back compat section.

/snip
= Version X.Y  =

Brief Intro

== Dependencies ==
Junit 4.4

== New Features ==

* Magic search was implemented

== Backward Compatibility Breaks ==

* Blah, blah, blah

== Significant Changes ==

* We've replaced the inverted index with a giant array

== Contributors ==
(alphabetical order)
Joe Schmoe (optionally cite an issue number)
Jane Doe

Optionally paste in the list from JIRA

== Full Changes List ==

* LINK TO JIRA
/snip


-Grant
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()

2010-12-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967227#action_12967227
 ] 

Simon Willnauer commented on LUCENE-2802:
-

bq. Nice catch Simon! This is also a thread safety issue since IR should not 
touch the writer's segmentInfos outside of sync(IW).

It seems like there is more to all of that in DR - we should really only use 
the uncloned SegmentInfos if we are not in NRT mode; #getVersion uses it too, 
which is wrong.
I actually rely on isOptimized in several tests and ran into an NPE due to 
this, so we should really fix DR to use a private SegmentInfos, or restrict 
the uncloned one to the isCurrent comparison.



 DirectoryReader ignores NRT SegmentInfos in #isOptimized()
 --

 Key: LUCENE-2802
 URL: https://issues.apache.org/jira/browse/LUCENE-2802
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
 Attachments: LUCENE-2802.patch


 DirectoryReader only takes SegmentInfos shared with the IW into account in 
 DirectoryReader#isOptimized(). This can return true even if the actual 
 realtime reader sees more than one segment. 
 {code}
 public boolean isOptimized() {
   ensureOpen();
   // if segmentInfos changes in IW this can return a false positive
   return segmentInfos.size() == 1 && !hasDeletions();
 }
 {code}
 DirectoryReader should check if this reader has a non-null segmentInfosStart 
 and use that instead.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2804) check all tests that use FSDirectory.open

2010-12-06 Thread Robert Muir (JIRA)
check all tests that use FSDirectory.open
-

 Key: LUCENE-2804
 URL: https://issues.apache.org/jira/browse/LUCENE-2804
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir


In LUCENE-2471 we were discussing the copyBytes issue, and Shai and I had a 
discussion about how we could prevent such bugs in the future.

One thing that led to the bug existing in our code for so long was that it 
only happened on Windows (e.g. it never failed in hudson!).
This was because the bug only happened if you were copying from 
SimpleFSDirectory, and the test used FSDirectory.open.

Today the situation is improving: most tests use newDirectory(), which is 
random by default and never uses FSDir.open; it always uses SimpleFS or NIOFS 
so that the same random seed will reproduce across both Windows and Unix.

So I think we need to review all uses of FSDirectory.open in our tests, and 
minimize them. In general, tests should use newDirectory().
If a test comes with, say, a zip file and wants to explicitly open stuff from 
disk, I think it should open the contents with, say, SimpleFSDir,
and then call newDirectory(Directory) to copy into a new random 
implementation for actual testing. This method already exists:
{noformat}
  /**
   * Returns a new Directory instance, with contents copied from the
   * provided directory. See {@link #newDirectory()} for more
   * information.
   */
  public static MockDirectoryWrapper newDirectory(Directory d) throws 
IOException {
{noformat}
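
A hedged sketch of that pattern in a test (the on-disk path and the assertions 
are made up):
{noformat}
File unzipped = new File("/path/to/unzipped-index"); // wherever the zip was extracted
Directory onDisk = new SimpleFSDirectory(unzipped);
Directory dir = newDirectory(onDisk); // copy into a random, Mock-wrapped impl
try {
  IndexReader r = IndexReader.open(dir, true);
  // ... actual assertions against r ...
  r.close();
} finally {
  dir.close();
  onDisk.close();
}
{noformat}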


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2790) IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile

2010-12-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967183#action_12967183
 ] 

Uwe Schindler edited comment on LUCENE-2790 at 12/6/10 10:54 AM:
-

I would simply disable the tests. Reflection should only be used when mock 
classes are used that affect thousands of tests. There are already lots of 
tests disabled.

  was (Author: thetaphi):
I would supply disable the tests. Reflection should only be used when mock 
classes are used that affect thousands of tests. There are already lots of 
tests disabled.
  
 IndexWriter should call MP.useCompoundFile and not LogMP.getUseCompoundFile
 ---

 Key: LUCENE-2790
 URL: https://issues.apache.org/jira/browse/LUCENE-2790
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2790-3x.patch, LUCENE-2790.patch, 
 LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, LUCENE-2790.patch, 
 LUCENE-2790.patch, LUCENE-2790.patch


 Spin off from here: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/112311.
 I will attach a patch shortly that addresses the issue on trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENENET-383) System.IO.IOException: read past EOF while deleting the file from upload folder of filemanager.

2010-12-06 Thread Digy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968383#action_12968383
 ] 

Digy commented on LUCENENET-383:


Hi Chaitanya,

Even though it seems to be a Lucene.Net bug, I don't think you will find 
anyone willing to fix that 4-year-old version.
It is probably fixed in 2.9.2.

DIGY



 System.IO.IOException: read past EOF while deleting the file from upload 
 folder of filemanager.
 ---

 Key: LUCENENET-383
 URL: https://issues.apache.org/jira/browse/LUCENENET-383
 Project: Lucene.Net
  Issue Type: Bug
 Environment: production
Reporter: chaitanya

 We are getting System.IO.IOException: read past EOF when deleting the file 
 from the upload folder of the file manager. It used to work fine earlier, but 
 for the past few days we have been getting this error.
 We are using the EPiServer content management system, and EPiServer in turn 
 uses Lucene for indexing.
 Please find the stack trace of the error below. Help me to overcome this 
 error. Thanks in advance.
 [IOException: read past EOF]
Lucene.Net.Store.BufferedIndexInput.Refill() +233
Lucene.Net.Store.BufferedIndexInput.ReadByte() +21
Lucene.Net.Store.IndexInput.ReadInt() +13
Lucene.Net.Index.SegmentInfos.Read(Directory directory) +60
Lucene.Net.Index.AnonymousClassWith.DoBody() +45
Lucene.Net.Store.With.Run() +67
Lucene.Net.Index.IndexReader.Open(Directory directory, Boolean 
 closeDirectory) +110
Lucene.Net.Index.IndexReader.Open(String path) +65

 EPiServer.Web.Hosting.Versioning.Store.FileOperations.DeleteItemIdFromIndex(String
  filePath, Object fileId) +78
EPiServer.Web.Hosting.Versioning.Store.FileOperations.DeleteFile(Object 
 dirId, Object fileId) +118
EPiServer.Web.Hosting.Versioning.VersioningFileHandler.Delete() +28
EPiServer.Web.Hosting.VersioningFile.Delete() +118
EPiServer.UI.Hosting.UploadFile.ConfirmReplaceButton_Click(Object sender, 
 EventArgs e) +578
EPiServer.UI.WebControls.ToolButton.OnClick(EventArgs e) +107
EPiServer.UI.WebControls.ToolButton.RaisePostBackEvent(String 
 eventArgument) +135
System.Web.UI.Page.RaisePostBackEvent(IPostBackEventHandler sourceControl, 
 String eventArgument) +13
System.Web.UI.Page.RaisePostBackEvent(NameValueCollection postData) +36
System.Web.UI.Page.ProcessRequestMain(Boolean 
 includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +1565

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Adam H.
Hey Yonik.
Thanks for clarifying.
The reason I went rolling my own way is that I asked previously whether there's
any plan to back-port field collapse to Solr 1.4, and
I understood that it's not at all straightforward.

If you think it'll be fairly easy to look at the new code in Solr 4.0 trunk
and use that as a basis, for example, I'd go ahead and do that.

Q - does the field collapse component expect the field to collapse on to be
stored? or does it also use field cache trickery?

Thanks,
Adam

On Mon, Dec 6, 2010 at 9:42 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Sun, Dec 5, 2010 at 6:12 PM, Adam H. jimmoe...@gmail.com wrote:
  StringIndex fieldCacheVals = FieldCache.DEFAULT.getStringIndex(reader,
  collapseField);
 
  where 'reader' is the instance of the SolrIndexReader passed along to the
  component with the ResponseBuilder.SolrQueryRequest object.
 
  As I understand, this can double memory usage due to (re)loading this
  fieldcache on a reader-wide basis rather than on a per segment basis?

 Yep.  Sorting and function queries use per-segment FieldCache entries.
 So if you also request a FieldCache from the top-level reader, it
 won't reuse the per-segment caches and hence will take up 2x memory
 over just using per-segment.
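
 For illustration, a per-segment variant of your snippet looks roughly like
 this (a sketch - it assumes SolrIndexReader.getLeafReaders() as the
 per-segment accessor; adapt to your component):

   SolrIndexReader top = req.getSearcher().getReader();
   for (SolrIndexReader leaf : top.getLeafReaders()) {
     // one FieldCache entry per segment, shared with sorting/function queries
     StringIndex vals = FieldCache.DEFAULT.getStringIndex(leaf, collapseField);
     // ... per-segment collapse logic ...
   }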

 Solr's field collapsing already works on a per-segment basis... if
 your needs are at all general, it could make sense to try and get it
 rolled into solr rather than implementing custom code.

 -Yonik
 http://www.lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Yonik Seeley
On Mon, Dec 6, 2010 at 3:24 PM, Adam H. jimmoe...@gmail.com wrote:
 Hey Yonik.
 Thanks for clarifying.
 The reason I went rolling my own way is that I asked previously whether
 there's any plan to back-port field collapse to Solr 1.4, and
 I understood that it's not at all straightforward.

Ahhh... I'd just use trunk if possible ;-)

The risks of being in production on custom code that no one else uses
are perhaps greater than those of running a widely used development version.

But yes... I don't see a backport happening for 1.4

-Yonik
http://www.lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Adam H.
Fair enough - I might give it a shot if most functionality is compatible with
solr 1.4.1, to your mind? and is fairly stable?

One last Q regarding correct usage of the per-segment FieldCache in Solr
components -

since this is something I might also have issues with elsewhere, and I
suspect other people who work on custom logic do as well,
I think it might be useful to have some documentation and/or a simple
programmatic interface for implementing the
correct access path to these inside a custom SolrComponent.

I looked around the Grouping code a bit and have yet to fully understand
what's going on, but is the ValueSource supposed to take care of access to
the underlying field?

On Mon, Dec 6, 2010 at 12:34 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Mon, Dec 6, 2010 at 3:24 PM, Adam H. jimmoe...@gmail.com wrote:
  Hey Yonik.
  Thanks for clarifying.
  The reason I went rolling my own way is that I asked previously whether
  there's any plan to back-port field collapse to Solr 1.4, and
  I understood that it's not at all straightforward.

 Ahhh... I'd just use trunk if possible ;-)

  The risks of being in production on custom code that no one else uses
  are perhaps greater than those of running a widely used development version.

 But yes... I don't see a backport happening for 1.4

 -Yonik
 http://www.lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Yonik Seeley
On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote:
 Fair enough - I might give it a shot if most functionality is compatible to
 solr 1.4.1 to your mind? and is fairly stable?

Yes, the external APIs are very compatible.
The internal APIs - not so much.
You should reindex also.

 One last Q regarding correct usage of per-segment FieldCache in Solr
 components -

 since this is something I might also have issues with elsewhere, and I
 suspect other people who work on custom logic as well,
 i think it might be useful to have some documentation and/or a simple
 programmatic interface for implementing
 correct access path to these inside a custom SolrComponent.

 I looked around the Grouping code abit and have yet to fully understand
 whats going on, but is the ValueSource supposed to take care of access to
 underlying field?

Yes - you can actually group on arbitrary function queries even.
That will be more useful when we add some bucketing functions.

-Yonik
http://www.lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2186) First cut at column-stride fields (index values storage)

2010-12-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968416#action_12968416
 ] 

Simon Willnauer commented on LUCENE-2186:
-

bq. Whew... this interface is more expansive than I thought it would be (but I 
guess it's really many issues rolled into one... like sorting, caching, etc).
sorry about that :)

bq. So it seems like DocValuesEnum is the traditional lowest level read the 
index, and Source is a cached version of that?
Not quite. DocValuesEnum is iterator-based access to the DocValues, which does 
not load everything into memory, while Source is entirely RAM-resident, offering 
random access to values similar to the field cache. Yet you can also obtain a 
DocValuesEnum from a Source, since it's already in memory. 
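
To illustrate the distinction, here is a minimal sketch of the two access styles 
(the method names only approximate the API under discussion in this patch, they 
are not its verbatim signatures):
{code}
// Illustrative sketch only - names approximate the patch, not a final API.
// Iterator-based access: stream values without loading them all into RAM:
DocValuesEnum valuesEnum = docValues.getEnum();
while (valuesEnum.nextDoc() != DocValuesEnum.NO_MORE_DOCS) {
  long value = valuesEnum.getInt(); // value for the current doc
}

// RAM-resident random access, similar to the field cache:
Source source = docValues.getSource();  // loads values into memory once
long valueForDoc42 = source.getInt(42); // O(1) lookup by docID
{code}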

bq. A higher level question I have is why we're not reusing the FieldCache for 
caching/sorting?
You mean as a replacement for Source? For caching, what we did here is to 
leave it to the user to do the caching, or to cache based on the Source 
instance - how would that relate to FieldCache in your opinion?


 First cut at column-stride fields (index values storage)
 

 Key: LUCENE-2186
 URL: https://issues.apache.org/jira/browse/LUCENE-2186
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: CSF branch, 4.0

 Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, 
 LUCENE-2186.patch, LUCENE-2186.patch, mem.py


 I created an initial basic impl for storing index values (ie
 column-stride value storage).  This is still a work in progress... but
 the approach looks compelling.  I'm posting my current status/patch
 here to get feedback/iterate, etc.
 The code is standalone now, and lives under new package
 oal.index.values (plus some util changes, refactorings) -- I have yet
 to integrate into Lucene so eg you can mark that a given Field's value
 should be stored into the index values, sorting will use these values
 instead of field cache, etc.
 It handles 3 types of values:
   * Six variants of byte[] per doc, all combinations of fixed vs
 variable length, and stored either straight (good for eg a
 title field), deref (good when many docs share the same value,
 but you won't do any sorting) or sorted.
   * Integers (variable bit precision used as necessary, ie this can
 store byte/short/int/long, and all precisions in between)
   * Floats (4 or 8 byte precision)
 String fields are stored as the UTF8 byte[].  This patch adds a
 BytesRef, which does the same thing as flex's TermRef (we should merge
 them).
 This patch also adds basic initial impl of PackedInts (LUCENE-1990);
 we can swap that out if/when we get a better impl.
 This storage is dense (like field cache), so it's appropriate when the
 field occurs in all/most docs.  It's just like field cache, except the
 reading API is a get() method invocation, per document.
 Next step is to do basic integration with Lucene, and then compare
 sort performance of this vs field cache.
 For the sort by String value case, I think RAM usage & GC load of
 this index values API should be much better than field cache, since
 it does not create object per document (instead shares big long[] and
 byte[] across all docs), and because the values are stored in RAM as
 their UTF8 bytes.
 There are abstract Writer/Reader classes.  The current reader impls
 are entirely RAM resident (like field cache), but the API is (I think)
 agnostic, ie, one could make an MMAP impl instead.
 I think this is the first baby step towards LUCENE-1231.  Ie, it
 cannot yet update values, and the reading API is fully random-access
 by docID (like field cache), not like a posting list, though I
 do think we should add an iterator() api (to return flex's DocsEnum)
 -- eg I think this would be a good way to track avg doc/field length
 for BM25/lnu.ltc scoring.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()

2010-12-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968418#action_12968418
 ] 

Simon Willnauer commented on LUCENE-2802:
-

I changed DirectoryReader to use the cloned version of SegmentInfos instead of 
using two of them inconsistently. The only failure I get is in 
TestIndexWriterReader, line 105:
{code}
r1.close();
writer.close();
assertTrue(r2.isCurrent());
{code}

where the writer is closed and afterwards it checks whether the r2 reader is 
still the current one, which fails since the writer.close() method changes the 
version of the SegmentInfos. In my opinion this is actually the semantics I 
would expect from #isCurrent(); the question is whether we want to change the 
semantics so that #isCurrent() returns false once the writer we used to obtain 
the reader is closed.

I think we should consider it for consistency and simplicity though. 

 DirectoryReader ignores NRT SegmentInfos in #isOptimized()
 --

 Key: LUCENE-2802
 URL: https://issues.apache.org/jira/browse/LUCENE-2802
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
 Attachments: LUCENE-2802.patch


 DirectoryReader only takes shared (with IW) SegmentInfos into account in 
 DirectoryReader#isOptimized(). This can return true even if the actual 
 realtime reader sees more than one segment. 
 {code}
 public boolean isOptimized() {
   ensureOpen();
   // if segmentInfos changes in IW this can return a false positive
   return segmentInfos.size() == 1 && !hasDeletions();
 }
 {code}
 DirectoryReader should check if this reader has a non-null segmentInfosStart 
 and use that instead

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Ryan McKinley
On Mon, Dec 6, 2010 at 4:02 PM, Yonik Seeley yo...@lucidimagination.com wrote:
 On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote:
 Fair enough - I might give it a shot if most functionality is compatible to
 solr 1.4.1 to your mind? and is fairly stable?

 Yes, the external APIs are very compatible.
 The internal APIs - not so much.
 You should reindex also.

And not be (too) surprised if things change before the official 4.x
release -- the chances are good that something will change that may
require reindexing.

ryan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2186) DataImportHandler multi-threaded option throws exception

2010-12-06 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968439#action_12968439
 ] 

Grant Ingersoll commented on SOLR-2186:
---

Lance, can you update this patch and add a unit test?

 DataImportHandler multi-threaded option throws exception
 

 Key: SOLR-2186
 URL: https://issues.apache.org/jira/browse/SOLR-2186
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Reporter: Lance Norskog
Assignee: Grant Ingersoll
 Attachments: TikaResolver.patch


 The multi-threaded option for the DataImportHandler throws an exception and 
 the entire operation fails. This is true even if only 1 thread is configured 
 via *threads='1'*

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968445#action_12968445
 ] 

Yonik Seeley commented on SOLR-1979:


bq. In skimming the current patch, it looks like fields get mapped no matter 
what. What if I just want the language detected and added as another field, but 
no field mapping desired?

Yeah, that's sort of in line with my:
bq. And just because you can detect a language doesn't mean you know how to 
handle it differently... so also have an optional catchall that handles all 
languages not specifically mapped.

So for all unmapped languages, you may want to map to a single generic field, 
or not map at all (leave field as is).
I guess it also depends on the general strategy... if you are detecting 
language on the body field, are we using a copyField type approach and only 
storing the body field while indexing as body_enText, or are we moving the 
field from body to body_enText?

bq. Also, if there are multiple input fields, the current patch would create 
multiple language field values requiring that field to be multi-valued. Is the 
goal here to identify a single language for a document?

I could see both making sense.
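
For the unmapped-language case, a purely hypothetical sketch of what a generic 
catchall mapping could look like in config (these parameter names are invented 
for illustration and are not in the current patch):
{code:xml}
<processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
  <str name="inputFields">body</str>
  <str name="outputField">language_s</str>
  <!-- hypothetical: languages with an explicit mapping go to body_<lang>Text -->
  <str name="mappedLanguages">en,de,fr</str>
  <!-- hypothetical: all other detected languages land in one generic field -->
  <str name="catchallField">body_genericText</str>
  <str name="fallback">en</str>
</processor>
{code}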

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
 <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
   <str name="inputFields">name,subject</str>
   <str name="outputField">language_s</str>
   <str name="idField">id</str>
   <str name="fallback">en</str>
 </processor>
 {code} 
 It will then read the text from the inputFields "name" and "subject", perform 
 language identification, and output the ISO code for the detected language in 
 the outputField. If no language was detected, the fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-12-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968451#action_12968451
 ] 

Yonik Seeley commented on LUCENE-2649:
--

For the sort-missing-last type of functionality, the current comparator code 
looks like this (see IntComparator for more context):
{code}
final int v2 = (checkMissing && !cached.valid.get(doc)) 
   ? missingValue : cached.values[doc];
{code}
And I was thinking of changing it to this:
{code}
int v2 = cached.values[doc];
if (valid != null && v2 == 0 && !valid.get(doc))
  v2 = missingValue;
{code}

This should make the common case faster by both eliminating an unneeded 
variable (checkMissing)
and checking that the value is the Java default value before checking the 
bitset.

Thoughts?

 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Is it possible to set the merge policy setMaxMergeMB from Solr

2010-12-06 Thread Burton-West, Tom
Lucene has this method to set the maximum size of a segment when merging: 
LogByteSizeMergePolicy.setMaxMergeMB   
(http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html#setMaxMergeMB%28double%29
 )

I would like to be able to set this in my solrconfig.xml.  Is this possible?  
If not should I open a JIRA issue or is there some gotcha I am unaware of?

Tom

Tom Burton-West



[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-12-06 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968458#action_12968458
 ] 

Ryan McKinley commented on LUCENE-2649:
---

looks good to me

bq. we instantiate vals.values lazily for some reason... and then at the end, 
if it still hasn't been instantiated, we do it anyway?

I don't know about this, I just copied from the existing code...

We could make the Bits.MatchNoBits(maxDoc) case have a null array. That would 
make your proposed change invalid though, since it checks the array first.


bq.  I'm still trying to grok the logic of calling checkMatchAllBits only if 
vals.valid == null... seems like it will always return null in that case?

The assumption is that once vals.valid is set, it should not be recalculated.

The reasons for the "if vals.valid == null" check in the validate function are:
 - the vals.valid Bits may have been set in fillXXXValues
 - the first call may have excluded checkMatchAllBits, and a subsequent call 
has it set

Are you asking about the validate function? If so, fillXXXValues can set 
vals.valid, so it does not do it again.  


 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Adam H.
So,
summing up all the information I now have, plus the fact that I have some
additional custom components that use the FieldCache (so the specific answer
for field collapsing - migrating to Solr 4.0 - is not a complete solution to
my problems),

it seems to me more and more like I might have to actually implement a
custom Solr QueryComponent, whereby I will pass it
multiple collectors (perhaps via some kind of MultiCollector interface,
similar to what Grouping uses) which will do their appropriate field value
collection/aggregation
as results are being fetched.

In other words, using a per-segment fieldcache collection as a
post-processing step (e.g. after QueryComponent did its collection) does not
seem at all trivial, if at all possible (is it possible?).
Is this accurate?

Thanks again for all the info here..

Adam

On Mon, Dec 6, 2010 at 1:48 PM, Ryan McKinley ryan...@gmail.com wrote:

 On Mon, Dec 6, 2010 at 4:02 PM, Yonik Seeley yo...@lucidimagination.com
 wrote:
  On Mon, Dec 6, 2010 at 3:41 PM, Adam H. jimmoe...@gmail.com wrote:
  Fair enough - I might give it a shot if most functionality is compatible
 to
  solr 1.4.1 to your mind? and is fairly stable?
 
  Yes, the external APIs are very compatible.
  The internal APIs - not so much.
  You should reindex also.

 And not be (too) surprised if things change before the official 4.x
 release -- the chances are good that something will change that may
 require reindexing.

 ryan

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Yonik Seeley
On Mon, Dec 6, 2010 at 5:48 PM, Adam H. jimmoe...@gmail.com wrote:
 In other words, using a per-segment fieldcache collection as a
 post-processing step (e.g after QueryComponent did its collection) does not
 seem at all trivial, if at all possible ( is it possible? )

Sure, it's possible, and not too hard (as long as no sort field involves score).
Just instruct the QueryComponent to retrieve the set of all matching
documents, then you can use that to run them through whatever
collectors you want again.  I've been meaning to implement this
optimization for field collapsing...

Depending on the details, either replacing the QueryComponent with
your custom one, or inserting an additional component after the query
component could make sense.
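
A rough sketch of what that post-processing component could look like
(hypothetical glue code, not Solr API verbatim - double-check the class and
method names against your version; "myField" is a placeholder):

import java.io.IOException;
import org.apache.lucene.search.FieldCache;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.SolrIndexSearcher;

// Inside a custom SearchComponent that runs after QueryComponent:
public void process(ResponseBuilder rb) throws IOException {
  SolrIndexSearcher searcher = rb.req.getSearcher();
  DocSet matches = rb.getResults().docSet;   // the full set of matching docs

  // top-level (whole-index) field cache; per-segment is the trunk idiom
  String[] values = FieldCache.DEFAULT.getStrings(
      searcher.getReader(), "myField");

  DocIterator it = matches.iterator();
  while (it.hasNext()) {
    int docId = it.nextDoc();
    String value = values[docId];            // field value for this doc
    // collect/aggregate/collapse on value here
  }
}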

-Yonik
http://www.lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2803) FieldCache should not pay attention to deleted docs when creating entries

2010-12-06 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2803:
-

Attachment: LUCENE-2803.patch

Here's the patch... pretty simple, so I plan on committing shortly.

 FieldCache should not pay attention to deleted docs when creating entries
 -

 Key: LUCENE-2803
 URL: https://issues.apache.org/jira/browse/LUCENE-2803
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
 Attachments: LUCENE-2803.patch


 The FieldCache uses a key that ignores deleted docs, so it's actually a bug 
 to use deleted docs when creating an entry.  It can lead to incorrect values 
 when the same entry is used with a different reader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Adam H.
ah! so just so I can get cracking on this - can you be a little more
specific? e.g.

in my component implementation that runs in the request handling after the
normal QueryComponent,
How would I access the specific field value for the documents that were
retrieved?

i.e. how would it fit into code like this, if at all:

// docList is the matching documents for the given offset/rows/query
DocIterator it = docList.iterator();

while (it.hasNext()) {
  int docId = it.nextDoc();
  float score = it.score();

  // this would've worked if this was a stored field:
  // reader.document(docId).get(fieldName)
  ??
}



On Mon, Dec 6, 2010 at 2:57 PM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Mon, Dec 6, 2010 at 5:48 PM, Adam H. jimmoe...@gmail.com wrote:
  In other words, using a per-segment fieldcache collection as a
  post-processing step (e.g after QueryComponent did its collection) does
 not
  seem at all trivial, if at all possible ( is it possible? )

 Sure, it's possible, and not too hard (as long as no sort field involves
 score).
 Just instruct the QueryComponent to retrieve the set of all matching
 documents, then you can use that to run then through whatever
 collectors you want again.  I've been meaning to implement this
 optimization to field collapsing...

 Depending on the details, either replacing the QueryComponent with
 your custom one, or inserting an additional component after the query
 component could make sense.

 -Yonik
 http://www.lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] Commented: (LUCENE-2803) FieldCache should not pay attention to deleted docs when creating entries

2010-12-06 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968488#action_12968488
 ] 

Ryan McKinley commented on LUCENE-2803:
---

if checkMatchAllBits always has a null first parameter, should we just take it 
out?

 FieldCache should not pay attention to deleted docs when creating entries
 -

 Key: LUCENE-2803
 URL: https://issues.apache.org/jira/browse/LUCENE-2803
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Yonik Seeley
 Attachments: LUCENE-2803.patch


 The FieldCache uses a key that ignores deleted docs, so it's actually a bug 
 to use deleted docs when creating an entry.  It can lead to incorrect values 
 when the same entry is used with a different reader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()

2010-12-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-2802:
---

Assignee: Simon Willnauer

 DirectoryReader ignores NRT SegmentInfos in #isOptimized()
 --

 Key: LUCENE-2802
 URL: https://issues.apache.org/jira/browse/LUCENE-2802
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Attachments: LUCENE-2802.patch


 DirectoryReader only takes shared (with IW) SegmentInfos into account in 
 DirectoryReader#isOptimized(). This can return true even if the actual 
 realtime reader sees more than one segment. 
 {code}
 public boolean isOptimized() {
   ensureOpen();
   // if segmentInfos changes in IW this can return a false positive
   return segmentInfos.size() == 1 && !hasDeletions();
 }
 {code}
 DirectoryReader should check if this reader has a non-null segmentInfosStart 
 and use that instead

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()

2010-12-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2802:


Attachment: LUCENE-2802.patch

here is a patch that removes the mutable state from DirectoryReader in the NRT 
case. The actual reason IMO why this was introduced is that the NRT reader 
returns true from #isCurrent() if the writer was closed, which is actually wrong 
since closing a writer changes the index and the reader should see that change.

I also added a test case for isCurrent() to check the semantics.

 DirectoryReader ignores NRT SegmentInfos in #isOptimized()
 --

 Key: LUCENE-2802
 URL: https://issues.apache.org/jira/browse/LUCENE-2802
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Attachments: LUCENE-2802.patch, LUCENE-2802.patch


 DirectoryReader only takes shared (with IW) SegmentInfos into account in 
 DirectoryReader#isOptimized(). This can return true even if the actual 
 realtime reader sees more than one segment. 
 {code}
 public boolean isOptimized() {
   ensureOpen();
   // if segmentInfos changes in IW this can return a false positive
   return segmentInfos.size() == 1 && !hasDeletions();
 }
 {code}
 DirectoryReader should check if this reader has a non-null segmentInfosStart 
 and use that instead

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Changes Mess

2010-12-06 Thread Koji Sekiguchi

This is off-thread, but I realized that some entries for DIH are in
solr/CHANGES.txt. These should go into solr/contrib/dataimporthandler/CHANGES.txt
(some of them are my fault). I also found that solr/contrib/*/CHANGES.txt
have a 1.5-dev title. These should be 4.0-dev or 3.1-dev.

I'll open a ticket.

Koji
--
http://www.rondhuit.com/en/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()

2010-12-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2802:


Affects Version/s: 3.1

we need to backport to 3.x too

 DirectoryReader ignores NRT SegmentInfos in #isOptimized()
 --

 Key: LUCENE-2802
 URL: https://issues.apache.org/jira/browse/LUCENE-2802
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.1, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Attachments: LUCENE-2802.patch, LUCENE-2802.patch


 DirectoryReader only takes shared (with IW) SegmentInfos into account in 
 DirectoryReader#isOptimized(). This can return true even if the actual 
 realtime reader sees more than one segment. 
 {code}
 public boolean isOptimized() {
   ensureOpen();
   // if segmentInfos changes in IW this can return a false positive
   return segmentInfos.size() == 1 && !hasDeletions();
 }
 {code}
 DirectoryReader should check if this reader has a non-null segmentInfosStart 
 and use that instead

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2802) DirectoryReader ignores NRT SegmentInfos in #isOptimized()

2010-12-06 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968503#action_12968503
 ] 

Earwin Burrfoot commented on LUCENE-2802:
-

Patch looks cool.

 DirectoryReader ignores NRT SegmentInfos in #isOptimized()
 --

 Key: LUCENE-2802
 URL: https://issues.apache.org/jira/browse/LUCENE-2802
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.1, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Attachments: LUCENE-2802.patch, LUCENE-2802.patch


 DirectoryReader only takes shared (with IW) SegmentInfos into account in 
 DirectoryReader#isOptimized(). This can return true even if the actual 
 realtime reader sees more than one segment. 
 {code}
 public boolean isOptimized() {
   ensureOpen();
   // if segmentInfos changes in IW this can return a false positive
   return segmentInfos.size() == 1 && !hasDeletions();
 }
 {code}
 DirectoryReader should check if this reader has a non-null segmentInfosStart 
 and use that instead

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: FieldCache usage for custom field collapse in solr 1.4

2010-12-06 Thread Adam H.
One more comment/question -
Having looked at the Solr stats panel, I do not see detailed memory usage
for the field I'm collapsing on in the Lucene FieldCache entry listings.

As I understand (after having looked through this ticket:
https://issues.apache.org/jira/browse/SOLR-1292 ), this means that it's not
an 'insanity' instance,
and so I am actually not using double the memory, but rather only have this
field in the FieldCache at the whole-index level.

This got me thinking - if I'm not using any segment-level field caching for
this field, there's no reason not to use an index-wide one,
as long as I can guarantee that's the only use case for this field in the
fieldcache.. is this correct?

Thanks again for helping me out with this delicate subject :)

Adam

On Mon, Dec 6, 2010 at 3:21 PM, Adam H. jimmoe...@gmail.com wrote:

 ah! so just so I can get cracking on this - Can you be alittle more
 specific? e.g

 in my component implementation that runs in the request handling after the
 normal QueryComponent,
 How would I access the specific field value for the documents that were
 retrieved?

 i.e how would it fit in a code like this if at all:

 // docList is the matching documents for given offset/rows/query
 DocIterator it = docList.iterator();

 while (it.hasNext()) {
 docId = it.next();
 score = it.score();


 // this would've worked if this was stored field:
 // reader.document(docId).get(fieldName)
 ??

 }



 On Mon, Dec 6, 2010 at 2:57 PM, Yonik Seeley 
 yo...@lucidimagination.comwrote:

 On Mon, Dec 6, 2010 at 5:48 PM, Adam H. jimmoe...@gmail.com wrote:
  In other words, using a per-segment fieldcache collection as a
  post-processing step (e.g after QueryComponent did its collection) does
 not
  seem at all trivial, if at all possible ( is it possible? )

 Sure, it's possible, and not too hard (as long as no sort field involves
 score).
 Just instruct the QueryComponent to retrieve the set of all matching
 documents, then you can use that to run then through whatever
 collectors you want again.  I've been meaning to implement this
 optimization to field collapsing...

 Depending on the details, either replacing the QueryComponent with
 your custom one, or inserting an additional component after the query
 component could make sense.

 -Yonik
 http://www.lucidimagination.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





[jira] Created: (LUCENE-2805) SegmentInfos shouldn't blindly increment version on commit

2010-12-06 Thread Michael McCandless (JIRA)
SegmentInfos shouldn't blindly increment version on commit
--

 Key: LUCENE-2805
 URL: https://issues.apache.org/jira/browse/LUCENE-2805
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
 Fix For: 3.1, 4.0


SegmentInfos currently increments version on the assumption that there are 
always changes.

But, both DirReader and IW are more careful about tracking whether there are 
changes.  DirReader has hasChanges and IW has changeCount.  I think these 
classes should notify the SIS when there are in fact changes; this will fix the 
case Simon hit on fixing LUCENE-2082 when the NRT reader thought there were 
changes, but in fact there weren't because IW simply committed the exact SIS it 
already had.
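
A minimal sketch of the notification idea (the #changed() name comes from the
follow-up patch comments; the rest is illustrative, not the actual internals):
{code}
// Illustrative sketch only - not the actual patch.
// Instead of SegmentInfos bumping its version unconditionally on commit,
// the callers that really track changes report them explicitly:
class SegmentInfos {
  private long version;
  public void changed() { version++; }       // called only on real changes
  public long getVersion() { return version; }
}
// e.g. IndexWriter would call changed() where it bumps changeCount, and
// DirectoryReader where hasChanges becomes true, so committing an
// unmodified SIS no longer makes an NRT reader look stale.
{code}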


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2269) contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt

2010-12-06 Thread Koji Sekiguchi (JIRA)
contrib entries in solr/CHANGES.txt should go solr/contrib/*/CHANGES.txt


 Key: SOLR-2269
 URL: https://issues.apache.org/jira/browse/SOLR-2269
 Project: Solr
  Issue Type: Task
  Components: contrib - Clustering, contrib - DataImportHandler, 
contrib - Solr Cell (Tika extraction)
Affects Versions: 3.1, 4.0
Reporter: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1, 4.0


http://www.lucidimagination.com/search/document/b8c19488a691265c/changes_mess

{quote}
I realized that some entries for DIH are in
solr/CHANGES.txt. These should go solr/contrib/dataimporthandler/CHANGES.txt
(Some of them are my fault). I also found that solr/contrib/*/CHANGES.txt
have 1.5-dev title. These should be 4.0-dev or 3.1-dev.
{quote}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2805) SegmentInfos shouldn't blindly increment version on commit

2010-12-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968521#action_12968521
 ] 

Michael McCandless commented on LUCENE-2805:


Duh, make that LUCENE-2802.

 SegmentInfos shouldn't blindly increment version on commit
 --

 Key: LUCENE-2805
 URL: https://issues.apache.org/jira/browse/LUCENE-2805
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2805.patch


 SegmentInfos currently increments version on the assumption that there are 
 always changes.
 But, both DirReader and IW are more careful about tracking whether there are 
 changes.  DirReader has hasChanges and IW has changeCount.  I think these 
 classes should notify the SIS when there are in fact changes; this will fix 
 the case Simon hit on fixing LUCENE-2082 when the NRT reader thought there 
 were changes, but in fact there weren't because IW simply committed the exact 
 SIS it already had.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2805) SegmentInfos shouldn't blindly increment version on commit

2010-12-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2805:
---

Attachment: LUCENE-2805.patch

Attached first cut patch, just moving the .version++ responsibility into 
DirReader/IW.

But I haven't verified if it fixes the case in LUCENE-2802.

 SegmentInfos shouldn't blindly increment version on commit
 --

 Key: LUCENE-2805
 URL: https://issues.apache.org/jira/browse/LUCENE-2805
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2805.patch


 SegmentInfos currently increments version on the assumption that there are 
 always changes.
 But, both DirReader and IW are more careful about tracking whether there are 
 changes.  DirReader has hasChanges and IW has changeCount.  I think these 
 classes should notify the SIS when there are in fact changes; this will fix 
 the case Simon hit on fixing LUCENE-2082 when the NRT reader thought there 
 were changes, but in fact there weren't because IW simply committed the exact 
 SIS it already had.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2010-12-06 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968528#action_12968528
 ] 

Grant Ingersoll commented on SOLR-1979:
---

bq. So for all unmapped languages, you may want to map to a single generic 
field, or not map at all (leave field as is).

It currently leaves it in the original field.

bq. Also, if there are multiple input fields, the current patch would create 
multiple language field values requiring that field to be multi-valued. Is the 
goal here to identify a single language for a document? Or a separate language 
value for each of the input fields (which seems odd to me)?

The current patch requires a multivalued language field.  I figure the main 
thing you want the lang. field for is faceting and filtering, but it can be 
changed.  As for the broader goal, I think it makes sense to detect languages 
per field and not per document.  In other words, you can have multiple 
languages in a single document.

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch


 We need the ability to detect language of some random text in order to act 
 upon it, such as indexing the content into language aware fields. Another 
 usecase is to be able to filter/facet on language on random unstructured 
 content.
 To do this, we wrap the Tika LanguageIdentifier in an UpdateProcessor. The 
 processor is configurable like this:
 {code:xml} 
 <processor class="org.apache.solr.update.processor.LanguageIdentifierUpdateProcessorFactory">
   <str name="inputFields">name,subject</str>
   <str name="outputField">language_s</str>
   <str name="idField">id</str>
   <str name="fallback">en</str>
 </processor>
 {code} 
 It will then read the text from the inputFields "name" and "subject", perform 
 language identification, and output the ISO code for the detected language in 
 the outputField. If no language was detected, the fallback language is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1721) Add explicit option to run DataImportHandler in synchronous mode

2010-12-06 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1721:
-

Affects Version/s: 1.3
   1.4
Fix Version/s: 4.0
   3.1

 Add explicit option to run DataImportHandler in synchronous mode
 

 Key: SOLR-1721
 URL: https://issues.apache.org/jira/browse/SOLR-1721
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
Reporter: Alexey Serba
Assignee: Noble Paul
Priority: Trivial
 Fix For: 3.1, 4.0

 Attachments: SOLR-1721.patch


 There's no explicit option to run DataImportHandler in a synchronous mode / 
 blocking call. It could be useful to run DIH from SolrJ ( EmbeddedSolrServer 
 ) in the same thread. Currently one can pass dummy stream (or enable debug 
 mode) as a workaround to achieve the same behavior, but I think it makes 
 sense to add specific option for that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2268) Add support for Point in Polygon searches

2010-12-06 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968545#action_12968545
 ] 

Grant Ingersoll commented on SOLR-2268:
---

This is a work in progress.  Here are a few ideas; I think this can all be 
accomplished via a few things:

For the case where the field is a polygon and the user supplies a point, we 
need a new FieldType, PolygonType.

I would propose the following format: vertices are separated by semi-colons, 
points are separated by commas just as they are for the other capabilities, 
i.e. 1.0,1.0;0.0,0.0;3.0,3.0 gives the vertices (1.0,1.0), (0.0,0.0), (3.0,3.0).  
Lines are assumed between each point.  See the java.awt.Polygon class. 


Next, I think we can cover everything else through some function queries:
For case one above
{code}
pip(pt, dimension, boost) -- pt can be a PointType or a Vector.  Boost says how 
much score to give if a point is in a polygon

pipll(latlonPt, boost) -- Use spherical calculations to determine if the lat 
lon point is in the polygon, as it is laid on a sphere 
//Note, we may just fold this into the one above, but I think the calculations 
could be different enough that we would want to avoid instanceof checks.  Plus 
the parsing is simpler
{code}

For case two above, the user would pass in a polygon as defined above for the 
PolygonType.  In this case, we still need a function query:
{code}
pip(poly, boost) -- poly is the passed in polygon, boost is the value to give 
if the point is in a polygon
{code}

For PointType, we can just use the capabilities of java.awt.Polygon; for 
lat/lon, I'm still investigating.  It could be we still use Polygon, but maybe 
we can just scale it a little bit bigger and live with some error.  Otherwise, 
there seem to be some decent algorithms for doing it w/ lat/lon 
(http://msdn.microsoft.com/en-us/library/cc451895.aspx for one).  Not sure that 
one is practical at scale, but it could be a start.

While we are at it, it shouldn't be that hard to do the same for lines, i.e. is 
the point on a line.
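
As a quick illustration of the java.awt.Polygon route for the planar PointType 
case - a sketch only, using the proposed vertex format; the integer scaling 
factor is an assumption, since Polygon takes int coordinates:
{code}
import java.awt.Polygon;

// Sketch: parse the proposed "x1,y1;x2,y2;..." format and test containment
// with java.awt.Polygon. SCALE maps double coords onto Polygon's ints; the
// precision/error tradeoff is still an open question above.
public class PointInPolygonSketch {
  private static final int SCALE = 1000000;

  public static boolean contains(String polySpec, double x, double y) {
    Polygon poly = new Polygon();
    for (String vertex : polySpec.split(";")) {
      String[] xy = vertex.split(",");
      poly.addPoint((int) (Double.parseDouble(xy[0]) * SCALE),
                    (int) (Double.parseDouble(xy[1]) * SCALE));
    }
    return poly.contains(x * SCALE, y * SCALE);
  }

  public static void main(String[] args) {
    // the vertices (1.0,1.0), (0.0,0.0), (3.0,3.0) from the example above
    // (these happen to be collinear, so this prints false)
    System.out.println(contains("1.0,1.0;0.0,0.0;3.0,3.0", 1.5, 1.4));
  }
}
{code}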

 Add support for Point in Polygon searches
 -

 Key: SOLR-2268
 URL: https://issues.apache.org/jira/browse/SOLR-2268
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll

 In spatial applications, it is common to ask whether a point is inside of a 
 polygon.  Solr could support two forms of this: 
 # A field contains a polygon and the user supplies a point.  If it does, the 
 doc is returned.  
 # A document contains a point and the user supplies a polygon.  If the point 
 is in the polygon, return the document
 With both of these case, it would be good to support the negative assertion, 
 too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2268) Add support for Point in Polygon searches

2010-12-06 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968553#action_12968553
 ] 

Lance Norskog commented on SOLR-2268:
-

2 tricks for speeding up "document holds polygon" searches, using vertex-based 
hashing of lat/long values. (It's a variation on a kind of bitwise filtering 
whose name I cannot remember: if the bit is off, there is no match, but if the 
bit is on there may be a match.)

Master data: A field with one or more polygon descriptions.
Bitwise data: Two bit fields, latitude & longitude, with a string of bits for 
each vertex. For example, given a Level Of Detail (LOD) of 1 degree, there 
would be 360 bits in either bitfield. The document would have one of each 
bitfield. Each degree's bit is true if any polygon has area within that bit's 
degree. 

The first phase of searching for point in all polygons is to check the latitude 
and longitude bitfields for that point.
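
A tiny sketch of that degree-resolution prefilter (illustrative only; 360 bits 
per bitfield at the stated LOD of 1 degree, latitude occupying half of them):
{code}
import java.util.BitSet;

// Sketch of the 1-degree LOD prefilter. A polygon sets the bit for every
// degree it has area in; a point can only match a document if both of its
// degree bits are set, so a clear bit rules the document out cheaply.
class DegreePrefilter {
  final BitSet latBits = new BitSet(360);  // -90..89 -> indexes 90..269 used
  final BitSet lonBits = new BitSet(360);  // -180..179 -> indexes 0..359

  void mark(double lat, double lon) {
    latBits.set((int) Math.floor(lat) + 180);
    lonBits.set((int) Math.floor(lon) + 180);
  }

  // false -> definitely not in any polygon; true -> maybe, run the exact test
  boolean mightContain(double lat, double lon) {
    return latBits.get((int) Math.floor(lat) + 180)
        && lonBits.get((int) Math.floor(lon) + 180);
  }
}
{code}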

 Add support for Point in Polygon searches
 -

 Key: SOLR-2268
 URL: https://issues.apache.org/jira/browse/SOLR-2268
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll

 In spatial applications, it is common to ask whether a point is inside of a 
 polygon.  Solr could support two forms of this: 
 # A field contains a polygon and the user supplies a point.  If it does, the 
 doc is returned.  
 # A document contains a point and the user supplies a polygon.  If the point 
 is in the polygon, return the document
 With both of these case, it would be good to support the negative assertion, 
 too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-trunk - Build # 1385 - Still Failing

2010-12-06 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1385/

All tests passed

Build Log (for compile errors):
[...truncated 18318 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Is it possible to set the merge policy setMaxMergeMB from Solr

2010-12-06 Thread Lance Norskog
I have not tried this, but some parts of the solrconfig elements
support setters for sub-elements. So, this might work but probably
won't.

<mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy
  <maxMergeMB>1024</maxMergeMB>
</mergePolicy>

On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom tburt...@umich.edu wrote:
 Lucene has this method to set the maximum size of a segment when merging:
 LogByteSizeMergePolicy.setMaxMergeMB
 (http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/index/LogByteSizeMergePolicy.html#setMaxMergeMB%28double%29
 )

 I would like to be able to set this in my solrconfig.xml.  Is this
 possible?  If not should I open a JIRA issue or is there some gotcha I am
 unaware of?

 Tom

 Tom Burton-West




-- 
Lance Norskog
goks...@gmail.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (SOLR-2268) Add support for Point in Polygon searches

2010-12-06 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968553#action_12968553
 ] 

Lance Norskog edited comment on SOLR-2268 at 12/6/10 10:08 PM:
---

1 trick for speeding up "document holds polygon" searches, using vertex-based 
hashing of lat/long values. (It's a variation on a kind of bitwise filtering 
whose name I cannot remember: if the bit is off, there is no match, but if the 
bit is on there may be a match.)

Master data: A field with one or more polygon descriptions.
Bitwise data: Two bit fields, latitude & longitude, with a string of bits for 
each vertex. For example, given a Level Of Detail (LOD) of 1 degree, there 
would be 360 bits in either bitfield. The document would have one of each 
bitfield. Each degree's bit is true if any polygon has area within that bit's 
degree. 

The first phase of searching for point in all polygons is to check the latitude 
and longitude bitfields for that point.

  was (Author: lancenorskog):
2 tricks for speeding up "document holds polygon" searches, using vertex-based 
hashing of lat/long values. (It's a variation on a kind of bitwise filtering 
whose name I cannot remember: if the bit is off, there is no match, but if the 
bit is on there may be a match.)

Master data: A field with one or more polygon descriptions.
Bitwise data: Two bit fields, latitude & longitude, with a string of bits for 
each vertex. For example, given a Level Of Detail (LOD) of 1 degree, there 
would be 360 bits in either bitfield. The document would have one of each 
bitfield. Each degree's bit is true if any polygon has area within that bit's 
degree. 

The first phase of searching for point in all polygons is to check the latitude 
and longitude bitfields for that point.
  
 Add support for Point in Polygon searches
 -

 Key: SOLR-2268
 URL: https://issues.apache.org/jira/browse/SOLR-2268
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll

 In spatial applications, it is common to ask whether a point is inside of a 
 polygon.  Solr could support two forms of this: 
 # A field contains a polygon and the user supplies a point.  If it does, the 
 doc is returned.  
 # A document contains a point and the user supplies a polygon.  If the point 
 is in the polygon, return the document
 With both of these case, it would be good to support the negative assertion, 
 too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2268) Add support for Point in Polygon searches

2010-12-06 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12968556#action_12968556
 ] 

Lance Norskog commented on SOLR-2268:
-

A second variation: a multiValued field of the vertex pairs which make up a 
polygon. The incoming point searches for a matching vertex point. This is 
faster than the bitwise filter, but uses more space for larger polygons. The 
bitwise filter uses constant memory for each document.


 Add support for Point in Polygon searches
 -

 Key: SOLR-2268
 URL: https://issues.apache.org/jira/browse/SOLR-2268
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll

 In spatial applications, it is common to ask whether a point is inside of a 
 polygon.  Solr could support two forms of this: 
 # A field contains a polygon and the user supplies a point.  If it does, the 
 doc is returned.  
 # A document contains a point and the user supplies a polygon.  If the point 
 is in the polygon, return the document
 With both of these case, it would be good to support the negative assertion, 
 too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 2255 - Failure

2010-12-06 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2255/

15 tests failed.
REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
org.apache.solr.common.cloud.ZooKeeperException: 
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:441)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)
at 
org.apache.solr.cloud.CloudStateUpdateTest.setUp(CloudStateUpdateTest.java:131)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for /collections
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
at 
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:199)
at 
org.apache.solr.common.cloud.ZkStateReader.makeShardZkNodeWatches(ZkStateReader.java:184)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:430)


FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.CloudStateUpdateTest

Error Message:
ERROR: SolrIndexSearcher opens=24 closes=23

Stack Trace:
junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher opens=24 
closes=23
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:128)
at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:302)
at 
org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:79)


REGRESSION:  
org.apache.solr.handler.TestReplicationHandler.testReplicateAfterWrite2Slave

Error Message:
Error executing query

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: Error executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119)
at 
org.apache.solr.handler.TestReplicationHandler.query(TestReplicationHandler.java:142)
at 
org.apache.solr.handler.TestReplicationHandler.clearIndexWithReplication(TestReplicationHandler.java:85)
at 
org.apache.solr.handler.TestReplicationHandler.testReplicateAfterWrite2Slave(TestReplicationHandler.java:165)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)


request: http://localhost:31325/solr/select?q=*:*&wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)


REGRESSION:  
org.apache.solr.handler.TestReplicationHandler.testIndexAndConfigReplication

Error Message:
Error executing query

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: Error executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119)
at 
org.apache.solr.handler.TestReplicationHandler.query(TestReplicationHandler.java:142)
at 
org.apache.solr.handler.TestReplicationHandler.clearIndexWithReplication(TestReplicationHandler.java:85)
at 
org.apache.solr.handler.TestReplicationHandler.testIndexAndConfigReplication(TestReplicationHandler.java:230)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)


request: http://localhost:31325/solr/select?q=*:*&wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)


REGRESSION:  org.apache.solr.handler.TestReplicationHandler.testStopPoll

Error Message:
Error executing query

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: Error executing query
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119)
at 

[jira] Updated: (LUCENE-2805) SegmentInfos shouldn't blindly increment version on commit

2010-12-06 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2805:


Attachment: LUCENE-2805.patch

here is a slightly updated patch that removes the blind increment from 
DefaultSegmentInfosWriter, adds #changed() calls to contrib classes, and adds a 
missing #changed() call to IW#deleteAll().

Tests also pass for LUCENE-2802.

 SegmentInfos shouldn't blindly increment version on commit
 --

 Key: LUCENE-2805
 URL: https://issues.apache.org/jira/browse/LUCENE-2805
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2805.patch, LUCENE-2805.patch


 SegmentInfos currently increments version on the assumption that there are 
 always changes.
 But, both DirReader and IW are more careful about tracking whether there are 
 changes.  DirReader has hasChanges and IW has changeCount.  I think these 
 classes should notify the SIS when there are in fact changes; this will fix 
 the case Simon hit on fixing LUCENE-2082 when the NRT reader thought there 
 were changes, but in fact there weren't because IW simply committed the exact 
 SIS it already had.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Changes Mess

2010-12-06 Thread Shai Erera
Jumping in late to this thread, though I've read most of it.

As a user and committer, I find the current CHANGES very convenient!
It's very easy for me to read what has changed in 3.0, and very easy
for me to put a CHANGES entry whenever I work on something that
warrants such an entry.

And if an issue is back-ported all the way down to 1.4, then IMO it
should contain an entry in each CHANGES (of each release). Users who
download 2.9.4 need to be able to read what has changed since 2.9.3,
in a clear and concise way, which as far as I'm concerned is the
current situation, and I'm happy with it.

Back-porting an issue is usually a simple svn merge, and in more complex
cases, even when it's done manually, the CHANGES entry is the easiest
part to copy over.

I don't think we should work hard to make JIRA produce the CHANGES for
us. At the end of the day, JIRA is our bug tracking system, and it
should remain that way. The CHANGES entry needs to summarize the
change for the reader, and combined with the issue number it gives
enough info. If one wants, one can load the issue in JIRA and read the
full correspondence.

So I'm +1 for keeping things as they are, and for paying attention to
putting the entries in all applicable CHANGES files.

Shai

On Monday, December 6, 2010, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
 Hey Robert,

 I feel ya. +1 to releasing more often! :)

 Cheers,
 Chris

 On Dec 6, 2010, at 8:31 AM, Robert Muir wrote:

 On Mon, Dec 6, 2010 at 11:20 AM, Mattmann, Chris A (388J)
 chris.a.mattm...@jpl.nasa.gov wrote:

 Yeah, in the end all I can say is that you basically get out of JIRA what 
 you put into it. What you call extra work is just something that I would do 
 anyway when working on some of my projects. I'm not saying it's painless or 
 super easy, but we've just decided in those cases to invest time in the 
 issue management system so that we can get the reports we want out of it.

 I've seen this work both ways: in the early days of Nutch there were 
 intense debates on simply moving everything to JIRA versus maintaining a 
 disconnected CHANGES.txt file. I've heard all the arguments (many times 
 over) on both sides, ranging from "oh, I don't want to go to a separate URL 
 as a consumer of software just to see what changed in it" to "what's so 
 hard about doing a curl or wget on an Internet-connected system, which most 
 of our software assumes nowadays?", those types of things.

 When the dust cleared, I think I like the approach we use in Tika (and that 
 I use in a number of projects at JPL), which is just to maintain the 
 information in JIRA. It's worked for us since it's a single source to 
 curate that type of information; it produces very usable reports (not 
 perfect, but usable) that are good enough for us in terms of trading off 
 between the different properties we want to maximize (user contribution 
 acknowledgement, change history, etc.).


 I agree with what you said, and as I mentioned before I'm not opposed
 to the idea at all.

 But if we are going to rely on JIRA more to produce this
 documentation, we need to make some major changes to how we use it, to
 avoid some of the problems I mentioned...

 The scariest part to me about this approach is that we unfortunately
 have very long release cycles. So I'm worried about this documentation
 being generated and fixed at release time versus incrementally, while
 it's fresh in our minds... that's a lot of editing and filtering to do.

 Obviously I feel this would be mitigated, and other things would be much
 better, if we released more often, but that's a harder problem; this is
 just the situation as it is now.




 ++
 Chris Mattmann, Ph.D.
 Senior Computer Scientist
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 171-266B, Mailstop: 171-246
 Email: chris.a.mattm...@nasa.gov
 WWW:   http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Assistant Professor, Computer Science Department
 University of Southern California, Los Angeles, CA 90089 USA
 ++





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org