[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-07 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429985#comment-13429985
 ] 

Eks Dev commented on SOLR-3684:
---

We did this a long time ago on Tomcat, as we use particularly expensive 
analyzers, so even for searching the optimum is around the number of cores. 
That was actually the only big problem we had with Solr.

Anything that keeps insane thread churn low helps. Not only the max number of 
threads, but the TTL for idle threads should also somehow be increased. The 
longer threads live, the better. Solr is completely safe here: thanks to 
core reloading and smart index management there is no point in renewing threads.

If one needs to queue requests, that is just another problem, but for this 
there is no need to raise max worker threads beyond the number of cores plus 
some smallish constant.

What we would like to achieve is to keep separate thread pools for searching, 
indexing, and "the rest"... but we never managed to figure out how to do it. 
Even benign requests like /ping or /status increase thread churn. If we were 
able to configure separate pools, we could keep a small number of long-living 
threads for searching, an even smaller number for indexing, and one 
"who cares" pool for the rest. It is somehow possible on Tomcat; if someone 
knows how to do it, please share.
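The sizing policy described above (max threads near the core count plus a smallish constant, a long idle TTL, and queuing of excess requests instead of spawning threads) can be sketched with the JDK's own executor. This is an illustration of the policy only, not how Solr's servlet container actually configures its pool, and the constant 4 is an arbitrary choice:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SearchPoolSketch {
    public static ThreadPoolExecutor newSearchPool() {
        int cores = Runtime.getRuntime().availableProcessors();
        int maxThreads = cores + 4;  // number of cores plus a smallish constant
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,               // fixed size: no churn from growing/shrinking
                1L, TimeUnit.HOURS,                   // long idle TTL (only relevant if core timeout is enabled)
                new LinkedBlockingQueue<Runnable>()); // excess requests queue instead of spawning threads
        pool.allowCoreThreadTimeOut(false);           // idle threads are never reclaimed at all
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newSearchPool();
        // However many requests queue up, the pool never exceeds cores + 4 threads.
        System.out.println(pool.getMaximumPoolSize()
                - Runtime.getRuntime().availableProcessors());
        pool.shutdown();
    }
}
```

With a bounded thread count and an unbounded queue, load spikes wait in the queue rather than creating threads, which is exactly the low-churn behavior argued for above.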

> Frequently full gc while do pressure index
> --
>
> Key: SOLR-3684
> URL: https://issues.apache.org/jira/browse/SOLR-3684
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Affects Versions: 4.0-ALPHA
> Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads 
> Index: 20 field
> Core: 5
>Reporter: Raintung Li
>Priority: Critical
>  Labels: garbage, performance
> Fix For: 4.0
>
> Attachments: patch.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Recently we tested Solr indexing throughput and performance: we configured 20 
> fields (all of the normal text_general type), started Jetty with 1000 threads, 
> and defined 5 cores.
> After the test had run for some time, the throughput of the Solr process dropped 
> very quickly. Checking for the root cause, we found the Java process constantly 
> doing full GCs.
> In the heap dump, the dominant object is StandardTokenizer, which is saved in 
> the CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
> Solr uses PerFieldReuseStrategy as the default reuse strategy, which means each 
> field gets its own StandardTokenizer if it uses the standard analyzer, and each 
> StandardTokenizer occupies 32KB of memory because of its zzBuffer char array.
> The worst case: Total memory = live threads * cores * fields * 32KB
> In the test case that is 1000 * 5 * 20 * 32KB = 3.2GB for StandardTokenizer alone, 
> and those objects can only be released when their thread dies.
> Suggestion:
> Every request is handled by exactly one thread, which means one document is only 
> analyzed by one thread. Since a thread parses the document's fields one by one, 
> fields of the same type can share the same reused components: when the thread 
> switches to another field of the same type, it only resets the input stream of 
> the reused component. This saves a lot of memory for fields of the same type.
> Total memory then becomes = live threads * cores * (distinct field types) * 32KB
> The source-code change is simple; I can provide the modification patch for 
> IndexSchema.java: 
> private class SolrIndexAnalyzer extends AnalyzerWrapper {
> 
>   private class SolrFieldReuseStrategy extends ReuseStrategy {
>     /**
>      * {@inheritDoc}
>      */
>     @SuppressWarnings("unchecked")
>     public TokenStreamComponents getReusableComponents(String fieldName) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       return componentsPerField != null
>           ? componentsPerField.get(analyzers.get(fieldName)) : null;
>     }
> 
>     /**
>      * {@inheritDoc}
>      */
>     @SuppressWarnings("unchecked")
>     public void setReusableComponents(String fieldName, TokenStreamComponents components) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       if (componentsPerField == null) {
>         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
>         setStoredValue(componentsPerField);
>       }
>       componentsPerField.put(analyzers.get(fieldName), components);
>     }
>   }
> 
>   protected final static HashMap<String, Analyzer> analyzers;
>   /**
>    * Implementation of {@link ReuseStrategy} that reuses components per-field by
>    * maintaining a Map of TokenStreamComponents per field name.
>    */
> 
> SolrI
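The arithmetic in the report above can be sanity-checked directly. A minimal sketch; the second figure assumes all 20 fields share a single text_general analyzer type, which the report implies but does not state:

```java
public class TokenizerMemoryEstimate {
    // Worst case from the report: live threads * cores * fields * 32KB per StandardTokenizer
    public static long worstCaseKB(int threads, int cores, int fields) {
        return (long) threads * cores * fields * 32;
    }

    public static void main(String[] args) {
        long perField = worstCaseKB(1000, 5, 20); // one tokenizer per field: 3,200,000 KB ~= 3.2 GB
        long perType  = worstCaseKB(1000, 5, 1);  // one tokenizer per field *type*: 160,000 KB
        System.out.println(perField + " " + perType);
    }
}
```

Under that assumption, reusing components per field type rather than per field would cut the worst-case tokenizer memory by a factor of 20.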

[jira] [Created] (LUCENE-4293) ArabicRootsAnalyzer

2012-08-07 Thread Ibrahim (JIRA)
Ibrahim created LUCENE-4293:
---

 Summary: ArabicRootsAnalyzer
 Key: LUCENE-4293
 URL: https://issues.apache.org/jira/browse/LUCENE-4293
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Ibrahim
Priority: Minor


ArabicRootsAnalyzer uses an index of Arabic terms associated with their 
roots. Each Arabic word has a root, and there is no automatic way of deciding 
the root.

This analyzer will match any term with its root, so searching/indexing will be 
based on roots. It gives me great results in my application.

Attached are all the required files along with the db. The problem is the size 
of the db (16MB); the number of terms is around 300,000. I have another db with 
600,000 terms, but the attached one is summarized and better, I believe.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4293) ArabicRootsAnalyzer

2012-08-07 Thread Ibrahim (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ibrahim updated LUCENE-4293:


Attachment: rootsTableIndex.zip
ArabicTokens.txt
ArabicTokenizer.java
ArabicRootsAnalyzer.java
ArabicRootFilter.java

> ArabicRootsAnalyzer
> ---
>
> Key: LUCENE-4293
> URL: https://issues.apache.org/jira/browse/LUCENE-4293
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Ibrahim
>Priority: Minor
> Attachments: ArabicRootFilter.java, ArabicRootsAnalyzer.java, 
> ArabicTokenizer.java, ArabicTokens.txt, rootsTableIndex.zip
>
>
> ArabicRootsAnalyzer uses an index of Arabic terms associated with their 
> roots. Each Arabic word has a root, and there is no automatic way of deciding 
> the root.
> This analyzer will match any term with its root, so searching/indexing will be 
> based on roots. It gives me great results in my application.
> Attached are all the required files along with the db. The problem is the size 
> of the db (16MB); the number of terms is around 300,000. I have another db with 
> 600,000 terms, but the attached one is summarized and better, I believe.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-4.x - Build # 60 - Failure

2012-08-07 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-4.x/60/

No tests ran.

Build Log:
[...truncated 11595 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-4.x/checkout/lucene/build.xml:413:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-4.x/checkout/lucene/build.xml:514:
 exec returned: 7

Total time: 10 minutes 51 seconds
Build step 'Execute shell' marked build as failure
[TASKS] Scanning folder '/home/hudson/hudson-slave/workspace/Lucene-4.x' for 
files matching the pattern '**/*.java' - excludes: 
[TASKS] Found 6731 files to scan for tasks
[TASKS] Found 2357 open tasks.
[TASKS] Computing warning deltas based on reference build #59
Archiving artifacts
Publishing Clover coverage report...
No Clover report will be published due to a Build Failure
Recording test results
Publishing Javadoc
Email was triggered for: Failure
Sending email for trigger: Failure




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.6.0_33) - Build # 126 - Failure!

2012-08-07 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/126/
Java: 32bit/jdk1.6.0_33 -client -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 19153 lines...]
javadocs-lint:

[...truncated 1675 lines...]
BUILD FAILED
C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\build.xml:47: The following error 
occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:524: The 
following error occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:514: exec 
returned: 1

Total time: 42 minutes 9 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.6.0_33) - Build # 130 - Failure!

2012-08-07 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/130/
Java: 32bit/jdk1.6.0_33 -server -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 19005 lines...]
javadocs-lint:

[...truncated 1670 lines...]
BUILD FAILED
C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\build.xml:47: The following 
error occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\lucene\build.xml:524: The 
following error occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\lucene\build.xml:514: exec 
returned: 1

Total time: 41 minutes 29 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: VOTE: 4.0-BETA

2012-08-07 Thread Martijn v Groningen
+1 SmokeTestRelease.py ran successfully.

On 7 August 2012 05:32, Robert Muir  wrote:
> Artifacts here:
> http://people.apache.org/~rmuir/staging_area/lucene-solr-4.0bRC0-rev1370099/
>
> The list of changes since 4.0-ALPHA is pretty large: lots of important
> bugs were fixed.
>
> This passes the smoketester (if you use it, you must use python3 now),
> so here is my +1. I think we should get it out and iterate towards the
> final release.
>
> P.S.: I will clean up JIRA etc as discussed before, so I don't ruin
> Hossman's day. If we need to respin we can just move the additional
> issues into CHANGES/JIRA section and then respin.
>
> --
> lucidimagination.com
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>



-- 
Met vriendelijke groet,

Martijn van Groningen

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4069) Segment-level Bloom filters

2012-08-07 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430282#comment-13430282
 ] 

Mikhail Khludnev commented on LUCENE-4069:
--

Mark,

I see that several issues were fixed to land this on trunk/4.0. Are there bugs 
that should be fixed in the initial 3.6 patch? I'm asking because I need this 
feature but I'm stuck on 3.x for a while. Do you recommend 
MHBloomFilterOn3.6Branch.patch for production use? 

PS: thanks for your contribution, it's awesome! 

> Segment-level Bloom filters
> ---
>
> Key: LUCENE-4069
> URL: https://issues.apache.org/jira/browse/LUCENE-4069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.6, 4.0-ALPHA
>Reporter: Mark Harwood
>Assignee: Mark Harwood
>Priority: Minor
> Fix For: 4.0-BETA, 5.0
>
> Attachments: 4069Failure.zip, BloomFilterPostingsBranch4x.patch, 
> LUCENE-4069-tryDeleteDocument.patch, LUCENE-4203.patch, 
> MHBloomFilterOn3.6Branch.patch, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PKLookupUpdatePerfTest.java, 
> PKLookupUpdatePerfTest.java, PrimaryKeyPerfTest40.java
>
>
> An addition to each segment which stores a Bloom filter for selected fields 
> in order to give fast-fail to term searches, helping avoid wasted disk access.
> Best suited for low-frequency fields, e.g. primary keys on big indexes with 
> many segments, but it also speeds up general searching in my tests.
> Overview slideshow here: 
> http://www.slideshare.net/MarkHarwood/lucene-bloomfilteredsegments
> Benchmarks based on Wikipedia content here: http://goo.gl/X7QqU
> Patch based on 3.6 codebase attached.
> There are no 3.6 API changes currently - to play just add a field with "_blm" 
> on the end of the name to invoke special indexing/querying capability. 
> Clearly a new Field or schema declaration(!) would need adding to APIs to 
> configure the service properly.
> Also, a patch for Lucene4.0 codebase introducing a new PostingsFormat
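The fast-fail idea described above can be shown with a toy filter over java.util.BitSet. This is an illustration of the principle only, not the attached patch (which hooks into the postings format); the filter size and the two hash functions here are arbitrary choices:

```java
import java.util.BitSet;

public class BloomSketch {
    private final BitSet bits;
    private final int size;

    BloomSketch(int size) { this.size = size; this.bits = new BitSet(size); }

    // Two cheap, arbitrary hash functions over the term text.
    private int h1(String t) { return Math.floorMod(t.hashCode(), size); }
    private int h2(String t) { return Math.floorMod(t.hashCode() * 31 + 7, size); }

    void add(String term) { bits.set(h1(term)); bits.set(h2(term)); }

    // false => term is definitely absent: the disk seek for this segment can be skipped.
    // true  => term is probably present: fall through to the real terms dictionary.
    boolean mayContain(String term) { return bits.get(h1(term)) && bits.get(h2(term)); }

    public static void main(String[] args) {
        BloomSketch perSegment = new BloomSketch(1 << 16);
        perSegment.add("doc-42");  // e.g. a primary-key term indexed in this segment
        System.out.println(perSegment.mayContain("doc-42"));  // true: no false negatives
        System.out.println(perSegment.mayContain("doc-999")); // almost certainly false
    }
}
```

This is why the approach shines for primary-key lookups over many segments: most segments answer "definitely absent" from memory instead of touching the terms dictionary on disk.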

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430293#comment-13430293
 ] 

Yonik Seeley commented on SOLR-3684:


bq. What we would like to achieve is to keep separate thread pools for 
searching, indexing and "the rest".

Yeah, exactly.  I'd love to be able to assign different thread pools to 
different URLs, but I don't know if that's doable in Jetty or not.

> Frequently full gc while do pressure index
> --
>
> Key: SOLR-3684
> URL: https://issues.apache.org/jira/browse/SOLR-3684
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Affects Versions: 4.0-ALPHA
> Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads 
> Index: 20 field
> Core: 5
>Reporter: Raintung Li
>Priority: Critical
>  Labels: garbage, performance
> Fix For: 4.0
>
> Attachments: patch.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Recently we tested Solr indexing throughput and performance: we configured 20 
> fields (all of the normal text_general type), started Jetty with 1000 threads, 
> and defined 5 cores.
> After the test had run for some time, the throughput of the Solr process dropped 
> very quickly. Checking for the root cause, we found the Java process constantly 
> doing full GCs.
> In the heap dump, the dominant object is StandardTokenizer, which is saved in 
> the CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
> Solr uses PerFieldReuseStrategy as the default reuse strategy, which means each 
> field gets its own StandardTokenizer if it uses the standard analyzer, and each 
> StandardTokenizer occupies 32KB of memory because of its zzBuffer char array.
> The worst case: Total memory = live threads * cores * fields * 32KB
> In the test case that is 1000 * 5 * 20 * 32KB = 3.2GB for StandardTokenizer alone, 
> and those objects can only be released when their thread dies.
> Suggestion:
> Every request is handled by exactly one thread, which means one document is only 
> analyzed by one thread. Since a thread parses the document's fields one by one, 
> fields of the same type can share the same reused components: when the thread 
> switches to another field of the same type, it only resets the input stream of 
> the reused component. This saves a lot of memory for fields of the same type.
> Total memory then becomes = live threads * cores * (distinct field types) * 32KB
> The source-code change is simple; I can provide the modification patch for 
> IndexSchema.java: 
> private class SolrIndexAnalyzer extends AnalyzerWrapper {
> 
>   private class SolrFieldReuseStrategy extends ReuseStrategy {
>     /**
>      * {@inheritDoc}
>      */
>     @SuppressWarnings("unchecked")
>     public TokenStreamComponents getReusableComponents(String fieldName) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       return componentsPerField != null
>           ? componentsPerField.get(analyzers.get(fieldName)) : null;
>     }
> 
>     /**
>      * {@inheritDoc}
>      */
>     @SuppressWarnings("unchecked")
>     public void setReusableComponents(String fieldName, TokenStreamComponents components) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       if (componentsPerField == null) {
>         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
>         setStoredValue(componentsPerField);
>       }
>       componentsPerField.put(analyzers.get(fieldName), components);
>     }
>   }
> 
>   protected final static HashMap<String, Analyzer> analyzers;
>   /**
>    * Implementation of {@link ReuseStrategy} that reuses components per-field by
>    * maintaining a Map of TokenStreamComponents per field name.
>    */
> 
>   SolrIndexAnalyzer() {
>     super(new SolrFieldReuseStrategy());
>     analyzers = analyzerCache();
>   }
> 
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       Analyzer analyzer = f.getType().getAnalyzer();
>       cache.put(f.getName(), analyzer);
>     }
>     return cache;
>   }
> 
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
>   }
> 
>   @Override
>   protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
>     return components;
>   }
> }
> 
> private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
>   @Override
>   protected HashMap<String, Analyzer> analyzerCache() {
>

[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430294#comment-13430294
 ] 

Robert Muir commented on SOLR-3684:
---

What about http://docs.codehaus.org/display/JETTY/Quality+of+Service+Filter?
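The QoS filter linked above can approximate separate pools by throttling matched URLs while letting other traffic pass freely. A sketch of a web.xml fragment, assuming the Jetty-6-era org.mortbay.servlet.QoSFilter class and its maxRequests init-param; the /update path and the limit of 4 are illustrative only:

```xml
<!-- Cap concurrent indexing requests so they cannot starve search threads. -->
<filter>
  <filter-name>indexingQoS</filter-name>
  <filter-class>org.mortbay.servlet.QoSFilter</filter-class>
  <init-param>
    <param-name>maxRequests</param-name>
    <param-value>4</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>indexingQoS</filter-name>
  <url-pattern>/update/*</url-pattern>
</filter-mapping>
```

According to the linked page, requests beyond maxRequests are suspended and later resumed rather than each holding a container thread, which addresses the thread-churn concern without multiple physical pools.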
 

> Frequently full gc while do pressure index
> --
>
> Key: SOLR-3684
> URL: https://issues.apache.org/jira/browse/SOLR-3684
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Affects Versions: 4.0-ALPHA
> Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads 
> Index: 20 field
> Core: 5
>Reporter: Raintung Li
>Priority: Critical
>  Labels: garbage, performance
> Fix For: 4.0
>
> Attachments: patch.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Recently we tested Solr indexing throughput and performance: we configured 20 
> fields (all of the normal text_general type), started Jetty with 1000 threads, 
> and defined 5 cores.
> After the test had run for some time, the throughput of the Solr process dropped 
> very quickly. Checking for the root cause, we found the Java process constantly 
> doing full GCs.
> In the heap dump, the dominant object is StandardTokenizer, which is saved in 
> the CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
> Solr uses PerFieldReuseStrategy as the default reuse strategy, which means each 
> field gets its own StandardTokenizer if it uses the standard analyzer, and each 
> StandardTokenizer occupies 32KB of memory because of its zzBuffer char array.
> The worst case: Total memory = live threads * cores * fields * 32KB
> In the test case that is 1000 * 5 * 20 * 32KB = 3.2GB for StandardTokenizer alone, 
> and those objects can only be released when their thread dies.
> Suggestion:
> Every request is handled by exactly one thread, which means one document is only 
> analyzed by one thread. Since a thread parses the document's fields one by one, 
> fields of the same type can share the same reused components: when the thread 
> switches to another field of the same type, it only resets the input stream of 
> the reused component. This saves a lot of memory for fields of the same type.
> Total memory then becomes = live threads * cores * (distinct field types) * 32KB
> The source-code change is simple; I can provide the modification patch for 
> IndexSchema.java: 
> private class SolrIndexAnalyzer extends AnalyzerWrapper {
> 
>   private class SolrFieldReuseStrategy extends ReuseStrategy {
>     /**
>      * {@inheritDoc}
>      */
>     @SuppressWarnings("unchecked")
>     public TokenStreamComponents getReusableComponents(String fieldName) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       return componentsPerField != null
>           ? componentsPerField.get(analyzers.get(fieldName)) : null;
>     }
> 
>     /**
>      * {@inheritDoc}
>      */
>     @SuppressWarnings("unchecked")
>     public void setReusableComponents(String fieldName, TokenStreamComponents components) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       if (componentsPerField == null) {
>         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
>         setStoredValue(componentsPerField);
>       }
>       componentsPerField.put(analyzers.get(fieldName), components);
>     }
>   }
> 
>   protected final static HashMap<String, Analyzer> analyzers;
>   /**
>    * Implementation of {@link ReuseStrategy} that reuses components per-field by
>    * maintaining a Map of TokenStreamComponents per field name.
>    */
> 
>   SolrIndexAnalyzer() {
>     super(new SolrFieldReuseStrategy());
>     analyzers = analyzerCache();
>   }
> 
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       Analyzer analyzer = f.getType().getAnalyzer();
>       cache.put(f.getName(), analyzer);
>     }
>     return cache;
>   }
> 
>   @Override
>   protected Analyzer getWrappedAnalyzer(String fieldName) {
>     Analyzer analyzer = analyzers.get(fieldName);
>     return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
>   }
> 
>   @Override
>   protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
>     return components;
>   }
> }
> 
> private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
>   @Override
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       Analyzer analyzer = f.getType().getQueryAnalyzer();
>       cache.put(f

Re: VOTE: 4.0-BETA

2012-08-07 Thread Michael McCandless
+1, SmokeTestRelease was happy for me too.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Aug 6, 2012 at 11:32 PM, Robert Muir  wrote:
> Artifacts here:
> http://people.apache.org/~rmuir/staging_area/lucene-solr-4.0bRC0-rev1370099/
>
> The list of changes since 4.0-ALPHA is pretty large: lots of important
> bugs were fixed.
>
> This passes the smoketester (if you use it, you must use python3 now),
> so here is my +1. I think we should get it out and iterate towards the
> final release.
>
> P.S.: I will clean up JIRA etc as discussed before, so I don't ruin
> Hossman's day. If we need to respin we can just move the additional
> issues into CHANGES/JIRA section and then respin.
>
> --
> lucidimagination.com
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3719) Add "instant search" capability to /browse

2012-08-07 Thread Erik Hatcher (JIRA)
Erik Hatcher created SOLR-3719:
--

 Summary: Add "instant search" capability to /browse
 Key: SOLR-3719
 URL: https://issues.apache.org/jira/browse/SOLR-3719
 Project: Solr
  Issue Type: New Feature
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Priority: Minor
 Fix For: 4.0


Once upon a time I tinkered with this in a personal github fork 
https://github.com/erikhatcher/lucene-solr/commits/instant_search/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Stemming Indonesian in Lucene

2012-08-07 Thread Emiliana Suci
ok thanx a lot.
I hope to implement in lucene.
I'll try again.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-Indonesian-in-Lucene-tp3999321p3999574.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4258) Incremental Field Updates through Stacked Segments

2012-08-07 Thread Sivan Yogev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430322#comment-13430322
 ] 

Sivan Yogev commented on LUCENE-4258:
-

Working on the details, it seems that we need to add a new layer of information 
for stacked segments. For each field that was added with REPLACE_FIELDS, we 
need to record the documents in which a replace took place, together with the 
number of the latest generation that performed the replacement. Call this list 
the "generation vector". That way, the TermDocs provided by StackedSegmentReader 
for a given term is a special merge of that term's TermDocs across all stacked 
segments. The "special" part is that we ignore occurrences from documents in 
which the term's field was replaced in a later generation.

An example. Assume we have doc 1 with title "I love bananas" and doc 2 with 
title "I love oranges", and the segment is flushed. We will have the following 
base segment (ignoring positions):

bananas: doc 1
I: doc1, doc 2
love: doc 1, doc 2
oranges: doc2

Now we add to doc 1 additional title field "I hate apples", and replace the 
title of doc 2 with "I love lemons", and flush. We will have the following 
segment for generation 1:

apples: doc 1
hate: doc 1
I: doc 1, doc 2
lemons: doc 2
love: doc 2
generation vector for field "title": (doc 2, generation 1)

TermDocs for a few terms: 
* title:bananas : {1}, uses the TermDocs of the base segment and not affected 
by the field title generation vector.
* title:oranges : {}, uses the TermDocs of the base segment, doc 2 title 
affected for generations < 1, and the generation is 0.
* title:lemons : {2}, uses the TermDocs of generation 1. Doc 2 title affected 
for generations < 1, but the term appears in generation 1.
* title:love : {1,2}, uses the TermDocs of both segments. Doc 2 title affected 
for generations < 1, but the term appears in generation 1.

I propose to initially use PackedInts for the generation vector, since we know 
how many generations the current segment has upon flushing. Later we might 
consider special treatment for sparse vectors.
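The merge rule in the worked example above (keep a posting only when its segment's generation matches the doc's entry in the generation vector) can be sketched over plain maps. This is a toy model of the bookkeeping, not the proposed PackedInts implementation; all names are illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class GenerationVectorSketch {
    // The generation vector: docId -> latest generation that replaced the doc's field.
    // Docs with no entry were never replaced, i.e. their base (generation 0) data holds.
    static int owningGeneration(Map<Integer, Integer> genVector, int doc) {
        return genVector.getOrDefault(doc, 0);
    }

    // Merge a term's postings across stacked segments, keeping a posting only if
    // it comes from the generation that currently owns the doc's field.
    static Set<Integer> merge(Map<Integer, List<Integer>> postingsByGen,
                              Map<Integer, Integer> genVector) {
        Set<Integer> docs = new TreeSet<>();
        postingsByGen.forEach((gen, posting) -> {
            for (int doc : posting)
                if (owningGeneration(genVector, doc) == gen) docs.add(doc);
        });
        return docs;
    }

    public static void main(String[] args) {
        // title:love from the example: {1, 2} in the base segment, {2} in generation 1.
        Map<Integer, List<Integer>> love = Map.of(0, List.of(1, 2), 1, List.of(2));
        Map<Integer, Integer> titleGen = Map.of(2, 1); // doc 2's title replaced in generation 1
        System.out.println(merge(love, titleGen));      // doc 1 from gen 0, doc 2 from gen 1
    }
}
```

Running the same merge for title:oranges (gen 0 posting {2} only) yields the empty set, matching the example: doc 2's base title was superseded in generation 1.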


> Incremental Field Updates through Stacked Segments
> --
>
> Key: LUCENE-4258
> URL: https://issues.apache.org/jira/browse/LUCENE-4258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Sivan Yogev
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field 
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Windows ([[ Exception while replacing ENV. Please report this as a bug. ]]

2012-08-07 Thread Policeman Jenkins Server
{{ java.lang.NullPointerException }})
 - Build # 129 - Failure!

Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/129/
Java: [[ Exception while replacing ENV. Please report this as a bug. ]]
{{ java.lang.NullPointerException }}

No tests ran.

Build Log:
[...truncated 6832 lines...]
FATAL: hudson.remoting.RequestAbortedException: 
hudson.remoting.Channel$OrderlyShutdown
hudson.remoting.RequestAbortedException: 
hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:663)
at 
hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
at $Proxy71.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:861)
at hudson.Launcher$ProcStarter.join(Launcher.java:345)
at hudson.tasks.Ant.perform(Ant.java:217)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:717)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1488)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:236)
Caused by: hudson.remoting.RequestAbortedException: 
hudson.remoting.Channel$OrderlyShutdown
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:719)
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:835)
at hudson.remoting.Channel$2.handle(Channel.java:433)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
Caused by: hudson.remoting.Channel$OrderlyShutdown
... 3 more
Caused by: Command close created at
at hudson.remoting.Command.<init>(Command.java:54)
at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:829)
at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:827)
at hudson.remoting.Channel.close(Channel.java:894)
at hudson.remoting.Channel.close(Channel.java:877)
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:834)
... 2 more



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3579) SolrCloud view should default to cluster graphical view rather than zk nodes

2012-08-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430363#comment-13430363
 ] 

Mark Miller commented on SOLR-3579:
---

Thanks for the patch Stefan!

> SolrCloud view should default to cluster graphical view rather than zk nodes
> 
>
> Key: SOLR-3579
> URL: https://issues.apache.org/jira/browse/SOLR-3579
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Mark Miller
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3579.patch
>
>
> This seems more user friendly to me - other opinions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






Re: VOTE: 4.0-BETA

2012-08-07 Thread David Smiley (@MITRE.org)
Can you please remind us of the ramifications of a beta release for us
developers?  In particular, are there limitations on the sorts of
changes?  The alpha introduced no index change, for example.  Sorry if
you've answered this before but I had trouble finding it.

Cheers,
~ David



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/VOTE-4-0-BETA-tp3999508p3999583.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




Re: VOTE: 4.0-BETA

2012-08-07 Thread Robert Muir
On Tue, Aug 7, 2012 at 9:59 AM, David Smiley (@MITRE.org) wrote:
> Can you please remind us of the ramifications of a beta release for us
> developers?  In particular, are there limitations on the sorts of
> changes?  The alpha introduced no index change, for example.  Sorry if
> you've answered this before but I had trouble finding it.
>

Wait: "no index change" isn't correct. We can always change the index
format; we just do it in a backwards-compatible way.
I propose the same guarantees for beta (nothing more, nothing less).

-- 
lucidimagination.com




[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-08-07 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430373#comment-13430373
 ] 

Adrien Grand commented on LUCENE-3892:
--

I backported Mike's changes to the {{BlockPacked}} codec and tried to 
understand why it was slower than {{Block}}...

The use of {{java.nio.*Buffer}} seemed to be the bottleneck of the decoding 
step ({{ByteBuffer.asLongBuffer}} and {{ByteBuffer.getLong}} especially are 
_very_ slow), so I switched back to decoding from long[] (instead of 
LongBuffer) and added direct decoding from byte[] to avoid having to convert 
the bytes to longs before decoding.

Tests passed with -Dtests.postingsformat=BlockPacked. Here are the results of 
the benchmark (unfortunately, it started before Mike committed r1370179):

{noformat}
Task               QPS 3892  StdDev 3892  QPS 3892-packed  StdDev 3892-packed      Pct diff
PKLookup             259.41         9.06           255.77                8.89   -8% -   5%
AndHighLow          1656.30        50.44          1653.85               55.05   -6% -   6%
AndHighHigh           82.90         1.82            83.47                2.52   -4% -   6%
AndHighMed           274.76        11.11           278.51               13.42   -7% -  10%
Prefix3              285.41         4.82           289.60                6.31   -2% -   5%
HighTerm             230.78        14.33           235.16               20.61  -12% -  18%
IntNRQ                55.91         1.03            57.13                2.73   -4% -   9%
LowTerm             1720.10        47.06          1759.16               55.47   -3% -   8%
Wildcard             290.54         3.82           297.39                5.42    0% -   5%
MedTerm              733.01        35.38           750.46               50.37   -8% -  14%
HighSpanNear           6.93         0.23             7.12                0.39   -6% -  11%
HighPhrase             6.46         0.22             6.65                0.46   -7% -  14%
Respell               96.11         2.84            99.00                3.98   -3% -  10%
OrHighHigh            38.07         2.53            39.23                3.06  -10% -  19%
Fuzzy2                50.29         1.70            51.87                2.25   -4% -  11%
MedPhrase             26.20         0.94            27.03                1.07   -4% -  11%
OrHighMed            138.83         7.76           143.54                9.79   -8% -  16%
Fuzzy1               100.58         2.15           104.21                3.99   -2% -   9%
HighSloppyPhrase       5.26         0.11             5.45                0.24   -3% -  10%
OrHighLow             78.43         5.55            81.80                6.89  -10% -  21%
MedSpanNear           32.75         1.13            34.28                1.73   -3% -  13%
LowPhrase             90.27         3.20            95.06                3.58   -2% -  13%
LowSpanNear           46.40         1.95            48.89                2.40   -3% -  15%
MedSloppyPhrase       36.29         1.00            38.59                1.46    0% -  13%
LowSloppyPhrase       37.41         1.11            40.48                1.39    1% -  15%
{noformat}

Mike, Billy, could you check that {{BlockPacked}} is at least as fast as 
{{Block}} on your computers too?

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-blockFor&hardcode(base).patch, 
> LUCENE-3892-blockFor&packedecoder(comp).patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-handle_open_files.patch, 
> LUCENE-3892-pfor-compress-iterate-numbits.patch, 
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, 
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.


[jira] [Created] (LUCENE-4294) expose CheckIndex to work on AtomicReader

2012-08-07 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4294:
---

 Summary: expose CheckIndex to work on AtomicReader
 Key: LUCENE-4294
 URL: https://issues.apache.org/jira/browse/LUCENE-4294
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-4294.patch

The main test methods in CheckIndex can just work on AtomicReader, so expose 
these as static methods, and add _TestUtil.checkReader (similar to checkIndex).

This would allow us to verify consistency of things like 
ParallelReader/SlowWrapper (and thus Multi*) or whatever.

It's a simple patch, but I'm not sure where to inject the check so we do it 
automagically (or at least sometimes automagically)




[jira] [Updated] (LUCENE-4294) expose CheckIndex to work on AtomicReader

2012-08-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4294:


Attachment: LUCENE-4294.patch

> expose CheckIndex to work on AtomicReader
> -
>
> Key: LUCENE-4294
> URL: https://issues.apache.org/jira/browse/LUCENE-4294
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
> Attachments: LUCENE-4294.patch
>
>
> The main test methods in checkindex can just work on AtomicReader, so expose 
> these as static methods, and add _TestUtil.checkReader (similar to 
> checkIndex).
> This would allow for us to verify consistency of things like 
> ParallelReader/SlowWrapper (and thus Multi*) or whatever.
> It's a simple patch, but I'm not sure where to inject the check so we do it 
> automagically (or at least sometimes automagically)




[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_05) - Build # 134 - Failure!

2012-08-07 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/134/
Java: 64bit/jdk1.7.0_05 -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 19791 lines...]
javadocs-lint:

[...truncated 1670 lines...]
BUILD FAILED
C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\build.xml:47: The following 
error occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\lucene\build.xml:524: The 
following error occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-trunk-Windows\lucene\build.xml:514: exec 
returned: 1

Total time: 40 minutes 42 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure




[jira] [Updated] (LUCENE-4294) expose CheckIndex to work on AtomicReader

2012-08-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4294:


Attachment: LUCENE-4294.patch

Here's what I've been testing... this is an evil place to put the check but it 
works (just a hack).

I fixed an outdated bogon in #unique terms assertion (slow wrapper returns -1).

> expose CheckIndex to work on AtomicReader
> -
>
> Key: LUCENE-4294
> URL: https://issues.apache.org/jira/browse/LUCENE-4294
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
> Attachments: LUCENE-4294.patch, LUCENE-4294.patch
>
>
> The main test methods in checkindex can just work on AtomicReader, so expose 
> these as static methods, and add _TestUtil.checkReader (similar to 
> checkIndex).
> This would allow for us to verify consistency of things like 
> ParallelReader/SlowWrapper (and thus Multi*) or whatever.
> It's a simple patch, but I'm not sure where to inject the check so we do it 
> automagically (or at least sometimes automagically)




[jira] [Commented] (SOLR-3717) DirectoryFactory.close() is never called

2012-08-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430395#comment-13430395
 ] 

Mark Miller commented on SOLR-3717:
---

started looking into this - I see a dir factory close in DefaultSolrCoreState 
around line 148?

> DirectoryFactory.close() is never called
> 
>
> Key: SOLR-3717
> URL: https://issues.apache.org/jira/browse/SOLR-3717
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
> Fix For: 5.0, 4.0
>
>
> While working on SOLR-3699 i noticed that DirectoryFactory implements 
> Closable (and thus: has a close() method) but (unless i'm missing something) 
> never gets closed.
> I suspect the code that used to close() the DirectoryFactory got refactored 
> into oblivion when SolrCoreState was introduced, and reloading a SolrCore 
> started reusing the same DirectoryFactory.
> it seems like either DirectoryFactory should no longer have a close() method, 
> or something at the CoreContainer level should ensure that all 
> DirectoryFactories are closed when shutting down




[jira] [Commented] (SOLR-3685) solrcloud crashes on startup due to excessive memory consumption

2012-08-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430400#comment-13430400
 ] 

Mark Miller commented on SOLR-3685:
---

I was off a bit - even a non-graceful shutdown should not cause this - if you 
are not indexing when you shut down, at worst nodes should sync, not replicate.

In my testing I could easily reproduce this though - replication recoveries 
happened when it should have been a sync.

Yonik recently committed a fix to this on trunk.

> solrcloud crashes on startup due to excessive memory consumption
> 
>
> Key: SOLR-3685
> URL: https://issues.apache.org/jira/browse/SOLR-3685
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java), SolrCloud
>Affects Versions: 4.0-ALPHA
> Environment: Debian GNU/Linux Squeeze 64bit
> Solr 5.0-SNAPSHOT 1365667M - markus - 2012-07-25 19:09:43
>Reporter: Markus Jelsma
>Priority: Critical
> Fix For: 4.1
>
> Attachments: info.log
>
>
> There's a serious problem with restarting nodes: old or unused index 
> directories not being cleaned up, sudden replication, and Java being killed 
> by the OS due to excessive memory allocation. Since SOLR-1781 was fixed, 
> index directories get cleaned up when a node is restarted cleanly; however, 
> old or unused index directories still pile up if Solr crashes or is killed 
> by the OS, which is what is happening here.
> We have a six-node 64-bit Linux test cluster with each node having two 
> shards. There's 512MB RAM available and no swap. Each index is roughly 27MB, 
> so about 50MB per node; this fits easily and works fine. However, if a node 
> is restarted, Solr will consistently crash because it immediately eats up 
> all RAM. If swap is enabled, Solr will eat an additional few hundred MB 
> right after startup.
> This cannot be solved by restarting Solr; it will just crash again and leave 
> index directories in place until the disk is full. The only way I can restart 
> a node safely is to delete the index directories and have it replicate from 
> another node. If I then restart the node it will crash almost consistently.
> I'll attach a log of one of the nodes.




[jira] [Resolved] (SOLR-3717) DirectoryFactory.close() is never called

2012-08-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-3717.


Resolution: Not A Problem

thanks mark, i definitely missed seeing that.

> DirectoryFactory.close() is never called
> 
>
> Key: SOLR-3717
> URL: https://issues.apache.org/jira/browse/SOLR-3717
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
> Fix For: 5.0, 4.0
>
>
> While working on SOLR-3699 i noticed that DirectoryFactory implements 
> Closable (and thus: has a close() method) but (unless i'm missing something) 
> never gets closed.
> I suspect the code that used to close() the DirectoryFactory got refactored 
> into oblivion when SolrCoreState was introduced, and reloading a SolrCore 
> started reusing the same DirectoryFactory.
> it seems like either DirectoryFactory should no longer have a close() method, 
> or something at the CoreContainer level should ensure that all 
> DirectoryFactories are closed when shutting down




[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-08-07 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430423#comment-13430423
 ] 

Han Jiang commented on LUCENE-3892:
---

Thanks Adrien! Your code is really clean!

At first glance, I think we should still support the all-values-the-same case? 
For some applications (like indexes with payloads), that might be helpful.

And, I'm a little confused about your performance test. Did you use BlockPF 
before r1370179 as a baseline, and compare it with your latest commit? Here, I 
tested these two PFs at the latest revision (r1370345).

{noformat}
Task                QPS base  StdDev base  QPS comp  StdDev comp      Pct diff
AndHighHigh           124.53         9.36    100.46         3.31  -27% -   -9%
AndHighLow           2141.08        63.93   1922.73        36.32  -14% -   -5%
AndHighMed            281.48        36.49    218.68        13.10  -35% -   -5%
Fuzzy1                 84.33         2.56     83.94         1.67   -5% -    4%
Fuzzy2                 30.49         1.13     30.48         0.71   -5% -    6%
HighPhrase              9.08         0.28      7.56         0.20  -21% -  -11%
HighSloppyPhrase        5.46         0.21      4.88         0.23  -17% -   -2%
HighSpanNear           10.12         0.21      9.21         0.30  -13% -   -3%
HighTerm              176.52         6.13    146.13         5.43  -22% -  -11%
IntNRQ                 59.56         1.98     51.05         1.33  -19% -   -9%
LowPhrase              40.02         1.03     32.75         0.37  -21% -  -15%
LowSloppyPhrase        59.59         2.85     51.49         1.33  -19% -   -6%
LowSpanNear            73.86         3.17     61.98         1.45  -21% -  -10%
LowTerm              1755.38        15.56   1622.61        26.87   -9% -   -5%
MedPhrase              25.99         0.47     21.01         0.17  -21% -  -16%
MedSloppyPhrase        30.52         0.89     24.77         0.55  -22% -  -14%
MedSpanNear            22.26         0.43     18.73         0.47  -19% -  -12%
MedTerm               651.90        18.97    573.34        19.25  -17% -   -6%
OrHighHigh             26.75         0.33     23.53         0.50  -14% -   -9%
OrHighLow             151.69         2.13    134.17         3.19  -14% -   -8%
OrHighMed             102.48         1.48     90.73         2.01  -14% -   -8%
PKLookup              216.59         5.70    215.99         2.99   -4% -    3%
Prefix3               166.00         0.78    145.25         1.29  -13% -  -11%
Respell                82.01         3.01     82.80         1.66   -4% -    6%
Wildcard              151.66         2.22    141.14         1.57   -9% -   -4%
{noformat}

Strange that it isn't working well on my computer. And results are similar when 
I change MMapDirectory to NIOFSDirectory.

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-blockFor&hardcode(base).patch, 
> LUCENE-3892-blockFor&packedecoder(comp).patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-handle_open_files.patch, 
> LUCENE-3892-pfor-compress-iterate-numbits.patch, 
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, 
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.


[jira] [Assigned] (SOLR-3229) TermVectorComponent does not return terms in distributed search

2012-08-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reassigned SOLR-3229:
--

Assignee: Hoss Man

> TermVectorComponent does not return terms in distributed search
> ---
>
> Key: SOLR-3229
> URL: https://issues.apache.org/jira/browse/SOLR-3229
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 4.0-ALPHA
> Environment: Ubuntu 11.10, openjdk-6
>Reporter: Hang Xie
>Assignee: Hoss Man
>  Labels: patch
> Fix For: 4.0
>
> Attachments: TermVectorComponent.patch
>
>
> TermVectorComponent does not return terms in distributed search: 
> distributedProcess() incorrectly uses the Solr unique key to do subrequests, 
> while process() expects Lucene document ids. Also, parameters are transferred 
> in a different format, so distributed search returns no results.




[jira] [Resolved] (SOLR-3718) /tvrh request handler is not working

2012-08-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-3718.


Resolution: Duplicate

> /tvrh request handler is not working
> 
>
> Key: SOLR-3718
> URL: https://issues.apache.org/jira/browse/SOLR-3718
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 4.0-ALPHA
> Environment: ubuntu 12.04 LTS
> OpenJDK 64-Bit Server VM (20.0-b12)
>Reporter: Liu Chao
>
> I checked out the 4.0-ALPHA version and ran the default example with all data 
> from the xml files imported. When I try 
> "http://localhost:8983/solr/collection1/tvrh?shards.qt=/tvrh&collection=collection1&shards=shard1&q=includes%3AUSB&tv=true&tv.all=true&f.includes.tv.offsets=false&tv.fl=includes"
>  I got an error in TermVectorComponent:
> INFO: [collection1] webapp=/solr path=/tvrh 
> params={shards.qt=/tvrh&distrib=false&f.includes.tv.offsets=false&tv.all=true&collection=collection1&tv.docIds=9885A004,MA147LL/A,3007&wt=javabin&version=2&NOW=1344321467766&shard.url=ubuntu:8983/solr/collection1/&df=includes&tv=true&tv.fl=includes&qt=/tvrh&isShard=true}
>  hits=0 status=400 QTime=1 
> Aug 7, 2012 2:37:47 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: For input string: 
> "9885A004,MA147LL/A,3007"
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:397)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
>   at 
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:165)
>   at 
> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:132)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>   at java.lang.Thread.run(Thread.java:679)
> If I remove "shards.qt=/tvrh" from the query I can get search results without 
> any term vector information.
> I debugged the code and found that TermVectorComponent expects integer 
> document ids instead of the unique key specified in the schema.




[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-08-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430439#comment-13430439
 ] 

Michael McCandless commented on LUCENE-3892:


Hmm also not great results on my env (base=Block, packed=BlockPacked), based on 
current branch head:

{noformat}
Task                QPS base  StdDev base  QPS packed  StdDev packed      Pct diff
AndHighMed             59.23         3.07       34.24           0.69  -46% -  -37%
AndHighLow            576.35        21.09      349.57           7.44  -42% -  -35%
AndHighHigh            23.83         0.72       15.53           0.29  -37% -  -31%
MedPhrase              12.56         0.20        8.87           0.31  -32% -  -25%
LowPhrase              20.52         0.21       14.89           0.43  -30% -  -24%
MedSloppyPhrase         7.46         0.20        5.41           0.13  -31% -  -23%
LowSloppyPhrase         6.73         0.18        4.92           0.12  -30% -  -22%
LowSpanNear             7.63         0.32        5.65           0.19  -31% -  -20%
HighSloppyPhrase        1.90         0.08        1.52           0.05  -25% -  -14%
HighPhrase              1.57         0.04        1.26           0.08  -26% -  -12%
MedSpanNear             3.84         0.18        3.14           0.14  -25% -  -10%
LowTerm               433.22        34.89      364.03          15.63  -25% -   -4%
HighSpanNear            1.40         0.07        1.19           0.06  -23% -   -6%
IntNRQ                  9.50         0.43        8.09           0.92  -27% -    0%
HighTerm               29.47         4.89       25.46           2.35  -32% -   13%
MedTerm               148.76        21.53      129.17           9.59  -29% -    9%
Prefix3                72.81         2.20       63.65           3.88  -20% -   -4%
Wildcard               44.79         0.92       39.91           2.20  -17% -   -4%
OrHighMed              16.81         0.48       15.28           0.21  -12% -   -5%
OrHighLow              21.85         0.67       20.03           0.32  -12% -   -3%
OrHighHigh              8.49         0.28        7.80           0.14  -12% -   -3%
Fuzzy1                 61.33         1.95       58.91           1.11   -8% -    1%
PKLookup              156.87         1.14      154.08           2.13   -3% -    0%
Respell                58.72         1.57       59.60           1.28   -3% -    6%
Fuzzy2                 60.98         2.34       62.03           1.89   -5% -    9%
{noformat}

I think optimizing the all-values-same case is actually quite important for 
payloads (but luceneutil doesn't test this today).
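The check for that case is cheap; a minimal, hypothetical sketch of how an encoder might detect it before bit-packing a block (this is not Lucene's actual encoder; names are made up):

```java
public class AllSameSketch {
    // Returns true if every value in block[0..len) is identical, so the
    // encoder can write the single value once instead of packing the block
    // (useful e.g. when payload lengths are constant across a block).
    static boolean allEqual(long[] block, int len) {
        for (int i = 1; i < len; i++) {
            if (block[i] != block[0]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long[] constant = {7, 7, 7, 7};
        long[] mixed = {7, 7, 8, 7};
        System.out.println(allEqual(constant, 4) + " " + allEqual(mixed, 4));
    }
}
```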

But, curiously, my BlockPacked index is a bit smaller than my Block index (4643 
MB vs 4650 MB).

I do wonder about using long[] to hold the uncompressed results (they only need 
int[]); that's one big difference still.  Also: I'd love to see how 
acceptableOverheadRatio > 0 does ... (and, using PACKED_SINGLE_BLOCK ... we'd 
have to put a bit in the header to record the format).
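Recording the chosen format in the block header can be as small as one extra bit alongside the bits-per-value; a hypothetical sketch (the layout and names are invented for illustration, not the on-disk format):

```java
public class HeaderSketch {
    // Pack a one-bit format id (0 = PACKED, 1 = PACKED_SINGLE_BLOCK, say)
    // into the high bit of a one-byte header, with bitsPerValue (1..64)
    // in the low 7 bits.
    static byte encode(int formatId, int bitsPerValue) {
        if (formatId < 0 || formatId > 1 || bitsPerValue < 1 || bitsPerValue > 64)
            throw new IllegalArgumentException();
        return (byte) ((formatId << 7) | (bitsPerValue & 0x7F));
    }

    static int formatId(byte header) { return (header >> 7) & 1; }

    static int bitsPerValue(byte header) { return header & 0x7F; }

    public static void main(String[] args) {
        byte h = encode(1, 17);
        System.out.println(formatId(h) + " " + bitsPerValue(h));
    }
}
```

The decoder reads the byte back and dispatches to the matching unpacker; one byte per block keeps the overhead negligible.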

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-blockFor&hardcode(base).patch, 
> LUCENE-3892-blockFor&packedecoder(comp).patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-handle_open_files.patch, 
> LUCENE-3892-pfor-compress-iterate-numbits.patch, 
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, 
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.


[jira] [Commented] (SOLR-3685) solrcloud crashes on startup due to excessive memory consumption

2012-08-07 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430438#comment-13430438
 ] 

Markus Jelsma commented on SOLR-3685:
-

When exactly? Do you have an issue number?

> solrcloud crashes on startup due to excessive memory consumption
> 
>
> Key: SOLR-3685
> URL: https://issues.apache.org/jira/browse/SOLR-3685
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java), SolrCloud
>Affects Versions: 4.0-ALPHA
> Environment: Debian GNU/Linux Squeeze 64bit
> Solr 5.0-SNAPSHOT 1365667M - markus - 2012-07-25 19:09:43
>Reporter: Markus Jelsma
>Priority: Critical
> Fix For: 4.1
>
> Attachments: info.log
>
>
> There's a serious problem with restarting nodes, not cleaning old or unused 
> index directories and sudden replication and Java being killed by the OS due 
> to excessive memory allocation. Since SOLR-1781 was fixed index directories 
> get cleaned up when a node is being restarted cleanly, however, old or unused 
> index directories still pile up if Solr crashes or is being killed by the OS, 
> happening here.
> We have a six-node 64-bit Linux test cluster with each node having two 
> shards. There's 512MB RAM available and no swap. Each index is roughly 27MB 
> so about 50MB per node, this fits easily and works fine. However, if a node 
> is being restarted, Solr will consistently crash because it immediately eats 
> up all RAM. If swap is enabled Solr will eat an additional few 100MB's right 
> after start up.
> This cannot be solved by restarting Solr, it will just crash again and leave 
> index directories in place until the disk is full. The only way i can restart 
> a node safely is to delete the index directories and have it replicate from 
> another node. If i then restart the node it will crash almost consistently.
> I'll attach a log of one of the nodes.




[jira] [Commented] (SOLR-3685) solrcloud crashes on startup due to excessive memory consumption

2012-08-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430448#comment-13430448
 ] 

Mark Miller commented on SOLR-3685:
---

It was tagged to this issue number:

+* SOLR-3685: Solr Cloud sometimes skipped peersync attempt and replicated 
instead due
+  to tlog flags not being cleared when no updates were buffered during a 
previous
+  replication.  (Markus Jelsma, Mark Miller, yonik)
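The failure mode described in that CHANGES entry can be sketched as a toy model (all names here are illustrative, not Solr's actual classes): replication sets a "buffering updates" tlog flag, and if that flag is not cleared when zero updates were buffered, the next recovery wrongly skips peersync and replicates again.

```java
// Toy model of the recovery decision in the CHANGES entry above:
// a leftover "buffering" tlog flag makes the node skip peersync and
// fall back to full replication. The fix clears the flag even when no
// updates were buffered during the previous replication.
// All names are illustrative, not Solr's actual classes.
public class RecoverySketch {
    boolean tlogBufferingFlag;   // stands in for a flag persisted in the tlog

    // replication handler: buffer updates while copying the index
    void replicate(int updatesArrivedMeanwhile, boolean fixApplied) {
        tlogBufferingFlag = true;             // start buffering
        // ... index copied from the leader ...
        if (updatesArrivedMeanwhile > 0 || fixApplied) {
            tlogBufferingFlag = false;        // flag cleared when done
        }
        // without the fix, zero buffered updates left the flag set
    }

    // on the next recovery: peersync is only attempted when no
    // buffering appears to be in progress
    String recoveryStrategy() {
        return tlogBufferingFlag ? "replicate" : "peersync";
    }

    public static void main(String[] args) {
        RecoverySketch broken = new RecoverySketch();
        broken.replicate(0, false);               // pre-fix behavior
        System.out.println(broken.recoveryStrategy());
        RecoverySketch fixed = new RecoverySketch();
        fixed.replicate(0, true);                 // post-fix behavior
        System.out.println(fixed.recoveryStrategy());
    }
}
```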

> solrcloud crashes on startup due to excessive memory consumption
> 
>
> Key: SOLR-3685
> URL: https://issues.apache.org/jira/browse/SOLR-3685
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java), SolrCloud
>Affects Versions: 4.0-ALPHA
> Environment: Debian GNU/Linux Squeeze 64bit
> Solr 5.0-SNAPSHOT 1365667M - markus - 2012-07-25 19:09:43
>Reporter: Markus Jelsma
>Priority: Critical
> Fix For: 4.1
>
> Attachments: info.log




[jira] [Commented] (SOLR-3685) solrcloud crashes on startup due to excessive memory consumption

2012-08-07 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430450#comment-13430450
 ] 

Mark Miller commented on SOLR-3685:
---

I think we still need to make an issue for cleaning up replication directories 
on non-graceful shutdown.

I'll rename this issue to match the recovery issue.

And we can create a new issue for the memory problem (I tried to reproduce that 
locally, but have not yet managed to).

> solrcloud crashes on startup due to excessive memory consumption
> 
>
> Key: SOLR-3685
> URL: https://issues.apache.org/jira/browse/SOLR-3685
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java), SolrCloud
>Affects Versions: 4.0-ALPHA
> Environment: Debian GNU/Linux Squeeze 64bit
> Solr 5.0-SNAPSHOT 1365667M - markus - 2012-07-25 19:09:43
>Reporter: Markus Jelsma
>Priority: Critical
> Fix For: 4.1
>
> Attachments: info.log




[jira] [Updated] (SOLR-3685) Solr Cloud sometimes skipped peersync attempt and replicated instead due to tlog flags not being cleared when no updates were buffered during a previous replication.

2012-08-07 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3685:
--

Fix Version/s: (was: 4.1)
   4.0
   5.0
 Assignee: Yonik Seeley
  Summary: Solr Cloud sometimes skipped peersync attempt and replicated 
instead due to tlog flags not being cleared when no updates were buffered 
during a previous replication.  (was: solrcloud crashes on startup due to 
excessive memory consumption)

> Solr Cloud sometimes skipped peersync attempt and replicated instead due to 
> tlog flags not being cleared when no updates were buffered during a previous 
> replication.
> -
>
> Key: SOLR-3685
> URL: https://issues.apache.org/jira/browse/SOLR-3685
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java), SolrCloud
>Affects Versions: 4.0-ALPHA
> Environment: Debian GNU/Linux Squeeze 64bit
> Solr 5.0-SNAPSHOT 1365667M - markus - 2012-07-25 19:09:43
>Reporter: Markus Jelsma
>Assignee: Yonik Seeley
>Priority: Critical
> Fix For: 5.0, 4.0
>
> Attachments: info.log




[jira] [Created] (LUCENE-4295) smoke test an unsigned release in hudson nightly

2012-08-07 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4295:
---

 Summary: smoke test an unsigned release in hudson nightly
 Key: LUCENE-4295
 URL: https://issues.apache.org/jira/browse/LUCENE-4295
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir


Currently, to build a release it's a huge battle to get the smoke tester "up to 
speed" so the checks are actually current and everything works, and it's Python, 
so it's pretty fragile (no compile-time checking).

So I think it's time to do this in the nightly build; otherwise release 
managers, Python gurus, policemen, and whoever just likes punishment have to do 
a lot of work before the release to adapt to all the changes: it's easier to 
keep this stuff maintained incrementally.

We need a top-level task 'nightly-smoke' that does prepare-release-no-sign for 
lucene + solr, then smoke tests it and fails if the smoke tester fails.




[jira] [Updated] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-07 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-4283:
--

Attachment: LUCENE-4283-record-next-skip.patch

Now we record the next skip point, and try not to use the skipper when the 
target is still within the current buffer.
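The optimization can be sketched in plain Java (names like `nextSkipDoc` and `BLOCK_SIZE` are illustrative, not the actual patch): advance() consults the skip structure only when the target doc lies beyond the docs buffered in the current block; otherwise it scans linearly inside the buffer, avoiding a pointless skipper lookup.

```java
// Minimal model of the "record next skip point" idea: advance() jumps
// whole blocks only when the target is past the buffered block, and
// otherwise does a linear scan within the buffer.
// Names and structure are illustrative, not the actual Lucene patch.
public class SkipSketch {
    static final int BLOCK_SIZE = 4;
    final int[] docs;        // sorted doc IDs for one term's postings
    int blockStart = 0;      // index of the first doc in the buffered block
    int pos = -1;            // position within docs of the current doc

    SkipSketch(int[] docs) { this.docs = docs; }

    // first doc ID after the current block -- the recorded "next skip point"
    int nextSkipDoc() {
        int end = blockStart + BLOCK_SIZE;
        return end < docs.length ? docs[end] : Integer.MAX_VALUE;
    }

    int advance(int target) {
        if (target >= nextSkipDoc()) {
            // target is past the buffered block: skip block by block
            // instead of decoding every block along the way
            while (target >= nextSkipDoc()) blockStart += BLOCK_SIZE;
            pos = blockStart;
        } else if (pos < blockStart) {
            pos = blockStart;
        }
        // linear scan inside the buffered block
        while (pos < docs.length && docs[pos] < target) pos++;
        return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        SkipSketch s = new SkipSketch(new int[]{1, 3, 5, 7, 20, 22, 30, 40, 51, 60});
        System.out.println(s.advance(5));    // within first block: linear scan
        System.out.println(s.advance(21));   // beyond block: jump via skip point
    }
}
```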

> Support more frequent skip with Block Postings Format
> -
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Han Jiang
>Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, 
> LUCENE-4283-codes-cleanup.patch, LUCENE-4283-record-next-skip.patch, 
> LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, 
> LUCENE-4283-small-interval-partially.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize. 
> Every time the skipper reaches the last level-0 skip point, we'll have to 
> decode a whole block to read doc/freq data. Also, a higher-level skip list 
> will be created only for terms with df>blockSize^k, which means that for most 
> terms, skipping will just be a linear scan. If we increase the current 
> blockSize for better bulk I/O performance, the current skip setting will be a 
> bottleneck. 
> For ForPF, the encoded block can easily be split if we set 
> skipInterval=32*k. 




[jira] [Updated] (LUCENE-4295) smoke test an unsigned release in hudson nightly

2012-08-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4295:


Attachment: LUCENE-4295.patch

This isn't quite working yet, but it's close. 

Gotta make sure the smoke tester is really working with isSigned = false.

> smoke test an unsigned release in hudson nightly
> 
>
> Key: LUCENE-4295
> URL: https://issues.apache.org/jira/browse/LUCENE-4295
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Robert Muir
> Attachments: LUCENE-4295.patch




[jira] [Reopened] (SOLR-3429) new GatherTransformer

2012-08-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reopened SOLR-3429:



Giovanni: resolving an issue signifies that there is nothing left to do, but in 
this case nothing has been committed, so it should certainly be left open for 
consideration.


> new GatherTransformer
> -
>
> Key: SOLR-3429
> URL: https://issues.apache.org/jira/browse/SOLR-3429
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0-ALPHA
>Reporter: Giovanni Bricconi
>Priority: Minor
>  Labels: json
> Fix For: 4.0-ALPHA
>
> Attachments: SOLR-3429.patch
>
>
> This is a new transformer for DIH.
> I'm often asked to import a lot of fields; many of these fields are read-only 
> and should not be searched.
> I found it useful to gather them in a single JSON field, returning them 
> untouched to the client.
> This patch provides a transformer that collects a list of DB columns and 
> writes out a JSON map that contains all of them.
> A regression test is included. 
> A new dependency on jsonic has been added to DIH (already used by langid); 
> I can use a different library if needed.




[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-08-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430507#comment-13430507
 ] 

Michael McCandless commented on LUCENE-3892:


I tried smaller block sizes than 128.  Here's 128 (base) vs 64:
{noformat}
                Task    QPS base  StdDev base  QPS block64  StdDev block64    Pct diff
         AndHighHigh       23.91         0.57        22.28            0.27  -10% -  -3%
          AndHighMed       60.63         1.02        56.96            1.13   -9% -  -2%
     MedSloppyPhrase        7.69         0.01         7.30            0.13   -6% -  -3%
    HighSloppyPhrase        1.93         0.02         1.83            0.04   -8% -  -1%
     LowSloppyPhrase        6.84         0.03         6.57            0.11   -6% -  -1%
              Fuzzy1       65.49         0.85        63.50            1.68   -6% -   0%
          HighPhrase        1.57         0.04         1.53            0.04   -7% -   3%
           OrHighLow       22.89         0.98        22.38            0.61   -8% -   4%
           OrHighMed       17.65         0.70        17.27            0.43   -8% -   4%
              IntNRQ        9.50         0.48         9.33            0.36  -10% -   7%
          OrHighHigh        8.98         0.36         8.84            0.19   -7% -   4%
            HighTerm       29.60         2.64        29.16            1.44  -13% -  13%
              Fuzzy2       65.54         0.86        64.63            2.13   -5% -   3%
            Wildcard       45.27         1.27        44.78            0.48   -4% -   2%
             MedTerm      150.40        12.65       148.99            6.63  -12% -  12%
             Prefix3       72.55         2.55        72.31            1.02   -5% -   4%
             LowTerm      421.62        38.27       422.40            9.47  -10% -  12%
         LowSpanNear        7.55         0.34         7.62            0.22   -6% -   8%
        HighSpanNear        1.34         0.09         1.35            0.06   -9% -  12%
           MedPhrase       12.45         0.24        12.66            0.13   -1% -   4%
             Respell       59.54         1.80        60.95            1.86   -3% -   8%
         MedSpanNear        3.70         0.24         3.80            0.15   -7% -  14%
            PKLookup      154.56         2.45       158.96            1.89    0% -   5%
           LowPhrase       20.21         0.33        20.95            0.15    1% -   6%
          AndHighLow      577.81        12.46       637.96           29.80    3% -  18%
{noformat}

And 128 (base) vs 32:
{noformat}
                Task    QPS base  StdDev base  QPS block64  StdDev block64    Pct diff
         AndHighHigh       23.86         0.52        20.68            0.59  -17% -  -8%
              IntNRQ        9.48         0.38         8.84            0.46  -15% -   2%
    HighSloppyPhrase        1.87         0.04         1.76            0.06  -11% -   0%
             Prefix3       72.65         2.18        68.24            2.96  -12% -   1%
            HighTerm       29.91         1.40        28.28            2.94  -19% -   9%
            Wildcard       44.74         0.83        42.43            1.49  -10% -   0%
        HighSpanNear        1.37         0.08         1.30            0.07  -15% -   6%
             MedTerm      152.73         5.28       145.45           14.69  -17% -   8%
     MedSloppyPhrase        7.46         0.12         7.12            0.25   -9% -   0%
          HighPhrase        1.57         0.03         1.50            0.01   -7% -  -1%
           OrHighLow       22.94         0.70        22.00            1.10  -11% -   3%
          AndHighMed       58.72         1.79        56.60            1.95   -9% -   2%
     LowSloppyPhrase        6.67         0.10         6.44            0.20   -7% -   1%
           OrHighMed       17.52         0.56        17.00            0.82  -10% -   5%
         LowSpanNear        7.53         0.35         7.34            0.39  -11% -   7%
          OrHighHigh        8.84         0.31         8.62            0.43  -10% -   6%
         MedSpanNear        3.79         0.20         3.71            0.21  -12% -   9%
            PKLookup      153.34         3.22       150.19            4.91   -7% -   3%
              Fuzzy1       62.93         1.77        62.28            2.23   -7% -   5%
             LowTerm      410.23        21.57       410.83           35.19  -13% -  14%
           MedPhrase       12.55         0.14        12.65            0.08    0% -   2%
           LowPhrase       20.42         0.17        20.77            0.21    0% -   3%
              Fuzzy2       61.44         3.12        64.13            1.97   -3% -  13%
             Respell       56.65         3.29        60.21            1.39   -1% -  15%
          AndHighLow      588.05        12.37       720.63           19.33   16% -  28%
{noformat}

It looks like there's some speedup to AndHighLow and LowPhrase ... but
slowdowns in other (harder) queries... so I think net/net we should
leave block size at 128.


> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

[jira] [Commented] (LUCENE-2145) TokenStream.close() is called multiple times per TokenStream instance

2012-08-07 Thread Benjamin Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430554#comment-13430554
 ] 

Benjamin Douglas commented on LUCENE-2145:
--

This looks to be intentional. Calling close() on the token stream is designed 
to release the Reader, which should happen as soon as you know you are done 
with it. LUCENE-2387 explains the negative side effects of holding onto Readers 
too long. Calling analyzer.reusableTokenStream() the next time will provide a 
new Reader. 

If the external resource is tied to the Reader, then it should also be released 
when TokenStream.close() is called. Only data that is independent of the 
current text should survive to the next reusableTokenStream() call.
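The contract described above can be modeled in plain Java (this is a sketch of the reuse pattern, not Lucene's actual TokenStream classes): close() releases only the per-use Reader, while text-independent instance state survives to the next reuse, which supplies a fresh Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Plain-Java model (not Lucene's actual classes) of the reuse contract:
// close() releases the current Reader; re-binding a new Reader via
// reset(Reader) prepares the same instance for the next use. State that
// is independent of the current text survives across uses.
public class ReusableStreamSketch {
    private Reader input;          // per-use resource, released on close()
    private int streamsConsumed;   // text-independent state, survives reuse

    void reset(Reader newInput) {  // reusableTokenStream()-style re-binding
        this.input = newInput;
        streamsConsumed++;
    }

    String consumeAll() throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c; (c = input.read()) != -1; ) sb.append((char) c);
        return sb.toString();
    }

    void close() throws IOException {
        input.close();             // release the Reader as soon as we're done
        input = null;              // the instance itself may still be reused
    }

    int streamsConsumed() { return streamsConsumed; }

    public static void main(String[] args) throws IOException {
        ReusableStreamSketch ts = new ReusableStreamSketch();
        ts.reset(new StringReader("first text"));
        System.out.println(ts.consumeAll());
        ts.close();                              // releases only the Reader
        ts.reset(new StringReader("second"));    // reuse with a fresh Reader
        System.out.println(ts.consumeAll());
        ts.close();
        System.out.println(ts.streamsConsumed());
    }
}
```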

> TokenStream.close() is called multiple times per TokenStream instance
> -
>
> Key: LUCENE-2145
> URL: https://issues.apache.org/jira/browse/LUCENE-2145
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index, core/queryparser
>Affects Versions: 2.9, 2.9.1, 3.0
> Environment: Solr 1.4.0
>Reporter: KuroSaka TeruHiko
>
> I have a Tokenizer that uses an external resource.  I wrote this Tokenizer so 
> that the external resource is released in its close() method.
> This should work because close() is supposed to be called when the caller is 
> done with the TokenStream of which Tokenizer is a subclass.  TokenStream's 
> API document 
> 
>  states:
> {noformat}
> 6. The consumer calls close() to release any resource when finished using the 
> TokenStream. 
> {noformat}
> When I used my Tokenizer from Solr 1.4.0, it did not work as expected.  An 
> error analysis suggests an instance of my Tokenizer is used even after 
> close() is called and the external resource is released. After a further 
> analysis it seems that it is not Solr but Lucene itself that is breaking the 
> contract.
> This is happening in two places.
> src/java/org/apache/lucene/queryParser/QueryParser.java:
>   protected Query getFieldQuery(String field, String queryText)  throws 
> ParseException {
> // Use the analyzer to get all the tokens, and then build a TermQuery,
> // PhraseQuery, or nothing based on the term count
> TokenStream source;
> try {
>   source = analyzer.reusableTokenStream(field, new 
> StringReader(queryText));
>   source.reset();
> .
> .
> .
>  try {
>   // rewind the buffer stream
>   buffer.reset();
>   // close original stream - all tokens buffered
>   source.close(); // < HERE
> }
> src/java/org/apache/lucene/index/DocInverterPerField.java
> public void processFields(final Fieldable[] fields,
> final int count) throws IOException {
> ...
>   } finally {
> stream.close();
>   }
> Calling close() would be good if the TokenStream is not a reusable one. But 
> when it is reusable, it might be used again, so the resource associated with 
> the TokenStream instance should not be released. close() needs to be called 
> selectively, only when it is known that the stream is not going to be reused. 




[jira] [Created] (LUCENE-4296) Update/clean up Maven POMs and documentation

2012-08-07 Thread Steven Rowe (JIRA)
Steven Rowe created LUCENE-4296:
---

 Summary: Update/clean up Maven POMs and documentation
 Key: LUCENE-4296
 URL: https://issues.apache.org/jira/browse/LUCENE-4296
 Project: Lucene - Core
  Issue Type: Improvement
  Components: general/build
Affects Versions: 4.0-BETA, 5.0
Reporter: Steven Rowe
Assignee: Steven Rowe
Priority: Minor


* Remove {{appassembler-maven-plugin}} configurations from all POMs - these are 
unmaintained and bitrotting.
* Update Hudson CI references -> Jenkins 
* Switch scm URLs to refer to property values, to simplify maintenance
* Update README.maven to remove mention of {{modules/}}, and increase minimum 
Ant version from 1.7.X to 1.8.2+




[jira] [Updated] (LUCENE-4296) Update/clean up Maven POMs and documentation

2012-08-07 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-4296:


Attachment: LUCENE-4296.patch

Patch implementing these changes.

> Update/clean up Maven POMs and documentation
> 
>
> Key: LUCENE-4296
> URL: https://issues.apache.org/jira/browse/LUCENE-4296
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 4.0-BETA, 5.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
>Priority: Minor
> Attachments: LUCENE-4296.patch




[jira] [Resolved] (LUCENE-4295) smoke test an unsigned release in hudson nightly

2012-08-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4295.
-

   Resolution: Fixed
Fix Version/s: 4.0
   5.0

I turned this on in hudson. It's likely to be very slow. I think it would 
actually be better if it were a separate Jenkins task that ran, say, twice a 
week.

But for now we just can't leave packaging etc. untested until release time; 
it's too painful.

> smoke test an unsigned release in hudson nightly
> 
>
> Key: LUCENE-4295
> URL: https://issues.apache.org/jira/browse/LUCENE-4295
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 5.0, 4.0
>
> Attachments: LUCENE-4295.patch




[jira] [Commented] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430636#comment-13430636
 ] 

Michael McCandless commented on LUCENE-4283:


Thanks Billy, that's a nice optimization!  I think other postings formats 
should do the same thing...

It seems to give a small gain to the skip-heavy queries:
{noformat}
                Task    QPS base  StdDev base  QPS nextskip  StdDev nextskip    Pct diff
         AndHighHigh       23.87         0.09         23.56             0.19   -2% -   0%
              Fuzzy2       63.37         1.07         62.59             0.86   -4% -   1%
          OrHighHigh       11.67         0.08         11.53             0.35   -4% -   2%
              Fuzzy1       75.44         1.02         74.59             0.74   -3% -   1%
           OrHighMed       24.14         0.18         23.89             0.72   -4% -   2%
             Respell       62.66         0.65         62.04             1.37   -4% -   2%
           OrHighLow       27.86         0.23         27.60             0.85   -4% -   2%
    HighSloppyPhrase        2.00         0.04          1.99             0.05   -5% -   3%
        HighSpanNear        1.70         0.02          1.69             0.01   -2% -   1%
             LowTerm      517.40         1.67        514.32             2.68   -1% -   0%
     LowSloppyPhrase        7.61         0.07          7.58             0.16   -3% -   2%
     MedSloppyPhrase        6.90         0.09          6.88             0.13   -3% -   2%
            PKLookup      192.23         1.99        191.81             3.80   -3% -   2%
             Prefix3       82.35         0.63         82.36             1.06   -2% -   2%
            Wildcard       52.49         0.44         52.54             0.41   -1% -   1%
            HighTerm       36.03         0.11         36.09             0.03    0% -   0%
              IntNRQ       11.56         0.07         11.58             0.03    0% -   1%
             MedTerm      197.94         0.88        198.87             0.36    0% -   1%
         MedSpanNear        4.84         0.07          4.86             0.03   -1% -   2%
         LowSpanNear        9.49         0.26          9.64             0.01   -1% -   4%
           LowPhrase       21.95         0.38         22.39             0.08    0% -   4%
          AndHighLow      641.56        10.38        657.49             5.64    0% -   5%
           MedPhrase       13.04         0.30         13.37             0.05    0% -   5%
          AndHighMed       67.13         0.57         69.30             0.80    1% -   5%
          HighPhrase        1.81         0.10          1.87             0.03   -3% -  11%
{noformat}


> Support more frequent skip with Block Postings Format
> -
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Han Jiang
>Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, 
> LUCENE-4283-codes-cleanup.patch, LUCENE-4283-record-next-skip.patch, 
> LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch, 
> LUCENE-4283-small-interval-partially.patch




setup a jenkins task for twice a week?

2012-08-07 Thread Robert Muir
Hello,

I just committed https://issues.apache.org/jira/browse/LUCENE-4295
(heads up if it causes all hell to break loose).

Basically this builds a release with no GPG signatures and runs the
smoke tester on it. I think it's important we do this regularly so we
stay releasable (versus now, where when you want to build an RC you
have to fix the smoke tester and any bugs it finds).

But I think it's going to take quite a few hours, as it runs all tests
for both the lucene and solr releases with both java6 and java7, etc.
Is it possible we can move it out of lucene-4x and lucene-trunk
nightly and have a separate jenkins job that runs, say, twice a week to
do this?

-- 
lucidimagination.com




[jira] [Commented] (SOLR-1238) exception in solrJ when authentication is used

2012-08-07 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430701#comment-13430701
 ] 

Markus Jelsma commented on SOLR-1238:
-

We've seen this issue happening over the past few years with or without 
authentication using SolrJ. Perhaps this issue could be renamed and marked for 
current Solr versions if applicable.

I can't remember seeing this exception when using Curl to load data.
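For context, the likely mechanism behind the "Unbuffered entity enclosing request can not be repeated" error (my reading, not confirmed in this thread): the server rejects the first POST with a 401 challenge, and the client can only resend the request body with credentials attached if that body was buffered rather than streamed. A small self-contained model of that retry constraint (hypothetical names throughout, not the HttpClient API):

```java
// Model of why an unbuffered (streamed) POST cannot survive an auth
// challenge: retrying requires replaying the body, which is only
// possible when the body was buffered. Hypothetical names throughout.
public class RepeatableRequestSketch {
    interface Body {
        byte[] read();           // produces the bytes to send
        boolean isRepeatable();  // can read() be called again?
    }

    // Buffered body: bytes held in memory, replayable on retry.
    static Body buffered(byte[] bytes) {
        return new Body() {
            public byte[] read() { return bytes.clone(); }
            public boolean isRepeatable() { return true; }
        };
    }

    // Streamed body: the source is consumed by the first read().
    static Body streamed(byte[] bytes) {
        return new Body() {
            boolean consumed = false;
            public byte[] read() {
                if (consumed) throw new IllegalStateException("already consumed");
                consumed = true;
                return bytes;
            }
            public boolean isRepeatable() { return false; }
        };
    }

    // The first attempt is rejected with a 401 challenge; the client
    // must then resend the same body with credentials attached.
    static int postWithAuthChallenge(Body body) {
        body.read();                     // attempt 1: server answers 401
        if (!body.isRepeatable()) {
            throw new IllegalStateException(
                "Unbuffered entity enclosing request can not be repeated");
        }
        body.read();                     // attempt 2: replay with credentials
        return 200;
    }

    public static void main(String[] args) {
        System.out.println(postWithAuthChallenge(buffered("doc".getBytes())));
    }
}
```

With commons-httpclient 3.x the usual workarounds were to enable preemptive authentication (so credentials accompany the first request) or to use a repeatable/buffered request entity; check the HttpClient documentation for the exact API on your version.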

> exception in solrJ when authentication is used
> --
>
> Key: SOLR-1238
> URL: https://issues.apache.org/jira/browse/SOLR-1238
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: Noble Paul
>Priority: Minor
> Attachments: SOLR-1238.patch
>
>
> see the thread http://markmail.org/thread/w36ih2fnphbubian
> {code}
> I am facing an error when I am using authentication in Solr. I
> followed the Wiki. The error does not appear when I am searching. Below is the
> code snippet and the error.
> Please note I am using a Solr 1.4 development build from SVN.
>HttpClient client=new HttpClient();
>AuthScope scope = new 
> AuthScope(AuthScope.ANY_HOST,AuthScope.ANY_PORT,null, null);
>client.getState().setCredentials(scope,new 
> UsernamePasswordCredentials("guest", "guest"));
>SolrServer server =new 
> CommonsHttpSolrServer("http://localhost:8983/solr",client);
>SolrInputDocument doc1=new SolrInputDocument();
>//Add fields to the document
>doc1.addField("employeeid", "1237");
>doc1.addField("employeename", "Ann");
>doc1.addField("employeeunit", "etc");
>doc1.addField("employeedoj", "1995-11-31T23:59:59Z");
>server.add(doc1);
> Exception in thread "main"
> org.apache.solr.client.solrj.SolrServerException:
> org.apache.commons.httpclient.ProtocolException: Unbuffered entity
> enclosing request can not be repeated.
>at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:468)
>at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
>at 
> org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
>at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
>at test.SolrAuthenticationTest.(SolrAuthenticationTest.java:49)
>at test.SolrAuthenticationTest.main(SolrAuthenticationTest.java:113)
> Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered
> entity enclosing request can not be repeated.
>at 
> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
>at 
> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
>at 
> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
>at 
> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>at 
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>at 
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>at 
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:415)
>... 5 more.
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






[jira] [Resolved] (LUCENE-4296) Update/clean up Maven POMs and documentation

2012-08-07 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved LUCENE-4296.
-

   Resolution: Fixed
Fix Version/s: 4.0
   5.0

Committed:

* [r1370513|http://svn.apache.org/viewvc?view=revision&revision=1370513]: trunk
* [r1370561|http://svn.apache.org/viewvc?view=revision&revision=1370561]: 
branch_4x

> Update/clean up Maven POMs and documentation
> 
>
> Key: LUCENE-4296
> URL: https://issues.apache.org/jira/browse/LUCENE-4296
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Affects Versions: 4.0-BETA, 5.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 5.0, 4.0
>
> Attachments: LUCENE-4296.patch
>
>
> * Remove {{appassembler-maven-plugin}} configurations from all POMs - these 
> are unmaintained and bitrotting.
> * Update Hudson CI references -> Jenkins 
> * Switch scm URLs to refer to property values, to simplify maintenance
> * Update README.maven to remove mention of {{modules/}}, and increase minimum 
> Ant version from 1.7.X to 1.8.2+







[JENKINS] Lucene-Solr-trunk-Linux (32bit/ibm-j9-jdk6) - Build # 261 - Failure!

2012-08-07 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/261/
Java: 32bit/ibm-j9-jdk6 

2 tests failed.
REGRESSION:  
org.apache.lucene.spatial.bbox.TestBBoxStrategy.testCitiesWithinBBox

Error Message:


Stack Trace:
java.lang.AssertionError
at 
__randomizedtesting.SeedInfo.seed([46E649F8AC0C7C28:510BD4C38CCF8D13]:0)
at 
org.apache.lucene.util.packed.GrowableWriter.ensureCapacity(GrowableWriter.java:70)
at 
org.apache.lucene.util.packed.GrowableWriter.set(GrowableWriter.java:83)
at org.apache.lucene.util.fst.FST.pack(FST.java:1505)
at 
org.apache.lucene.codecs.memory.MemoryPostingsFormat$TermsWriter.finish(MemoryPostingsFormat.java:273)
at 
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:550)
at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:481)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:419)
at 
org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:313)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:386)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1428)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1107)
at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:186)
at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:145)
at 
org.apache.lucene.spatial.SpatialTestCase.addDocumentsAndCommit(SpatialTestCase.java:67)
at 
org.apache.lucene.spatial.StrategyTestCase.getAddAndVerifyIndexedDocuments(StrategyTestCase.java:75)
at 
org.apache.lucene.spatial.bbox.TestBBoxStrategy.testCitiesWithinBBox(TestBBoxStrategy.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:600)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.

[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.6.0_33) - Build # 135 - Failure!

2012-08-07 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/135/
Java: 32bit/jdk1.6.0_33 -server -XX:+UseParallelGC

1 tests failed.
REGRESSION:  org.apache.solr.servlet.SolrRequestParserTest.testStreamURL

Error Message:
Read timed out

Stack Trace:
java.net.SocketTimeoutException: Read timed out
at 
__randomizedtesting.SeedInfo.seed([DE02E95B46EE9A76:8737D74832CC5743]:0)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
at 
org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:85)
at 
org.apache.solr.servlet.SolrRequestParserTest.testStreamURL(SolrRequestParserTest.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnor

[jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-08-07 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430798#comment-13430798
 ] 

Han Jiang commented on LUCENE-3892:
---

Thanks Mike. A detailed comparison result from my machine is here: 
http://pastebin.com/HLaAuCNp
I tried block sizes ranging from 1024 down to 32, and also used 128 as the base.

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-blockFor&hardcode(base).patch, 
> LUCENE-3892-blockFor&packedecoder(comp).patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-handle_open_files.patch, 
> LUCENE-3892-pfor-compress-iterate-numbits.patch, 
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, 
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.
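As a rough illustration of what the frame-of-reference (FOR) family of block codecs discussed here does, the following is a toy sketch (not Lucene's implementation; real codecs work on delta-encoded postings and bit-pack the values on disk):

```python
def for_encode(block):
    """Frame Of Reference: store the block minimum ("frame") plus per-value
    offsets. When bit-packed, each offset needs only as many bits as the
    largest offset in the block."""
    base = min(block)
    offsets = [v - base for v in block]
    num_bits = max(max(offsets).bit_length(), 1)
    return base, num_bits, offsets  # on disk, offsets would be bit-packed

def for_decode(base, num_bits, offsets):
    """Inverse: add the frame back to every offset."""
    return [base + off for off in offsets]

block = [100, 103, 101, 107]
base, num_bits, offsets = for_encode(block)
assert for_decode(base, num_bits, offsets) == block
print(base, num_bits, offsets)  # 100 3 [0, 3, 1, 7]
```

The "P" in PFOR then handles outliers ("exceptions") separately so one large value does not inflate `num_bits` for the whole block.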







[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-07 Thread Raintung Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430839#comment-13430839
 ] 

Raintung Li commented on SOLR-3684:
---

I just checked all the Solr/Lucene analyzers; the entry point is the method 
createComponents. Using a different TokenStreamComponents per field name is 
what invalidates the field type's cache. Is that how it works?
 protected TokenStreamComponents createComponents(String fieldName,
  Reader reader) {
...
}
The fieldName parameter is not used by the Solr/Lucene analyzers themselves, 
so maybe we can remove the parameter from Analyzer.java directly. We could then 
guarantee that one field type matches exactly one analyzer.

The other simple way is to specify that only analyzers under the solr/lucene 
package paths use the field type's cache, while custom analyzers use the field 
name's cache.

As for the separate thread pool per request path issue, maybe a different port 
can be used per handler; we do that with Tomcat.

> Frequently full gc while do pressure index
> --
>
> Key: SOLR-3684
> URL: https://issues.apache.org/jira/browse/SOLR-3684
> Project: Solr
>  Issue Type: Improvement
>  Components: multicore
>Affects Versions: 4.0-ALPHA
> Environment: System: Linux
> Java process: 4G memory
> Jetty: 1000 threads 
> Index: 20 field
> Core: 5
>Reporter: Raintung Li
>Priority: Critical
>  Labels: garbage, performance
> Fix For: 4.0
>
> Attachments: patch.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Recently we tested Solr index throughput and performance: 20 fields of the 
> normal text_general type, 1000 Jetty threads, and 5 cores.
> After the test ran for some time, the Solr process throughput dropped very 
> quickly. Checking the root cause, we found the Java process constantly doing 
> full GCs.
> In the heap dump, the main object is StandardTokenizer, which is kept in a 
> CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
> Solr uses PerFieldReuseStrategy as the default reuse-component strategy, 
> which means every field gets its own StandardTokenizer if it uses the 
> standard analyzer, and each StandardTokenizer occupies 32KB of memory for its 
> zzBuffer char array.
> The worst case: total memory = live threads * cores * fields * 32KB
> In the test case that is 1000*5*20*32KB = 3.2G for StandardTokenizer, and 
> those objects are only released when their thread dies.
> Suggestion:
> Every request is handled by exactly one thread, so one document is analyzed 
> by one thread. That thread parses the document's fields one by one, so fields 
> of the same type can share the same reused component: switching to another 
> field of the same type only resets the shared component's input stream, which 
> saves a lot of memory for same-type fields.
> Total memory then becomes = live threads * cores * (distinct field types) * 32KB
> The source code modification is simple; I can provide the patch for 
> IndexSchema.java: 
> private class SolrIndexAnalyzer extends AnalyzerWrapper {
>
>   /**
>    * Implementation of {@link ReuseStrategy} that reuses components per field
>    * type by maintaining a Map of TokenStreamComponents per Analyzer.
>    */
>   private class SolrFieldReuseStrategy extends ReuseStrategy {
>     /**
>      * {@inheritDoc}
>      */
>     @SuppressWarnings("unchecked")
>     public TokenStreamComponents getReusableComponents(String fieldName) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       return componentsPerField != null ?
>           componentsPerField.get(analyzers.get(fieldName)) : null;
>     }
>     /**
>      * {@inheritDoc}
>      */
>     @SuppressWarnings("unchecked")
>     public void setReusableComponents(String fieldName,
>         TokenStreamComponents components) {
>       Map<Analyzer, TokenStreamComponents> componentsPerField =
>           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
>       if (componentsPerField == null) {
>         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
>         setStoredValue(componentsPerField);
>       }
>       componentsPerField.put(analyzers.get(fieldName), components);
>     }
>   }
>
>   protected final HashMap<String, Analyzer> analyzers;
>
>   SolrIndexAnalyzer() {
>     super(new SolrFieldReuseStrategy());
>     analyzers = analyzerCache();
>   }
>
>   protected HashMap<String, Analyzer> analyzerCache() {
>     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
>     for (SchemaField f : getFields().values()) {
>       cache.put(f.getName(), f.getType().getAnalyzer());
>     }
>     return cache;
>   }
>
>   @Override
>   protected Analyzer
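The worst-case arithmetic from the report above can be sanity-checked directly. A quick sketch (32 KB is the per-StandardTokenizer zzBuffer figure cited in the issue; the "2 distinct field types" in the comparison is an illustrative assumption, not from the report):

```python
KB = 1024

def tokenizer_memory(threads, cores, analyzers_per_core, per_tokenizer=32 * KB):
    """Worst case: every live thread caches one tokenizer per (core, analyzer)."""
    return threads * cores * analyzers_per_core * per_tokenizer

# Reported setup: 1000 Jetty threads, 5 cores, 20 fields (per-field reuse)
per_field = tokenizer_memory(1000, 5, 20)
# Proposed per-field-TYPE reuse, assuming e.g. 2 distinct field types
per_type = tokenizer_memory(1000, 5, 2)
print(per_field / KB**3, per_type / KB**3)  # ~3.05 GiB vs ~0.31 GiB
```

This reproduces the roughly 3.2 GB ("3.2G") worst case quoted in the report and shows why keying the cache on field type rather than field name shrinks the bound by the fields-to-types ratio.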

[jira] [Comment Edited] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-08-07 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396987#comment-13396987
 ] 

Han Jiang edited comment on LUCENE-3892 at 8/8/12 3:34 AM:
---

Oh, thank you Mike! I haven't thought too much about those skipping policies.

bq. Up above, in ForFactory, when we readInt() to get numBytes ... it seems 
like we could stuff the header numBits into that same int and save checking 
that in FORUtil.decompress
Ah, yes, I just forgot to remove the redundant code. Here is an initial try at 
removing the header and calling ForDecompressImpl directly in readBlock(), with 
For, blockSize=128. Data in brackets show the prior benchmark.
{noformat}
Task              QPS Base  StdDev Base  QPS For  StdDev For      Pct diff
Phrase                4.99         0.37     3.57        0.26  -38% - -17%  (-44% - -18%)
AndHighMed           28.91         2.17    22.66        0.82  -29% - -12%  (-38% -  -9%)
SpanNear              2.72         0.14     2.22        0.13  -26% -  -8%  (-36% -  -8%)
SloppyPhrase          4.24         0.26     3.70        0.16  -21% -  -3%  (-33% -  -6%)
Respell              40.71         2.59    37.66        1.36  -16% -   2%  (-18% -   0%)
Fuzzy1               43.22         2.01    40.66        0.32  -10% -   0%  (-12% -   0%)
Fuzzy2               16.25         0.90    15.64        0.26  -10% -   3%  (-12% -   3%)
Wildcard             19.07         0.86    19.07        0.73   -8% -   8%  (-21% -   3%)
AndHighHigh           7.76         0.47     7.77        0.15   -7% -   8%  (-21% -  10%)
PKLookup             87.50         4.56    88.51        1.24   -5% -   8%  ( -2% -   5%)
TermBGroup1M         20.42         0.87    21.32        0.74   -3% -  12%  (  2% -  10%)
OrHighMed             5.33         0.68     5.61        0.14   -9% -  23%  (-16% -  25%)
OrHighHigh            4.43         0.53     4.69        0.12   -8% -  23%  (-15% -  24%)
TermGroup1M          13.30         0.34    14.31        0.40    2% -  13%  (  0% -  13%)
TermBGroup1M1P       20.92         0.59    23.71        0.86    6% -  20%  ( -1% -  22%)
Prefix3              30.30         1.41    35.14        1.76    5% -  27%  (-14% -  21%)
IntNRQ                3.90         0.54     4.58        0.47   -7% -  50%  (-25% -  33%)
Term                 42.17         1.55    52.33        2.57   13% -  35%  (  1% -  33%)
{noformat}
-The improvement is quite general. However, I still suppose this just benefits 
from less method calling. I'm trying to change the PFor codes, and remove those 
nested call.- (this is not actually true, since I was using percentage diff 
instead of QPS during comparison)

bq. Get more direct access to the file as an int[]; ...
Ok, this will be considered when the pfor+pulsing is completed. I'm just 
curious why we don't have readInts in ora.util yet...

bq. Skipping: can we partially decode a block? ...
The pfor-opt approach (encode the lower bits of exceptions in the normal area, 
and the remaining bits in the exception area) naturally fits "partially decode 
a block"; that'll be possible when we optimize skipping queries.

  was (Author: billy):
Oh, thank you Mike! I haven't thought too much about those skipping 
policies.

bq. Up above, in ForFactory, when we readInt() to get numBytes ... it seems 
like we could stuff the header numBits into that same int and save checking 
that in FORUtil.decompress
Ah, yes, I just forgot to remove the redundant codes. Here is a initial try to 
remove header and call ForDecompressImpl directly in readBlock():with For, 
blockSize=128. Data in bracket show prior benchmark.
{noformat}
Task              QPS Base  StdDev Base  QPS For  StdDev For      Pct diff
Phrase                4.99         0.37     3.57        0.26  -38% - -17%  (-44% - -18%)
AndHighMed           28.91         2.17    22.66        0.82  -29% - -12%  (-38% -  -9%)
SpanNear              2.72         0.14     2.22        0.13  -26% -  -8%  (-36% -  -8%)
SloppyPhrase          4.24         0.26     3.70        0.16  -21% -  -3%  (-33% -  -6%)
Respell              40.71         2.59    37.66        1.36  -16% -   2%  (-18% -   0%)
Fuzzy1               43.22         2.01    40.66        0.32  -10% -   0%  (-12% -   0%)
Fuzzy2               16.25         0.90    15.64        0.26  -10% -   3%  (-12% -   3%)
Wildcard             19.07         0.86    19.07        0.73   -8% -   8%  (-21% -   3%)
AndHighHigh           7.76         0.47     7.77        0.15   -7% -   8%  (-21% -  10%)
PKLookup             87.50         4.56    88.51        1.24   -5% -   8%  ( -2% -   5%)
TermBGroup1M   

[jira] [Comment Edited] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-08-07 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397228#comment-13397228
 ] 

Han Jiang edited comment on LUCENE-3892 at 8/8/12 3:35 AM:
---

And result for PFor(blocksize=128):
{noformat}
Task              QPS Base  StdDev Base  QPS PFor  StdDev PFor      Pct diff
Phrase                4.87         0.36      3.39         0.18  -38% - -20%  (-47% - -25%)
AndHighMed           27.78         2.35     21.13         0.52  -31% - -14%  (-37% - -15%)
SpanNear              2.70         0.14      2.20         0.11  -26% -  -9%  (-36% - -13%)
SloppyPhrase          4.17         0.15      3.77         0.21  -17% -   0%  (-30% -  -6%)
Respell              39.97         1.56     37.65         1.95  -14% -   3%  (-15% -   2%)
Wildcard             19.08         0.77     18.33         0.92  -12% -   5%  (-17% -   3%)
Fuzzy1               42.29         1.13     40.78         1.44   -9% -   2%  (-11% -   1%)
AndHighHigh           7.61         0.55      7.45         0.08   -9% -   6%  (-19% -   6%)
Fuzzy2               15.79         0.55     15.64         0.70   -8% -   7%  (-11% -   6%)
PKLookup             86.71         2.13     88.92         2.24   -2% -   7%  ( -2% -   7%)
TermGroup1M          13.04         0.23     14.03         0.40    2% -  12%  (  1% -   9%)
IntNRQ                3.97         0.48      4.35         0.61  -15% -  41%  (-16% -  24%)
TermBGroup1M1P       21.04         0.35     23.20         0.60    5% -  14%  (  0% -  14%)
TermBGroup1M         19.27         0.47     21.28         0.84    3% -  17%  (  1% -  10%)
OrHighHigh            4.13         0.47      4.63         0.27   -5% -  34%  (-14% -  27%)
OrHighMed             4.95         0.59      5.58         0.34   -5% -  35%  (-14% -  27%)
Prefix3              30.33         1.36     34.26         2.14    1% -  25%  ( -6% -  20%)
Term                 41.99         1.19     50.75         1.72   13% -  28%  (  2% -  26%)
{noformat}
-It works, and it is quite interesting that StdDev for Term query is reduced 
significantly. - (same as last comment, when comparing two versions 
directly(method call vs. unfolded, the improvement is somewhat noisy))

  was (Author: billy):
And result for PFor(blocksize=128):
{noformat}
Task              QPS Base  StdDev Base  QPS PFor  StdDev PFor      Pct diff
Phrase                4.87         0.36      3.39         0.18  -38% - -20%  (-47% - -25%)
AndHighMed           27.78         2.35     21.13         0.52  -31% - -14%  (-37% - -15%)
SpanNear              2.70         0.14      2.20         0.11  -26% -  -9%  (-36% - -13%)
SloppyPhrase          4.17         0.15      3.77         0.21  -17% -   0%  (-30% -  -6%)
Respell              39.97         1.56     37.65         1.95  -14% -   3%  (-15% -   2%)
Wildcard             19.08         0.77     18.33         0.92  -12% -   5%  (-17% -   3%)
Fuzzy1               42.29         1.13     40.78         1.44   -9% -   2%  (-11% -   1%)
AndHighHigh           7.61         0.55      7.45         0.08   -9% -   6%  (-19% -   6%)
Fuzzy2               15.79         0.55     15.64         0.70   -8% -   7%  (-11% -   6%)
PKLookup             86.71         2.13     88.92         2.24   -2% -   7%  ( -2% -   7%)
TermGroup1M          13.04         0.23     14.03         0.40    2% -  12%  (  1% -   9%)
IntNRQ                3.97         0.48      4.35         0.61  -15% -  41%  (-16% -  24%)
TermBGroup1M1P       21.04         0.35     23.20         0.60    5% -  14%  (  0% -  14%)
TermBGroup1M         19.27         0.47     21.28         0.84    3% -  17%  (  1% -  10%)
OrHighHigh            4.13         0.47      4.63         0.27   -5% -  34%  (-14% -  27%)
OrHighMed             4.95         0.59      5.58         0.34   -5% -  35%  (-14% -  27%)
Prefix3              30.33         1.36     34.26         2.14    1% -  25%  ( -6% -  20%)
Term                 41.99         1.19     50.75         1.72   13% -  28%  (  2% -  26%)
{noformat}
It works, and it is quite interesting that StdDev for Term query is reduced 
significantly.  
  
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.

[jira] [Comment Edited] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-08-07 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397228#comment-13397228
 ] 

Han Jiang edited comment on LUCENE-3892 at 8/8/12 3:35 AM:
---

And result for PFor(blocksize=128):
{noformat}
Task              QPS Base  StdDev Base  QPS PFor  StdDev PFor      Pct diff
Phrase                4.87         0.36      3.39         0.18  -38% - -20%  (-47% - -25%)
AndHighMed           27.78         2.35     21.13         0.52  -31% - -14%  (-37% - -15%)
SpanNear              2.70         0.14      2.20         0.11  -26% -  -9%  (-36% - -13%)
SloppyPhrase          4.17         0.15      3.77         0.21  -17% -   0%  (-30% -  -6%)
Respell              39.97         1.56     37.65         1.95  -14% -   3%  (-15% -   2%)
Wildcard             19.08         0.77     18.33         0.92  -12% -   5%  (-17% -   3%)
Fuzzy1               42.29         1.13     40.78         1.44   -9% -   2%  (-11% -   1%)
AndHighHigh           7.61         0.55      7.45         0.08   -9% -   6%  (-19% -   6%)
Fuzzy2               15.79         0.55     15.64         0.70   -8% -   7%  (-11% -   6%)
PKLookup             86.71         2.13     88.92         2.24   -2% -   7%  ( -2% -   7%)
TermGroup1M          13.04         0.23     14.03         0.40    2% -  12%  (  1% -   9%)
IntNRQ                3.97         0.48      4.35         0.61  -15% -  41%  (-16% -  24%)
TermBGroup1M1P       21.04         0.35     23.20         0.60    5% -  14%  (  0% -  14%)
TermBGroup1M         19.27         0.47     21.28         0.84    3% -  17%  (  1% -  10%)
OrHighHigh            4.13         0.47      4.63         0.27   -5% -  34%  (-14% -  27%)
OrHighMed             4.95         0.59      5.58         0.34   -5% -  35%  (-14% -  27%)
Prefix3              30.33         1.36     34.26         2.14    1% -  25%  ( -6% -  20%)
Term                 41.99         1.19     50.75         1.72   13% -  28%  (  2% -  26%)
{noformat}
-It works, and it is quite interesting that StdDev for Term query is reduced 
significantly.- (same as last comment, when comparing two versions 
directly(method call vs. unfolded, the improvement is somewhat noisy))

  was (Author: billy):
And result for PFor(blocksize=128):
{noformat}
Task              QPS Base  StdDev Base  QPS PFor  StdDev PFor      Pct diff
Phrase                4.87         0.36      3.39         0.18  -38% - -20%  (-47% - -25%)
AndHighMed           27.78         2.35     21.13         0.52  -31% - -14%  (-37% - -15%)
SpanNear              2.70         0.14      2.20         0.11  -26% -  -9%  (-36% - -13%)
SloppyPhrase          4.17         0.15      3.77         0.21  -17% -   0%  (-30% -  -6%)
Respell              39.97         1.56     37.65         1.95  -14% -   3%  (-15% -   2%)
Wildcard             19.08         0.77     18.33         0.92  -12% -   5%  (-17% -   3%)
Fuzzy1               42.29         1.13     40.78         1.44   -9% -   2%  (-11% -   1%)
AndHighHigh           7.61         0.55      7.45         0.08   -9% -   6%  (-19% -   6%)
Fuzzy2               15.79         0.55     15.64         0.70   -8% -   7%  (-11% -   6%)
PKLookup             86.71         2.13     88.92         2.24   -2% -   7%  ( -2% -   7%)
TermGroup1M          13.04         0.23     14.03         0.40    2% -  12%  (  1% -   9%)
IntNRQ                3.97         0.48      4.35         0.61  -15% -  41%  (-16% -  24%)
TermBGroup1M1P       21.04         0.35     23.20         0.60    5% -  14%  (  0% -  14%)
TermBGroup1M         19.27         0.47     21.28         0.84    3% -  17%  (  1% -  10%)
OrHighHigh            4.13         0.47      4.63         0.27   -5% -  34%  (-14% -  27%)
OrHighMed             4.95         0.59      5.58         0.34   -5% -  35%  (-14% -  27%)
Prefix3              30.33         1.36     34.26         2.14    1% -  25%  ( -6% -  20%)
Term                 41.99         1.19     50.75         1.72   13% -  28%  (  2% -  26%)
{noformat}
-It works, and it is quite interesting that StdDev for Term query is reduced 
significantly. - (same as last comment, when comparing two versions 
directly(method call vs. unfolded, the improvement is somewhat noisy))
  
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
>  Issue Type: Impr

[jira] [Comment Edited] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)

2012-08-07 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399883#comment-13399883
 ] 

Han Jiang edited comment on LUCENE-3892 at 8/8/12 3:41 AM:
---

Yes, really interesting. And that should make sense. -As far as I know, a 
method with exception handling may be quite a bit slower than a simple if 
statement check.- (Hmm, now I think this is not true; the improvement should 
mainly come from the framework change.) Here is part of the result in my test, 
with Mike's patch:
{noformat}
OrHighMed             2.53         0.31      2.57         0.13  -13% -  21%
Wildcard              3.86         0.12      3.94         0.38  -10% -  15%
OrHighHigh            1.57         0.18      1.61         0.08  -12% -  21%
TermBGroup1M1P        1.93         0.03      2.48         0.10   21% -  35%
TermGroup1M           1.37         0.02      1.81         0.05   26% -  37%
TermBGroup1M          1.17         0.02      1.64         0.07   32% -  47%
Term                  2.92         0.13      4.46         0.23   38% -  68%
{noformat}

  was (Author: billy):
Yes, really interesting. And that should make sense. As far as I know, a 
method with exception handling may be quite slow than a simple if statement 
check. Here is part of the result in my test, with Mike's patch:
{noformat}
OrHighMed             2.53         0.31      2.57         0.13  -13% -  21%
Wildcard              3.86         0.12      3.94         0.38  -10% -  15%
OrHighHigh            1.57         0.18      1.61         0.08  -12% -  21%
TermBGroup1M1P        1.93         0.03      2.48         0.10   21% -  35%
TermGroup1M           1.37         0.02      1.81         0.05   26% -  37%
TermBGroup1M          1.17         0.02      1.64         0.07   32% -  47%
Term                  2.92         0.13      4.46         0.23   38% -  68%
{noformat}
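The struck-out hypothesis above — that exception-driven control flow costs more 
than an explicit check — can be illustrated with a toy decoder loop. This is a 
hedged sketch for illustration only (class and method names are invented, not 
code from the patch; as the edited comment notes, the measured gains came 
mainly from the framework change):

```java
// Toy illustration (not from the patch): two ways a loop over a decoded
// block can detect exhaustion - an explicit bounds check vs. catching an
// exception. Throwing fills in a stack trace, which typically costs far
// more than the comparison it saves.
class ExhaustionStyles {

  // Explicit check: a predictable branch per iteration, no exception machinery.
  static int sumWithCheck(int[] block) {
    int sum = 0;
    for (int i = 0; i < block.length; i++) {
      sum += block[i];
    }
    return sum;
  }

  // Exception-driven: relies on ArrayIndexOutOfBoundsException to stop.
  static int sumWithException(int[] block) {
    int sum = 0;
    try {
      for (int i = 0; ; i++) {
        sum += block[i];
      }
    } catch (ArrayIndexOutOfBoundsException e) {
      return sum;
    }
  }

  public static void main(String[] args) {
    int[] block = {3, 1, 4, 1, 5};
    System.out.println(sumWithCheck(block));      // 14
    System.out.println(sumWithException(block));  // 14
  }
}
```

Both variants compute the same result; only the cost of the exit path differs.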
  
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>  Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-blockFor&hardcode(base).patch, 
> LUCENE-3892-blockFor&packedecoder(comp).patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints-decoder.patch, 
> LUCENE-3892-blockFor-with-packedints.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-handle_open_files.patch, 
> LUCENE-3892-pfor-compress-iterate-numbits.patch, 
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch, 
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3720) TermVectorComponent: distributed results give raw df & tf-idf for terms, would be nice to also include "merged" aggregates from all shards

2012-08-07 Thread Hoss Man (JIRA)
Hoss Man created SOLR-3720:
--

 Summary: TermVectorComponent: distributed results give raw df & 
tf-idf for terms, would be nice to also include "merged" aggregates from all 
shards
 Key: SOLR-3720
 URL: https://issues.apache.org/jira/browse/SOLR-3720
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man


While working on SOLR-3229 I realized that the TermVectorComponent options to 
return "df" and "tf-idf" values for each term wind up giving back values that 
are specific to the shard where the document lives. I think this is OK (i.e. a 
feature, not a bug), because it means you see the values for each term, for 
each document, according to the shard where the document lives, but we should 
consider adding an option to also return the aggregate information for these 
terms across all shards.
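The "merged" aggregates suggested above would amount to summing each term's 
per-shard df and recomputing tf-idf against the global document count. A 
minimal sketch of that idea — the class, method names, and tf-idf formula here 
are illustrative assumptions, not Solr's API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Solr API): aggregate per-shard document
// frequencies into global values before computing tf-idf.
class DfMerge {

  // Sum each term's df across the per-shard responses.
  public static Map<String, Integer> mergeDf(List<Map<String, Integer>> perShardDf) {
    Map<String, Integer> global = new HashMap<>();
    for (Map<String, Integer> shard : perShardDf) {
      for (Map.Entry<String, Integer> e : shard.entrySet()) {
        global.merge(e.getKey(), e.getValue(), Integer::sum);
      }
    }
    return global;
  }

  // One common tf-idf form, using the merged df and the global doc count.
  public static double tfIdf(int tf, int globalDf, long globalDocCount) {
    return tf * Math.log((double) globalDocCount / (1 + globalDf));
  }

  public static void main(String[] args) {
    Map<String, Integer> shard1 = new HashMap<>();
    shard1.put("lucene", 10);
    Map<String, Integer> shard2 = new HashMap<>();
    shard2.put("lucene", 7);
    shard2.put("solr", 3);
    Map<String, Integer> global = mergeDf(List.of(shard1, shard2));
    System.out.println(global.get("lucene")); // 17
  }
}
```

The per-shard values stay useful on their own; the merged map is what a 
client would need for corpus-wide weighting.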




[jira] [Updated] (SOLR-3229) TermVectorComponent does not return terms in distributed search

2012-08-07 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3229:
---

Attachment: SOLR-3229.patch

Hang: Thank you for your patch.

I agree: keying on "docid" is dangerous and misleading in distributed mode, 
and we should switch to using the uniqueKey when available. But if we left 
things as they were in your patch, existing (single-node) users who don't have 
a uniqueKey field would no longer be able to get term vectors at all.

I updated your patch to leave the key alone if there is no uniqueKey, and to 
eliminate the "doc-" prefix when there is one.  I also added a new distributed 
test to prove that everything is working; that turned up a few problems, some 
of which I fixed (dealing with warnings, and ensuring that TVC results are in 
the correct order for the result documents).
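The keying rule just described can be sketched as follows; this is an 
illustrative sketch with invented names, not the actual patch code:

```java
// Hypothetical sketch of the keying rule described above: use the
// document's uniqueKey value when the schema defines one; otherwise fall
// back to the single-node "doc-<docid>" style key.
class TvcKey {

  public static String resultKey(String uniqueKeyValue, int docId) {
    return (uniqueKeyValue != null) ? uniqueKeyValue : "doc-" + docId;
  }

  public static void main(String[] args) {
    System.out.println(resultKey("SKU123", 42)); // SKU123
    System.out.println(resultKey(null, 42));     // doc-42
  }
}
```

The fallback keeps single-node setups without a uniqueKey working exactly as 
before, while distributed responses get stable, shard-independent keys.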

One thing I discovered that I'm not sure about is what to do with the "df" and 
"tf-idf" values when requested. In the test they have to be ignored, because 
the distributed test works by creating a single-node instance and comparing it 
with a multi-node instance that has identical documents, and in the 
distributed TVC code these values won't match up. I'm not sure whether that's 
a bug (because the df & tf-idf values aren't "merged" from all nodes) or a 
feature (because you get the real df & tf-idf values for that term, for that 
doc, from the shard it lives in). Either way, it shouldn't stop us from fixing 
the basic problem of TVC failing painfully in a distributed request, so I've 
opened SOLR-3720 to track this in the future.

Feedback on this revised patch/test would be appreciated.

> TermVectorComponent does not return terms in distributed search
> ---
>
> Key: SOLR-3229
> URL: https://issues.apache.org/jira/browse/SOLR-3229
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 4.0-ALPHA
> Environment: Ubuntu 11.10, openjdk-6
>Reporter: Hang Xie
>Assignee: Hoss Man
>  Labels: patch
> Fix For: 4.0
>
> Attachments: SOLR-3229.patch, TermVectorComponent.patch
>
>
> TermVectorComponent does not return terms in distributed search: 
> distributedProcess() incorrectly uses the Solr unique key to issue 
> subrequests, while process() expects Lucene document ids. Also, parameters 
> are transferred in a different format, so distributed search returns no 
> results.




[jira] [Commented] (SOLR-3229) TermVectorComponent does not return terms in distributed search

2012-08-07 Thread Hang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430876#comment-13430876
 ] 

Hang Xie commented on SOLR-3229:


As far as I can recall, I used "doc-0" to keep it compatible with single-node 
mode, since my client's parser expected that format. Whether to keep the 
"doc-" prefix is up to you; it seems to me that if you keep it, you can avoid 
a lot of changes in the tests. Other than that I have no comments on the test, 
given my little-to-no knowledge of Solr's test framework.

I remember reading that df/tf-idf in distributed mode is a highly anticipated 
feature. I don't expect it can be done easily, so I'm fine with having a bug 
open for it.

> TermVectorComponent does not return terms in distributed search
> ---
>
> Key: SOLR-3229
> URL: https://issues.apache.org/jira/browse/SOLR-3229
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 4.0-ALPHA
> Environment: Ubuntu 11.10, openjdk-6
>Reporter: Hang Xie
>Assignee: Hoss Man
>  Labels: patch
> Fix For: 4.0
>
> Attachments: SOLR-3229.patch, TermVectorComponent.patch
>
>
> TermVectorComponent does not return terms in distributed search: 
> distributedProcess() incorrectly uses the Solr unique key to issue 
> subrequests, while process() expects Lucene document ids. Also, parameters 
> are transferred in a different format, so distributed search returns no 
> results.




Build failed in Jenkins: Lucene-trunk-Linux-Java6-64 #187

2012-08-07 Thread builder
See 

Changes:

[Robert Muir] harden test against internet problems

[sarowe] fix doc bugs (thanks to Oren Bochman)

[Robert Muir] turn on nightly-smoke task

[Robert Muir] LUCENE-4295: add task to build and smoke-test a release for the 
nightly build

[sarowe] LUCENE-4296: Update/clean up Maven POMs and documentation

[sarowe] Mention ReusableAnalyzerBase -> Analyzer

[markrmiller] fix comment

[markrmiller] SOLR-3579: SolrCloud view should default to the graph view rather 
than tree view.

[markrmiller] cancel recovery before trying to sync as new leader - also 
improve logging

[yonik] SOLR-3685: cloud sometimes skipped peersync attempt due to flags not 
being cleared when no updates were buffered during replication

[Robert Muir] rename confusing variables: numDocs is really docFreq, docFreqs 
is really termFreqs, termDocFreq is termFreq

[yonik] SOLR-3685: cloud sometimes skipped peersync attempt due to flags not 
being cleared when no updates were buffered during replication

[Robert Muir] add test for when fixed dv is not really fixed length

[Robert Muir] add 4.0 section

[markrmiller] Add CHANGES entry for SOLR-3647

[sarowe] SOLR-1725: remove all rhino inclusion hacks; I've copied the jars to 
/usr/local/openjdk{6,7}/jre/lib/ext/, which should directly include them in 
JVMs' class paths

[sarowe] SOLR-1725: fix copy/paste-o (extra export in mvn cmdline)

[sarowe] SOLR-1725: Add rhino javascript engine jars to maven jvm's test boot 
class path (moved to mvn cmdline invocation surefire -DargLine parameter)

[uschindler] Make TESTS_PARALLELISM configureable from Jenkins Job

[yonik] tests: fix test of unordered namedlist, skip explain comparisons

[sarowe] LUCENE-2510: Add resources directories, containing 
META-INF/services/o.a.l.analysis.util.*Factory, to POMs for analysis modules 
that previously didn't have them, so that these files will make it into the 
Maven-produced jars

[sarowe] SOLR-1725: Add rhino javascript engine jars to maven jvm's boot class 
path

[ehatcher] fix typo

[mikemccand] LUCENE-4292: cannot assert numSearches > 0 in this test

--
[...truncated 71058 lines...]

check-memory-uptodate:

jar-memory:

check-misc-uptodate:

jar-misc:

check-spatial-uptodate:

jar-spatial:

check-grouping-uptodate:

jar-grouping:

check-queries-uptodate:

jar-queries:

check-queryparser-uptodate:

jar-queryparser:

prep-lucene-jars:

resolve-example:

resolve:

common.init:

compile-lucene-core:

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

ivy-availability-check:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 


resolve:

init:

-clover.disable:

-clover.setup:

clover:

common.compile-core:

compile-core:

init:

-clover.disable:

-clover.setup:

clover:

common.compile-core:

common-solr.compile-core:

compile-core:

jar-core:
  [jar] Building jar: 


jar-src:
  [jar] Building jar: 


define-lucene-javadoc-url-SNAPSHOT:

define-lucene-javadoc-url-release:

define-lucene-javadoc-url:

check-lucene-core-javadocs-uptodate:

javadocs-lucene-core:

check-analyzers-common-javadocs-uptodate:

javadocs-analyzers-common:

check-analyzers-icu-javadocs-uptodate:

javadocs-analyzers-icu:

check-analyzers-kuromoji-javadocs-uptodate:

javadocs-analyzers-kuromoji:

check-analyzers-phonetic-javadocs-uptodate:

javadocs-analyzers-phonetic:

check-analyzers-smartcn-javadocs-uptodate:

javadocs-analyzers-smartcn:

check-analyzers-morfologik-javadocs-uptodate:

javadocs-analyzers-morfologik:

check-analyzers-stempel-javadocs-uptodate:

javadocs-analyzers-stempel:

check-analyzers-uima-javadocs-uptodate:

javadocs-analyzers-uima:

check-suggest-javadocs-uptodate:

javadocs-suggest:

check-grouping-javadocs-uptodate:

javadocs-grouping:

check-queries-javadocs-uptodate:

javadocs-queries:

check-queryparser-javadocs-uptodate:

javadocs-queryparser:

check-highlighter-javadocs-uptodate:

javadocs-highlighter:

check-memory-javadocs-uptodate:

javadocs-memory:

check-misc-javadocs-uptodate:

javadocs-misc:

check-spatial-javadocs-uptodate:

javadocs-spatial:

lucene-javadocs:

javadocs:
[mkdir] Created dir: 

 [echo] Building solr-velocity...

download-java6-javadoc-packagelist:
 [copy] Copying 1 file to 

  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.solr.response...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.6.0_33
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 

  [javadoc] Note: Custom tags that were not seen:  @lucene.internal, 
@lucene.experimental
  [jar] Building jar: 


dist-maven-common:
[artifact:install-provider] Installing provider: 
org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7:runtime
[artifact:deploy] Deploying to 
file://

[jira] [Updated] (LUCENE-4283) Support more frequent skip with Block Postings Format

2012-08-07 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-4283:
--

Attachment: LUCENE-4283-record-skip&inlining-scanning.patch

This patch also inlines scanning for EverythingEnum, and hoists some 
conditional statements (refillDocs, etc.) out of the while loop.
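The hoisting described here can be sketched abstractly: instead of testing 
whether a refill is needed on every document, the outer loop refills once per 
block and a tight inner loop scans the buffer. An illustrative sketch with 
invented names, not the patch code:

```java
// Illustrative sketch of hoisting a per-document "need refill?" test out
// of the hot loop: the outer loop advances one block at a time, and the
// inner loop scans the buffered block with no refill check per element.
class HoistedScan {

  public static long sumChunked(int[] docs, int blockSize) {
    long sum = 0;
    int upto = 0;
    while (upto < docs.length) {                  // one "refill" per block
      int limit = Math.min(upto + blockSize, docs.length);
      for (int i = upto; i < limit; i++) {        // tight inner scan
        sum += docs[i];
      }
      upto = limit;
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(sumChunked(new int[]{1, 2, 3, 4, 5}, 2)); // 15
  }
}
```

The result is identical to a single loop with an embedded check; only the 
number of branch tests per element changes.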

> Support more frequent skip with Block Postings Format
> -
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Han Jiang
>Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch, 
> LUCENE-4283-codes-cleanup.patch, LUCENE-4283-record-next-skip.patch, 
> LUCENE-4283-record-skip&inlining-scanning.patch, LUCENE-4283-slow.patch, 
> LUCENE-4283-small-interval-fully.patch, 
> LUCENE-4283-small-interval-partially.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize. 
> Every time the skipper reaches the last level-0 skip point, we have to 
> decode a whole block to read doc/freq data. Also, a higher-level skip list 
> is created only for terms with df>blockSize^k, which means that for most 
> terms, skipping is just a linear scan. If we increase the current blockSize 
> for better bulk I/O performance, the current skip setting will become a 
> bottleneck. 
> For ForPF, the encoded block can easily be split if we set 
> skipInterval=32*k. 
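The closing observation — that a FOR-encoded block can be split when 
skipInterval is a multiple of 32, since packed-ints decoding works on aligned 
32-value groups — implies that level-0 skip points can land on any 32-document 
boundary inside a block. A rough sketch of computing such points (illustrative 
names, not the branch code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: positions (in documents) of level-0 skip points
// for one term, when the skip interval is a multiple of 32 rather than
// the full block size.
class SkipPoints {

  public static List<Integer> level0SkipPoints(int docFreq, int skipInterval) {
    if (skipInterval % 32 != 0) {
      throw new IllegalArgumentException("interval must be a multiple of 32");
    }
    List<Integer> points = new ArrayList<>();
    // A skip entry precedes each full interval after the first.
    for (int doc = skipInterval; doc < docFreq; doc += skipInterval) {
      points.add(doc);
    }
    return points;
  }

  public static void main(String[] args) {
    // blockSize=128, skipInterval=32: three intra-block skip points.
    System.out.println(level0SkipPoints(128, 32)); // [32, 64, 96]
  }
}
```

With skipInterval==blockSize the list above would be empty for a one-block 
term, which is exactly the "decode the whole block" cost the issue describes.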
