[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811890#comment-13811890
 ] 

ASF subversion and git services commented on LUCENE-5189:
-

Commit 1538146 from [~shaie] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1538146 ]

LUCENE-5189: rename internal API following NumericDocValues updates

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
 LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, 
 LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, 
 LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
 LUCENE-5189_process_events.patch


 In LUCENE-4258 we started to work on incremental field updates, however the 
 amount of changes is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get-go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update, if e.g. we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet it requires many changes to core code which will also be useful 
 for updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I have a working patch already which I'll upload next, explaining the 
 changes.
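
As a minimal sketch of what this enables, assuming the 
IndexWriter.updateNumericDocValue API that this issue adds for 4.6 (the 
"id"/"popularity" field names here are illustrative):

{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NumericDVUpdateSketch {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46)));

    // Index a document with a numeric DV field, e.g. a popularity counter.
    Document doc = new Document();
    doc.add(new StringField("id", "doc-1", Store.NO));
    doc.add(new NumericDocValuesField("popularity", 1L));
    writer.addDocument(doc);
    writer.commit();

    // Update just the DV value for all documents matching the term,
    // without re-indexing the documents themselves.
    writer.updateNumericDocValue(new Term("id", "doc-1"), "popularity", 2L);
    writer.commit();
    writer.close();
    dir.close();
  }
}
{code}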




[jira] [Resolved] (LUCENE-5189) Numeric DocValues Updates

2013-11-02 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-5189.


   Resolution: Fixed
Fix Version/s: 5.0
   4.6

Finished backporting the changes to 4x. Thanks all for your valuable comments 
and help to get this feature in!

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
 LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, 
 LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, 
 LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
 LUCENE-5189_process_events.patch



[jira] [Comment Edited] (SOLR-5381) Split Clusterstate and scale

2013-11-02 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809096#comment-13809096
 ] 

Noble Paul edited comment on SOLR-5381 at 11/2/13 7:51 AM:
---

OK, here is the plan to split clusterstate on a per-collection basis.

h2. How to use this feature?
Introduce a new option while creating a collection (external=true). This will 
keep the state of the collection in a separate ZooKeeper node. 
Example:

http://localhost:8983/solr/admin/collections?action=CREATE&name=xcoll&numShards=5&replicationFactor=2&external=true

This will result in the following entry in clusterstate.json:
{code:JavaScript}
{
  "xcoll" : {"ex":true}
}
{code}
There will be another ZK entry which carries the actual collection information:
*  /collections
** /xcoll
*** /state.json
{code:JavaScript}
{"xcoll":{
  "shards":{"shard1":{
    "range":"8000-b332",
    "state":"active",
    "replicas":{
      "core_node1":{
        "state":"active",
        "base_url":"http://192.168.1.5:8983/solr",
        "core":"xcoll_shard1_replica1",
        "node_name":"192.168.1.5:8983_solr",
        "leader":"true"}}}},
  "router":{"name":"compositeId"}}}
{code}

The main Overseer thread is responsible for creating collections and managing 
all the events for all the collections in clusterstate.json. clusterstate.json 
is modified only when a collection is created/deleted or when state updates 
happen to "non-external" collections.

Each external collection will have its own Overseer queue, as follows; there 
will be a separate thread for each external collection.

* /collections
** /xcoll
*** /overseer
**** /collection-queue-work
**** /queue
**** /queue-work


h2. SolrJ enhancements
SolrJ would only listen to clusterstate.json. When a request comes for a 
collection 'xcoll':
* it would first check if such a collection exists
* If yes, it first looks up the details in the local cache for that collection 
* If not found in the cache, it fetches the node /collections/xcoll/state.json 
and caches the information 
* Any query/update will be sent with an extra query param specifying the 
collection name, shard name, Role (Leader/Replica), and range (example: 
\_target_=xcoll:shard1:L:8000-b332). A node would throw an error 
(INVALID_NODE) if it does not serve the collection/shard/Role/range combo.
* If SolrJ gets an INVALID_NODE error, it would invalidate the cache and fetch 
fresh state information for that collection (and cache it again), as sketched 
below.
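
A hypothetical sketch of that SolrJ cache flow; CollectionStateCache and 
fetchStateJson are illustrative names for the plan above, not existing SolrJ 
classes:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names mirroring the plan above, not existing SolrJ code.
class CollectionStateCache {
  private final Map<String,String> stateByCollection = new ConcurrentHashMap<String,String>();

  String getState(String collection) {
    String state = stateByCollection.get(collection);
    if (state == null) {
      // Cache miss: read /collections/<collection>/state.json from ZK
      // and remember it for subsequent requests.
      state = fetchStateJson(collection);
      stateByCollection.put(collection, state);
    }
    return state;
  }

  void onInvalidNode(String collection) {
    // A node answered INVALID_NODE: our cached state is stale, so drop it
    // and fetch (and re-cache) fresh state for this collection.
    stateByCollection.remove(collection);
    stateByCollection.put(collection, fetchStateJson(collection));
  }

  private String fetchStateJson(String collection) {
    // Placeholder for the actual ZooKeeper read of state.json.
    throw new UnsupportedOperationException("ZK read not shown in this sketch");
  }
}
{code}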

h2. Changes to each Solr Node
Each node would only listen to clusterstate.json and the states of the 
collections which it is a member of. All collections present in 
clusterstate.json will be deemed collections it serves. If a request comes for 
a collection it does not serve, it first checks for the \_target_ param:
* If the param is present and the node does not serve that 
collection/shard/Role/range combo, an INVALID_NODE error is thrown
** If the validation succeeds, the request is served 
* If the param is not present and the node is a member of the collection, the 
request is served
** If the node is not a member of the collection, it uses SolrJ to proxy the 
request to the appropriate location

Internally, the node really does not care about the state of external 
collections. If/when it is required, the information is fetched in real time 
from ZK, used, and thrown away.

h2. Changes to admin GUI
External collections are not shown graphically in the admin UI.





[jira] [Commented] (LUCENE-5320) Create SearcherTaxonomyManager over Directory

2013-11-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811952#comment-13811952
 ] 

Michael McCandless commented on LUCENE-5320:


+1

 Create SearcherTaxonomyManager over Directory
 -

 Key: LUCENE-5320
 URL: https://issues.apache.org/jira/browse/LUCENE-5320
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/facet
Reporter: Shai Erera

 SearcherTaxonomyManager now only allows working in NRT mode. It could be 
 useful to have an STM which allows reopening a SearcherAndTaxonomy pair over 
 Directories, e.g. for replication. The problem is that if the thread that 
 calls maybeRefresh() is not the one that does the commit(), it could lead to 
 a pair that is not synchronized.
 Perhaps at first we could have a simple version that works under some 
 assumptions, i.e. that the app does the commit + reopen in the same thread, in 
 that order, so that it can be used by such apps and when replicating the 
 indexes; later we can figure out how to generalize it to work even if 
 commit + reopen are done by separate threads/JVMs.
 I'll see if SearcherTaxonomyManager can be extended to support it, or a new 
 STM is required.
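
Under the single-thread assumption, a sketch of the commit + reopen ordering 
such an app would follow (the Directory-based manager is the proposal here, 
not an existing constructor; import paths per the current facet module, which 
moved around during 4.x):

{code:java}
import java.io.IOException;
import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.index.IndexWriter;

class CommitThenRefresh {
  // Commit both writers and reopen from a single thread, in this order.
  static void commitAndRefresh(IndexWriter indexWriter,
                               DirectoryTaxonomyWriter taxoWriter,
                               SearcherTaxonomyManager mgr) throws IOException {
    // Commit the taxonomy first: indexed documents reference category
    // ordinals, which must already exist in the committed taxonomy.
    taxoWriter.commit();
    indexWriter.commit();
    // Reopen only after both commits; a Directory-based manager would
    // reload the IndexReader and TaxonomyReader together here, so the
    // SearcherAndTaxonomy pair it hands out stays consistent.
    mgr.maybeRefresh();
  }
}
{code}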




[jira] [Created] (SOLR-5418) Background merge after field removed from solr.xml causes error

2013-11-02 Thread Erick Erickson (JIRA)
Erick Erickson created SOLR-5418:


 Summary: Background merge after field removed from solr.xml causes 
error
 Key: SOLR-5418
 URL: https://issues.apache.org/jira/browse/SOLR-5418
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.5
Reporter: Erick Erickson


Problem from the user's list, cut/pasted below. Robert Muir hacked out a quick 
patch he pasted on the dev list; I'll append it shortly.

I am working at implementing solr to work as the search backend for our web
system.  So far things have been going well, but today I made some schema
changes and now things have broken.

I updated the schema.xml file and reloaded the core (via the admin
interface).  No errors were reported in the logs.

I then pushed 100 records to be indexed.  A call to Commit afterwards
seemed fine, however my next call for Optimize caused the following errors:

java.io.IOException: background merge hit exception:
_2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
[maxNumSegments=1]

null:java.io.IOException: background merge hit exception:
_2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
[maxNumSegments=1]


Unfortunately, googling for "background merge hit exception" came up
with two things: a corrupt index or not enough free space. The host
machine that's hosting solr has 227 out of 229GB free (according to df
-h), so that's not it.


I then ran CheckIndex on the index, and got the following results:
http://apaste.info/gmGU


As someone who is new to solr and lucene, as far as I can tell this
means my index is fine. So I am coming up at a loss. I'm somewhat sure
that I could probably delete my data directory and rebuild it but I am
more interested in finding out why is it having issues, what is the
best way to fix it, and what is the best way to prevent it from
happening when this goes into production.


Does anyone have any advice that may help?

I helped Matthew find the logs and he posted this stack trace:

1691103929 [http-bio-8080-exec-3] INFO  org.apache.solr.update.UpdateHandler - 
start 
commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
1691104153 [http-bio-8080-exec-3] INFO  
org.apache.solr.update.processor.LogUpdateProcessor - [dbqItems] webapp=/solr 
path=/update params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 
224
1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore - 
java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 
_30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1]
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
at 

[jira] [Updated] (SOLR-5418) Background merge after field removed from solr.xml causes error

2013-11-02 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5418:
-

Attachment: SOLR-5418.patch

Patch constructed from cut-n-paste of Robert's quick code change on the dev 
list.

Warning: it's cut/paste, not generated from svn, so be warned.

 Background merge after field removed from solr.xml causes error
 ---

 Key: SOLR-5418
 URL: https://issues.apache.org/jira/browse/SOLR-5418
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.5
Reporter: Erick Erickson
 Attachments: SOLR-5418.patch



[jira] [Assigned] (SOLR-5418) Background merge after field removed from solr.xml causes error

2013-11-02 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-5418:


Assignee: Erick Erickson

 Background merge after field removed from solr.xml causes error
 ---

 Key: SOLR-5418
 URL: https://issues.apache.org/jira/browse/SOLR-5418
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.5
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5418.patch



[jira] [Updated] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5321:
---

Attachment: LUCENE-5321.patch

I ended up removing everything under o.a.l.facet.codecs/, including 
Facet46Codec. It seemed redundant as all it does is use the app's DVF with the 
facet fields that are returned by 
FacetIndexingParams.getAllCategoryListParams(). It's a waste of time and 
resources to maintain such a Codec.

I also removed some tests which tested Facet42DVF.

 Remove Facet42DocValuesFormat
 -

 Key: LUCENE-5321
 URL: https://issues.apache.org/jira/browse/LUCENE-5321
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
 Attachments: LUCENE-5321.patch


 The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that 
 it stores the addresses in a direct int[] rather than PackedInts. On 
 LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it 
 improves perf for some queries and has a negligible effect for others, and 
 RAM consumption isn't much worse. We should remove Facet42DVF and use 
 DirectDVF instead.
 I also want to rename Facet46Codec to FacetCodec. There's no need to refactor 
 the class whenever the default codec changes (e.g. from 45 to 46) since it 
 doesn't care about the actual Codec version underneath; it only overrides the 
 DVF used for the facet fields. FacetCodec should take the DVF from the app 
 (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as 
 a utility Codec rather than a real, versioned, Codec.




[jira] [Updated] (SOLR-5302) Analytics Component

2013-11-02 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5302:
-

Attachment: SOLR-5302.patch

Please apply my updated version of the patch or make the same changes before 
making a new one or I'll have to re-do some work.

NOTE: This is against trunk!

Working with pre-commit:

 Changes I had to make:

A couple of files were indented with tabs. Since they're new files, I just 
reformatted them.

The forbidden-API checks failed on several files, mostly requiring either 
Scanners to have UTF-8 specified or String.toLowerCase to take Locale.ROOT, 
and such-like.

I did most of this on the plane ride home, and I must admit it's annoying to 
have precommit fail because I don't have internet connectivity; there _must_ 
be a build flag somewhere.
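
For reference, the typical shape of those forbidden-API fixes (generic names, 
not lines from the actual patch):

{code:java}
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Locale;
import java.util.Scanner;

class ForbiddenApiFixes {
  static Scanner openScanner(File f) throws FileNotFoundException {
    // Forbidden: new Scanner(f) -- it uses the platform default charset.
    return new Scanner(f, "UTF-8");        // charset made explicit
  }

  static String normalize(String s) {
    // Forbidden: s.toLowerCase() -- it uses the platform default Locale.
    return s.toLowerCase(Locale.ROOT);     // locale made explicit
  }
}
{code}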

These files have missing javadocs
 [exec]   missing: org.apache.solr.analytics.accumulator
 [exec]   missing: org.apache.solr.analytics.accumulator.facet
 [exec]   missing: org.apache.solr.analytics.expression
 [exec]   missing: org.apache.solr.analytics.plugin
 [exec]   missing: org.apache.solr.analytics.request
 [exec]   missing: org.apache.solr.analytics.statistics
 [exec]   missing: org.apache.solr.analytics.util
 [exec]   missing: org.apache.solr.analytics.util.valuesource
 [exec] 
 [exec] Missing javadocs were found!


Tests failing, and a JVM crash to boot. 

FieldFacetExtrasTest fails with an unknown field int_id. There's nothing in 
schema-docValues.xml that would map to that field; did it get changed? Is this 
a difference between trunk and 4x?

 - org.apache.solr.analytics.NoFacetTest (suite)
   [junit4]   - org.apache.solr.analytics.facet.FieldFacetExtrasTest (suite)
   [junit4]   - org.apache.solr.analytics.expression.ExpressionTest (suite)
   [junit4]   - 
org.apache.solr.analytics.AbstractAnalyticsStatsTest.initializationError
   [junit4]   - org.apache.solr.analytics.util.valuesource.FunctionTest (suite)
   [junit4]   - 
org.apache.solr.analytics.facet.AbstractAnalyticsFacetTest.initializationError
   [junit4]   - org.apache.solr.analytics.facet.FieldFacetTest (suite)
   [junit4]   - org.apache.solr.analytics.facet.QueryFacetTest.queryTest
   [junit4]   - org.apache.solr.analytics.facet.RangeFacetTest (suite)

 Analytics Component
 ---

 Key: SOLR-5302
 URL: https://issues.apache.org/jira/browse/SOLR-5302
 Project: Solr
  Issue Type: New Feature
Reporter: Steven Bower
Assignee: Erick Erickson
 Attachments: SOLR-5302.patch, SOLR-5302.patch, Search Analytics 
 Component.pdf, Statistical Expressions.pdf, solr_analytics-2013.10.04-2.patch


 This ticket is to track a replacement for the StatsComponent. The 
 AnalyticsComponent supports the following features:
 * All functionality of StatsComponent (SOLR-4499)
 * Field Faceting (SOLR-3435)
 ** Support for limit
 ** Sorting (bucket name or any stat in the bucket)
 ** Support for offset
 * Range Faceting
 ** Supports all options of standard range faceting
 * Query Faceting (SOLR-2925)
 * Ability to use overall/field facet statistics as input to range/query 
 faceting (i.e. calc min/max date and then facet over that range)
 * Support for more complex aggregate/mapping operations (SOLR-1622)
 ** Aggregations: min, max, sum, sum-of-square, count, missing, stddev, mean, 
 median, percentiles
 ** Operations: negation, abs, add, multiply, divide, power, log, date math, 
 string reversal, string concat
 ** Easily pluggable framework to add additional operations
 * New / cleaner output format
 Outstanding Issues:
 * Multi-value field support for stats (supported for faceting)
 * Multi-shard support (may not be possible for some operations, e.g. median)




[jira] [Commented] (LUCENE-5311) Make it possible to train / run classification over multiple fields

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812031#comment-13812031
 ] 

ASF subversion and git services commented on LUCENE-5311:
-

Commit 1538205 from [~teofili] in branch 'dev/trunk'
[ https://svn.apache.org/r1538205 ]

LUCENE-5311 - added support for training using multiple content fields for knn 
and naive bayes

 Make it possible to train / run classification over multiple fields
 ---

 Key: LUCENE-5311
 URL: https://issues.apache.org/jira/browse/LUCENE-5311
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/classification
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili

 It'd be nice to be able to use multiple fields instead of just one for 
 training / running each classifier.
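
A hedged sketch of the multi-field overload this commit adds, in use (exact 
signature per the classification module; the field names are illustrative):

{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.classification.KNearestNeighborClassifier;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.util.Version;

class MultiFieldTrainingSketch {
  static KNearestNeighborClassifier trainOverTwoFields(AtomicReader reader) throws Exception {
    KNearestNeighborClassifier classifier = new KNearestNeighborClassifier(10);
    // Train over both "title" and "body" instead of a single content field.
    classifier.train(reader, new String[] { "title", "body" }, "category",
        new StandardAnalyzer(Version.LUCENE_46));
    return classifier;
  }
}
{code}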




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812034#comment-13812034
 ] 

Benson Margulies commented on LUCENE-4956:
--

Looks like mapHanja.dic needs some adjustment of the legal notice? Or was this 
going to be replaced?


 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
 lucene-4956.patch, lucene4956.patch


 Korean language has specific characteristics. When developing a search service 
 with lucene & solr in korean, there are some problems in searching and 
 indexing. The korean analyzer solved the problems with a korean morphological 
 analyzer. It consists of a korean morphological analyzer, dictionaries, a 
 korean tokenizer and a korean filter. The korean analyzer is made for lucene 
 and solr. If you develop a search service with lucene in korean, it is the 
 best idea to choose the korean analyzer.




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812039#comment-13812039
 ] 

Robert Muir commented on LUCENE-4956:
-

please point to specific files in svn that you have concerns about.

I recreated this file myself from clearly attributed sources, from scratch. 

It has *MORE THAN ENOUGH* legal notice.


[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812044#comment-13812044
 ] 

Benson Margulies commented on LUCENE-4956:
--

My point is that it might have a bit too much legal notice. Generally, when 
someone grants a license, the headers all move up to some global NOTICE file, 
and the file is left with just an Apache license. 

I also noted the following:

{noformat}
! Except as contained in this notice, the name of a copyright holder shall not be 
! used in advertising or otherwise to promote the sale, use or other dealings in 
! these Data Files or Software without prior written authorization of the 
! copyright holder.
{noformat}

and then noticed that http://www.apache.org/legal/resolved.html says that it 
approves of:

 * BSD (without advertising clause). 

So that Unicode license is possibly an issue.

Right now I'm using the git clone, but I just did a pull, and the pathname is 
lucene/analysis/arirang/src/data/mapHanja.dic





[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812047#comment-13812047
 ] 

Robert Muir commented on LUCENE-4956:
-

{quote}
So that Unicode license is possibly an issue.
{quote}

No, it's not. https://issues.apache.org/jira/browse/LEGAL-108


[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812050#comment-13812050
 ] 

Benson Margulies commented on LUCENE-4956:
--

That jira concerns a different license. The license on the file pointed-to 
there has no advertising clause that I can spot.



[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812050#comment-13812050
 ] 

Benson Margulies edited comment on LUCENE-4956 at 11/2/13 4:11 PM:
---

That jira concerns a different license. The license on the file pointed-to 
there has no advertising clause that I can spot. Which isn't to say that legal 
would have a problem with this, just that I don't think that the JIRA in 
question tells us.




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812051#comment-13812051
 ] 

Robert Muir commented on LUCENE-4956:
-

This is the unicode license that all of their data and code comes from. There 
is only one.

Please don't waste my time here; if you want to waste the legal team's time, 
that's ok :)


[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812052#comment-13812052
 ] 

Robert Muir commented on LUCENE-4956:
-

The exact question about using unicode data tables has been answered explicitly 
already:

http://mail-archives.apache.org/mod_mbox/www-legal-discuss/200903.mbox/%3c3d4032300903030415w4831f6e4u65c12881cbb86...@mail.gmail.com%3E

I don't think it needs any further discussion.


[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812053#comment-13812053
 ] 

Benson Margulies commented on LUCENE-4956:
--

Rob, I got shat on at great length over this for mere test data over at the 
WS project.  I had to make the build pull the data over the network to get 
certain directors off of my back. I'm trying to spare you the experience. 
That's all.

As a low-intensity member of the UTC, I would also expect there to be only one 
license. However, I compare:

{noformat}
#  Copyright (c) 1991-2011 Unicode, Inc. All Rights reserved.
#  
#  This file is provided as-is by Unicode, Inc. (The Unicode Consortium). No
#  claims are made as to fitness for any particular purpose. No warranties of
#  any kind are expressed or implied. The recipient agrees to determine
#  applicability of information provided. If this file has been provided on
#  magnetic media by Unicode, Inc., the sole remedy for any claim will be
#  exchange of defective media within 90 days of receipt.
#  
#  Unicode, Inc. hereby grants the right to freely use the information
#  supplied in this file in the creation of products supporting the
#  Unicode Standard, and to make copies of this file in any form for
#  internal or external distribution as long as this notice remains
#  attached.
{noformat}

with

{noformat}
! Copyright (c) 1991-2013 Unicode, Inc. 
! All rights reserved. 
! Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
!
! Permission is hereby granted, free of charge, to any person obtaining a copy 
! of the Unicode data files and any associated documentation (the "Data Files") 
! or Unicode software and any associated documentation (the "Software") to deal 
! in the Data Files or Software without restriction, including without limitation 
! the rights to use, copy, modify, merge, publish, distribute, and/or sell copies 
! of the Data Files or Software, and to permit persons to whom the Data Files or 
! Software are furnished to do so, provided that (a) the above copyright notice(s) 
! and this permission notice appear with all copies of the Data Files or Software, 
! (b) both the above copyright notice(s) and this permission notice appear in 
! associated documentation, and (c) there is clear notice in each modified Data 
! File or in the Software as well as in the documentation associated with the Data 
! File(s) or Software that the data or software has been modified.
!
! THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
! EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 
! FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO 
! EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR 
! ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES 
! WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF 
! CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION 
! WITH THE USE OR PERFORMANCE OF THE DATA FILES OR SOFTWARE.
! 
! Except as contained in this notice, the name of a copyright holder shall not be 
! used in advertising or otherwise to promote the sale, use or other dealings in 
! these Data Files or Software without prior written authorization of the 
! copyright holder.
{noformat}

They look pretty different to me. Go figure?




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812056#comment-13812056
 ] 

Robert Muir commented on LUCENE-4956:
-

{quote}
Rob, I got shat on at great length over this for mere test data over at the 
WS project. I had to make the build pull the data over the network to get 
certain directors off of my back. I'm trying to spare you the experience. 
That's all.
{quote}

Then perhaps you should push back hard when people don't know what they are 
talking about, like I do. As I said, the question about using unicode data 
tables has already been directly answered.

{quote}
As a low-intensity member of the UTC, I would also expect there to be only one 
license. However, I compare:
{quote}

I am also one. this means nothing.

{quote}
They look pretty different to me. Go figure?
{quote}

There is only one license from the terms of use page 
http://www.unicode.org/copyright.html

That is what I include. Whoever created your other license decided to omit 
some of the information, which I did not.


[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812059#comment-13812059
 ] 

Benson Margulies commented on LUCENE-4956:
--

OK, I see, the email thread about Unicode data in general does certainly cover 
this. Sometimes the workings of Legal are pretty perplexing.


[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-11-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812065#comment-13812065
 ] 

Robert Muir commented on LUCENE-4956:
-

Just so anyone reading the thread knows: the clause Benson mentioned is not an 
advertising clause:

{quote}
Except as contained in this notice, the name of a copyright holder shall not be 
used in advertising or otherwise to promote the sale, use or other dealings in 
these Data Files or Software without prior written authorization of the 
copyright holder.
{quote}

The BSD advertising clause reads like this:

{quote}
All advertising materials mentioning features or use of this software must 
display the following acknowledgement: This product includes software developed 
by the organization.
{quote}

These are very different.


[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812082#comment-13812082
 ] 

Michael McCandless commented on LUCENE-5321:


+1, but maybe we can keep that test case if we just change it to an 
assumeTrue(_TestUtil.fieldSupportsHugeBinaryValues)?
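
For reference, a minimal sketch of what that guard could look like inside the test (the test-method name and the "$facets" field name are illustrative assumptions, not the committed change):

{code:java}
public void testHugeBinaryValues() throws Exception {
  // skip the test unless the codec's DocValuesFormat can store huge binary values
  assumeTrue("codec does not support huge binary values",
      _TestUtil.fieldSupportsHugeBinaryValues("$facets"));
  // ... the rest of the original test runs only when the assume passes ...
}
{code}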

 Remove Facet42DocValuesFormat
 -

 Key: LUCENE-5321
 URL: https://issues.apache.org/jira/browse/LUCENE-5321
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
 Attachments: LUCENE-5321.patch


 The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that 
 it stores the addresses in a direct int[] rather than in PackedInts. On 
 LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it 
 improves perf for some queries and has a negligible effect for others, and 
 RAM consumption isn't much worse. We should remove Facet42DVF and use 
 DirectDVF instead.
 I also want to rename Facet46Codec to FacetCodec. There's no need to refactor 
 the class whenever the default codec changes (e.g. from 45 to 46) since it 
 doesn't care about the actual Codec version underneath; it only overrides the 
 DVF used for the facet fields. FacetCodec should take the DVF from the app 
 (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as 
 a utility Codec rather than a real, versioned Codec.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812084#comment-13812084
 ] 

Michael McCandless commented on LUCENE-5321:


Also, maybe somewhere in javadocs we could show how the app could do what 
Facet46Codec is doing today?  Ie, how to gather up all facet fields and then 
override getDocValuesFormatForField w/ the default codec?
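
Such an example might look roughly like the following sketch; the "facetFields" set is an assumption the app would supply itself, not an existing API:

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.lucene46.Lucene46Codec;
import org.apache.lucene.codecs.memory.DirectDocValuesFormat;

// route the (hypothetical) facet fields to DirectDocValuesFormat,
// everything else to the default per-field choice of Lucene46Codec
final Set<String> facetFields = new HashSet<String>(Arrays.asList("$facets"));
final DocValuesFormat direct = new DirectDocValuesFormat();
Codec codec = new Lucene46Codec() {
  @Override
  public DocValuesFormat getDocValuesFormatForField(String field) {
    return facetFields.contains(field) ? direct : super.getDocValuesFormatForField(field);
  }
};
indexWriterConfig.setCodec(codec); // indexWriterConfig: the app's IndexWriterConfig
{code}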

 Remove Facet42DocValuesFormat
 -

 Key: LUCENE-5321
 URL: https://issues.apache.org/jira/browse/LUCENE-5321
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
 Attachments: LUCENE-5321.patch


 The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that 
 it stores the addresses in a direct int[] rather than in PackedInts. On 
 LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it 
 improves perf for some queries and has a negligible effect for others, and 
 RAM consumption isn't much worse. We should remove Facet42DVF and use 
 DirectDVF instead.
 I also want to rename Facet46Codec to FacetCodec. There's no need to refactor 
 the class whenever the default codec changes (e.g. from 45 to 46) since it 
 doesn't care about the actual Codec version underneath; it only overrides the 
 DVF used for the facet fields. FacetCodec should take the DVF from the app 
 (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as 
 a utility Codec rather than a real, versioned Codec.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5418) Background merge after field removed from solr.xml causes error

2013-11-02 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-5418:
--

Attachment: SOLR-5418.patch

Here is the patch from my svn checkout.

I sent it to the list really quick just to get an opinion on it. I don't 
remember why the current checks were added. I guess I can see a line of 
reasoning that it's good for the user to know they are dragging unused shit 
around in their index.

And I can agree with that, but I don't think it's the job of the codec to fail a 
background merge to communicate such a thing to the user.

Such a warning/failure could be implemented in other ways, for example a warmer 
in the example config for the firstSearcher event that looks at FieldInfos and 
warns or fails with "YOU ARE DRAGGING AROUND BOGUS STUFF IN YOUR INDEX" if it 
finds things that don't match the schema, or something like that: and it would 
be easy for users to enable/disable.
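
A rough sketch of that idea, under stated assumptions (this is not the attached patch; the class name, the schema lookup, and the logging are illustrative):

{code:java}
import org.apache.lucene.index.FieldInfo;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.search.SolrIndexSearcher;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// a firstSearcher listener that warns about index fields the schema no
// longer declares; register it in solrconfig.xml as a firstSearcher listener
public class StaleFieldWarner implements SolrEventListener {
  private static final Logger log = LoggerFactory.getLogger(StaleFieldWarner.class);

  public void init(NamedList args) {}
  public void postCommit() {}
  public void postSoftCommit() {}

  public void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher) {
    IndexSchema schema = newSearcher.getCore().getLatestSchema();
    for (FieldInfo fi : newSearcher.getAtomicReader().getFieldInfos()) {
      if (schema.getFieldOrNull(fi.name) == null) { // neither explicit nor dynamic
        log.warn("Index field '{}' does not match the schema", fi.name);
      }
    }
  }
}
{code}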

 Background merge after field removed from solr.xml causes error
 ---

 Key: SOLR-5418
 URL: https://issues.apache.org/jira/browse/SOLR-5418
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.5
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5418.patch, SOLR-5418.patch


 Problem from the user's list, cut/pasted below. Robert Muir hacked out a 
 quick patch he pasted on the dev list; I'll append it shortly.
 I am working at implementing solr to work as the search backend for our web
 system.  So far things have been going well, but today I made some schema
 changes and now things have broken.
 I updated the schema.xml file and reloaded the core (via the admin
 interface).  No errors were reported in the logs.
 I then pushed 100 records to be indexed.  A call to Commit afterwards
 seemed fine, however my next call for Optimize caused the following errors:
 java.io.IOException: background merge hit exception:
 _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
 [maxNumSegments=1]
 null:java.io.IOException: background merge hit exception:
 _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
 [maxNumSegments=1]
 Unfortunately, googling for "background merge hit exception" came up
 with 2 things: a corrupt index or not enough free space. The host
 machine that's hosting solr has 227 out of 229GB free (according to df
 -h), so that's not it.
 I then ran CheckIndex on the index, and got the following results:
 http://apaste.info/gmGU
 As someone who is new to solr and lucene, as far as I can tell this
 means my index is fine. So I am coming up at a loss. I'm somewhat sure
 that I could probably delete my data directory and rebuild it but I am
 more interested in finding out why is it having issues, what is the
 best way to fix it, and what is the best way to prevent it from
 happening when this goes into production.
 Does anyone have any advice that may help?
 I helped Matthew find the logs and he posted this stack trace:
 1691103929 [http-bio-8080-exec-3] INFO  org.apache.solr.update.UpdateHandler  
 - start 
 commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 1691104153 [http-bio-8080-exec-3] INFO  
 org.apache.solr.update.processor.LogUpdateProcessor  - [dbqItems] 
 webapp=/solr path=/update 
 params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 224
 1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore  - 
 java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 
 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1]
 at 
 org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
 at 
 org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
 at 
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530)
 at 
 org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
 at 
 org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240)
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219)
 at 
 org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
 at 
 org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  

[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812089#comment-13812089
 ] 

Shai Erera commented on LUCENE-5321:


I'll add the test back with the assumeTrue. I'm not sure where to document this 
FacetCodec example ... it doesn't seem to belong in any of the classes' 
javadocs, and our package.html files aren't too verbose (they point to blogs). 
So maybe we can just write a blog post about it, though really, this isn't too 
complicated to figure out. I'll attach a patch shortly; I want to make sure this 
test + assume really works!

 Remove Facet42DocValuesFormat
 -

 Key: LUCENE-5321
 URL: https://issues.apache.org/jira/browse/LUCENE-5321
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
 Attachments: LUCENE-5321.patch


 The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that 
 it stores the addresses in a direct int[] rather than in PackedInts. On 
 LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it 
 improves perf for some queries and has a negligible effect for others, and 
 RAM consumption isn't much worse. We should remove Facet42DVF and use 
 DirectDVF instead.
 I also want to rename Facet46Codec to FacetCodec. There's no need to refactor 
 the class whenever the default codec changes (e.g. from 45 to 46) since it 
 doesn't care about the actual Codec version underneath; it only overrides the 
 DVF used for the facet fields. FacetCodec should take the DVF from the app 
 (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as 
 a utility Codec rather than a real, versioned Codec.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5418) Background merge after field removed from solr.xml causes error

2013-11-02 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812094#comment-13812094
 ] 

Erick Erickson commented on SOLR-5418:
--

Thanks Robert! Mostly I was making sure I didn't lose anything.





 Background merge after field removed from solr.xml causes error
 ---

 Key: SOLR-5418
 URL: https://issues.apache.org/jira/browse/SOLR-5418
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.5
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5418.patch, SOLR-5418.patch


 Problem from the user's list, cut/pasted below. Robert Muir hacked out a 
 quick patch he pasted on the dev list; I'll append it shortly.
 I am working at implementing solr to work as the search backend for our web
 system.  So far things have been going well, but today I made some schema
 changes and now things have broken.
 I updated the schema.xml file and reloaded the core (via the admin
 interface).  No errors were reported in the logs.
 I then pushed 100 records to be indexed.  A call to Commit afterwards
 seemed fine, however my next call for Optimize caused the following errors:
 java.io.IOException: background merge hit exception:
 _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
 [maxNumSegments=1]
 null:java.io.IOException: background merge hit exception:
 _2n(4.4):C4263/154 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _37
 [maxNumSegments=1]
 Unfortunately, googling for "background merge hit exception" came up
 with 2 things: a corrupt index or not enough free space. The host
 machine that's hosting solr has 227 out of 229GB free (according to df
 -h), so that's not it.
 I then ran CheckIndex on the index, and got the following results:
 http://apaste.info/gmGU
 As someone who is new to solr and lucene, as far as I can tell this
 means my index is fine. So I am coming up at a loss. I'm somewhat sure
 that I could probably delete my data directory and rebuild it but I am
 more interested in finding out why is it having issues, what is the
 best way to fix it, and what is the best way to prevent it from
 happening when this goes into production.
 Does anyone have any advice that may help?
 I helped Matthew find the logs and he posted this stack trace:
 1691103929 [http-bio-8080-exec-3] INFO  org.apache.solr.update.UpdateHandler  
 - start 
 commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
 1691104153 [http-bio-8080-exec-3] INFO  
 org.apache.solr.update.processor.LogUpdateProcessor  - [dbqItems] 
 webapp=/solr path=/update 
 params={optimize=true&_=1382999386564&wt=json&waitFlush=true} {} 0 224
 1691104154 [http-bio-8080-exec-3] ERROR org.apache.solr.core.SolrCore  - 
 java.io.IOException: background merge hit exception: _2n(4.4):C4263/154 
 _30(4.4):C134 _32(4.4):C10 _31(4.4):C10 into _39 [maxNumSegments=1]
 at 
 org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714)
 at 
 org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
 at 
 org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:530)
 at 
 org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
 at 
 org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1240)
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1219)
 at 
 org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
 at 
 org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
   

[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812101#comment-13812101
 ] 

ASF subversion and git services commented on LUCENE-5321:
-

Commit 1538245 from [~shaie] in branch 'dev/trunk'
[ https://svn.apache.org/r1538245 ]

LUCENE-5321: remove Facet42DocValuesFormat and FacetCodec

 Remove Facet42DocValuesFormat
 -

 Key: LUCENE-5321
 URL: https://issues.apache.org/jira/browse/LUCENE-5321
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
 Attachments: LUCENE-5321.patch


 The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that 
 it stores the addresses in a direct int[] rather than in PackedInts. On 
 LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it 
 improves perf for some queries and has a negligible effect for others, and 
 RAM consumption isn't much worse. We should remove Facet42DVF and use 
 DirectDVF instead.
 I also want to rename Facet46Codec to FacetCodec. There's no need to refactor 
 the class whenever the default codec changes (e.g. from 45 to 46) since it 
 doesn't care about the actual Codec version underneath; it only overrides the 
 DVF used for the facet fields. FacetCodec should take the DVF from the app 
 (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as 
 a utility Codec rather than a real, versioned Codec.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812106#comment-13812106
 ] 

ASF subversion and git services commented on LUCENE-5321:
-

Commit 1538249 from [~shaie] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1538249 ]

LUCENE-5321: remove Facet42DocValuesFormat and FacetCodec

 Remove Facet42DocValuesFormat
 -

 Key: LUCENE-5321
 URL: https://issues.apache.org/jira/browse/LUCENE-5321
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
 Attachments: LUCENE-5321.patch


 The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that 
 it stores the addresses in a direct int[] rather than in PackedInts. On 
 LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it 
 improves perf for some queries and has a negligible effect for others, and 
 RAM consumption isn't much worse. We should remove Facet42DVF and use 
 DirectDVF instead.
 I also want to rename Facet46Codec to FacetCodec. There's no need to refactor 
 the class whenever the default codec changes (e.g. from 45 to 46) since it 
 doesn't care about the actual Codec version underneath; it only overrides the 
 DVF used for the facet fields. FacetCodec should take the DVF from the app 
 (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as 
 a utility Codec rather than a real, versioned Codec.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5321) Remove Facet42DocValuesFormat

2013-11-02 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-5321.


   Resolution: Fixed
Fix Version/s: 5.0
   4.6
 Assignee: Shai Erera
Lucene Fields: New,Patch Available  (was: New)

Committed to trunk and 4x. If it turns out FacetCodec is too tricky to write, 
we can either add it back under facet/ or maybe under demo/. For the moment, I 
think it's not that important to keep it and maintain it.

 Remove Facet42DocValuesFormat
 -

 Key: LUCENE-5321
 URL: https://issues.apache.org/jira/browse/LUCENE-5321
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5321.patch


 The new DirectDocValuesFormat is nearly identical to Facet42DVF, except that 
 it stores the addresses in a direct int[] rather than in PackedInts. On 
 LUCENE-5296 we measured the performance of DirectDVF vs Facet42DVF: it 
 improves perf for some queries and has a negligible effect for others, and 
 RAM consumption isn't much worse. We should remove Facet42DVF and use 
 DirectDVF instead.
 I also want to rename Facet46Codec to FacetCodec. There's no need to refactor 
 the class whenever the default codec changes (e.g. from 45 to 46) since it 
 doesn't care about the actual Codec version underneath; it only overrides the 
 DVF used for the facet fields. FacetCodec should take the DVF from the app 
 (so e.g. the facet/ module doesn't depend on codecs/) and be exposed more as 
 a utility Codec rather than a real, versioned Codec.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_45) - Build # 3419 - Still Failing!

2013-11-02 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3419/
Java: 64bit/jdk1.7.0_45 -XX:+UseCompressedOops -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT

Error Message:
expected:<3> but was:<2>

Stack Trace:
java.lang.AssertionError: expected:<3> but was:<2>
at 
__randomizedtesting.SeedInfo.seed([93AADFEF7FFAFCCB:262CBE68C03B4E3F]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.core.TestNonNRTOpen.assertNotNRT(TestNonNRTOpen.java:133)
at 
org.apache.solr.core.TestNonNRTOpen.testReaderIsNotNRT(TestNonNRTOpen.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

2013-11-02 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812112#comment-13812112
 ] 

Shai Erera commented on LUCENE-5316:


Using NO_PARENTS is not that simple a decision, since the counts of the parents 
will be wrong if more than one category of that dimension is added to a 
document. If it's a flat dimension, and you don't care about the dimension's 
count, that may be fine. But if it's a hierarchical dimension, the counts of 
the inner taxonomy nodes will be wrong in that case.

While indexing as NO_PARENTS does exercise the API more, I think it's wrong to 
test it here. NO_PARENTS should be used only for hierarchical dimensions, in 
order to save space in the category list and eventually (hopefully) speed 
things up, since fewer bytes are read and decoded during search. But for flat 
dimensions, it adds the rollupValues cost. If we made the search code smart 
enough to detect that a dimension is flat, we'd save that cost (no need to roll 
up), but I think in general you should tweak OrdPolicy to NO_PARENTS only for 
hierarchical dimensions. I wonder what the perf numbers would be if you used 
NO_PARENTS only for the hierarchical dims - that's what we recommend users 
use, so I think that's what we should benchmark.
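
A minimal sketch of that recommendation against the current 4.x facet API (the "date" dimension name is an assumption):

{code:java}
// only the hierarchical "date" dimension skips encoding parent ordinals;
// its parent counts are then recovered by rollup at search time
CategoryListParams clp = new CategoryListParams() {
  @Override
  public OrdinalPolicy getOrdinalPolicy(String dimension) {
    return "date".equals(dimension)
        ? OrdinalPolicy.NO_PARENTS
        : super.getOrdinalPolicy(dimension);
  }
};
FacetIndexingParams fip = new FacetIndexingParams(clp);
// pass fip both when adding facet fields at indexing time and in the
// search-time params, so indexing and search agree on the policy
{code}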

I'll review the patch later.

 Taxonomy tree traversing improvement
 

 Key: LUCENE-5316
 URL: https://issues.apache.org/jira/browse/LUCENE-5316
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Gilad Barkai
Priority: Minor
 Attachments: LUCENE-5316.patch


 The taxonomy traversing is done today utilizing the 
 {{ParallelTaxonomyArrays}}. In particular, two taxonomy-size {{int}} arrays 
 which hold for each ordinal its (array #1) youngest child and (array #2) 
 older sibling.
 This is a compact way of holding the tree information in memory, but it's not 
 perfect:
 * Large (8 bytes per ordinal in memory)
 * Exposes internal implementation
 * Utilizing these arrays for tree traversing is not straightforward
 * Loses reference locality while traversing (the arrays are accessed at 
 increasing-only entries, but those entries may be distant from one another)
 * In NRT, a reopen is always (not just worst case) done in O(taxonomy size)
 This issue is about making the traversing easier and the code more readable, 
 and opening it up for future improvements (i.e. memory footprint and NRT cost) - 
 without changing any of the internals. 
 A later issue (or issues) could be opened to address the gaps once this one is done.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5311) Avoid registering replicas which are removed

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812118#comment-13812118
 ] 

ASF subversion and git services commented on SOLR-5311:
---

Commit 1538254 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1538254 ]

SOLR-5311 trying to stop test failures

 Avoid registering replicas which are removed 
 -

 Key: SOLR-5311
 URL: https://issues.apache.org/jira/browse/SOLR-5311
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 4.6, 5.0

 Attachments: SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch, 
 SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch


 If a replica is removed from the clusterstate and it comes back up, it 
 should not be allowed to register. 
 Each core, when it comes up, checks whether it was already registered and, if 
 yes, whether it is still there. If not, it throws an error and unregisters. If 
 such a request comes to the Overseer, it should ignore that core.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5311) Avoid registering replicas which are removed

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812119#comment-13812119
 ] 

ASF subversion and git services commented on SOLR-5311:
---

Commit 1538255 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1538255 ]

SOLR-5311 trying to stop test failures

 Avoid registering replicas which are removed 
 -

 Key: SOLR-5311
 URL: https://issues.apache.org/jira/browse/SOLR-5311
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 4.6, 5.0

 Attachments: SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch, 
 SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch, SOLR-5311.patch


 If a replica is removed from the clusterstate and it comes back up, it 
 should not be allowed to register. 
 Each core, when it comes up, checks whether it was already registered and, if 
 yes, whether it is still there. If not, it throws an error and unregisters. If 
 such a request comes to the Overseer, it should ignore that core.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-11-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812124#comment-13812124
 ] 

ASF subversion and git services commented on LUCENE-5189:
-

Commit 1538258 from [~shaie] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1538258 ]

LUCENE-5189: add @Deprecated annotation to SegmentInfo.attributes

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
 LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, 
 LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, 
 LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
 LUCENE-5189_process_events.patch


 In LUCENE-4258 we started to work on incremental field updates, however the 
 amount of changes are immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons to that:
 * NumericDV fields should be easier to update, if e.g. we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and on itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I have some working patch already which I'll upload next, explaining the 
 changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #495: POMs out of sync

2013-11-02 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/495/

1 tests failed.
FAILED:  org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.testDistribSearch

Error Message:
No live SolrServers available to handle this 
request:[http://127.0.0.1:13780/collection1, 
http://127.0.0.1:52756/collection1, http://127.0.0.1:65012/collection1, 
http://127.0.0.1:32173/collection1]

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available 
to handle this request:[http://127.0.0.1:13780/collection1, 
http://127.0.0.1:52756/collection1, http://127.0.0.1:65012/collection1, 
http://127.0.0.1:32173/collection1]
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:464)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:268)
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:640)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at 
org.apache.solr.cloud.ChaosMonkeyNothingIsSafeTest.doTest(ChaosMonkeyNothingIsSafeTest.java:200)




Build Log:
[...truncated 37396 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

2013-11-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812214#comment-13812214
 ] 

Michael McCandless commented on LUCENE-5316:


OK, I will re-test, using NO_PARENTS only for the hierarchical field. The 
problem is that the Wikipedia docs have only one such field, date (Y/M/D), and 
it's low-cardinality.

Actually, Wikipedia does have a VERY big taxonomy but I've never succeeded in 
extracting it...

So net/net, I will re-test, but I feel this can easily give a false sense of 
security since my test data does not have a big single-valued hierarchical 
field...

 Taxonomy tree traversing improvement
 

 Key: LUCENE-5316
 URL: https://issues.apache.org/jira/browse/LUCENE-5316
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Gilad Barkai
Priority: Minor
 Attachments: LUCENE-5316.patch


 The taxonomy traversing is done today utilizing the 
 {{ParallelTaxonomyArrays}}. In particular, two taxonomy-size {{int}} arrays 
 which hold for each ordinal its (array #1) youngest child and (array #2) 
 older sibling.
 This is a compact way of holding the tree information in memory, but it's not 
 perfect:
 * Large (8 bytes per ordinal in memory)
 * Exposes internal implementation
 * Utilizing these arrays for tree traversing is not straightforward
 * Loses reference locality while traversing (the arrays are accessed at 
 increasing-only entries, but those entries may be distant from one another)
 * In NRT, a reopen is always (not just worst case) done in O(taxonomy size)
 This issue is about making the traversing easier and the code more readable, 
 and opening it up for future improvements (i.e. memory footprint and NRT cost) - 
 without changing any of the internals. 
 A later issue (or issues) could be opened to address the gaps once this one is done.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5323) Add sizeInBytes to Suggester.Lookup

2013-11-02 Thread Areek Zillur (JIRA)
Areek Zillur created LUCENE-5323:


 Summary: Add sizeInBytes to Suggester.Lookup 
 Key: LUCENE-5323
 URL: https://issues.apache.org/jira/browse/LUCENE-5323
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Areek Zillur


It would be nice to have a sizeInBytes() method added to the Suggester.Lookup 
interface. This would allow users to estimate the size of the in-memory data 
structure created by the various suggester implementations.
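
A minimal sketch of the shape such a method could take for an FST-backed implementation (names here are assumptions, not the attached patch):

{code:java}
// inside an FST-backed Lookup subclass; "fst" stands for the suggester's
// internal automaton, which already knows how to estimate its own size
@Override
public long sizeInBytes() {
  return (fst == null) ? 0L : fst.sizeInBytes();
}
{code}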



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5323) Add sizeInBytes to Suggester.Lookup

2013-11-02 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated LUCENE-5323:
-

Attachment: LUCENE-5323.patch

Initial patch implementing sizeInBytes() for Suggest.Lookup.

 Add sizeInBytes to Suggester.Lookup 
 

 Key: LUCENE-5323
 URL: https://issues.apache.org/jira/browse/LUCENE-5323
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Areek Zillur
 Attachments: LUCENE-5323.patch


 It would be nice to have a sizeInBytes() method added to the Suggester.Lookup 
 interface. This would allow users to estimate the size of the in-memory data 
 structure created by the various suggester implementations.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

2013-11-02 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812260#comment-13812260
 ] 

Shai Erera commented on LUCENE-5316:


How about if you make up a hierarchical category, e.g. 
{{charCount/0-100K/0-10K/0-1K/0-100/0-10}}? If there are candidates in all 
ranges, that's 100K nodes. Also, we can hack a hierarchical dimension made up 
of A-Z/A-Z/A-Z... and randomly assign categories at different levels to 
documents.

But NO_PARENTS is not the only way to exercise the API. By asking for the top-K 
of a big flat dimension, when we compute the top-K for that dimension we 
currently traverse all its children to find which of them has count > 0. So the 
big flat dimensions also make thousands of calls to ChildrenIterator.next().
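
For context, this is roughly what that per-child walk looks like with the current arrays (a sketch; "taxoReader" and "dimOrdinal" are assumed to be in scope):

{code:java}
// walk the children of a dimension ordinal via the parallel arrays
ParallelTaxonomyArrays arrays = taxoReader.getParallelTaxonomyArrays();
int[] children = arrays.children();   // youngest child of each ordinal
int[] siblings = arrays.siblings();   // next-older sibling of each ordinal
for (int ord = children[dimOrdinal];
     ord != TaxonomyReader.INVALID_ORDINAL;
     ord = siblings[ord]) {
  // visit child ordinal 'ord', e.g. keep it only if its count > 0
}
{code}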

 Taxonomy tree traversing improvement
 

 Key: LUCENE-5316
 URL: https://issues.apache.org/jira/browse/LUCENE-5316
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Gilad Barkai
Priority: Minor
 Attachments: LUCENE-5316.patch


 The taxonomy traversing is done today utilizing the 
 {{ParallelTaxonomyArrays}}. In particular, two taxonomy-size {{int}} arrays 
 which hold for each ordinal its (array #1) youngest child and (array #2) 
 older sibling.
 This is a compact way of holding the tree information in memory, but it's not 
 perfect:
 * Large (8 bytes per ordinal in memory)
 * Exposes internal implementation
 * Utilizing these arrays for tree traversing is not straightforward
 * Loses reference locality while traversing (the arrays are accessed at 
 increasing-only entries, but those entries may be distant from one another)
 * In NRT, a reopen is always (not just worst case) done in O(taxonomy size)
 This issue is about making the traversing easier and the code more readable, 
 and opening it up for future improvements (i.e. memory footprint and NRT cost) - 
 without changing any of the internals. 
 A later issue (or issues) could be opened to address the gaps once this one is done.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org