[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website

2011-02-15 Thread Prescott Nasser (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995115#comment-12995115
 ] 

Prescott Nasser commented on LUCENENET-379:
---

I like it better than the current one. What are people's feelings on moving away 
from the Lucene logo? We aren't a different product like Solr, nor are we a 
loose port like Lucy.

 Clean up Lucene.Net website
 ---

 Key: LUCENENET-379
 URL: https://issues.apache.org/jira/browse/LUCENENET-379
 Project: Lucene.Net
  Issue Type: Task
Reporter: George Aroush
 Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch


 The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is 
 still based on the out-of-date incubation design.  This JIRA task is to 
 bring it up to date with other ASF projects' web pages.
 The existing website is here: 
 https://svn.apache.org/repos/asf/lucene/lucene.net/site/
 See http://www.apache.org/dev/project-site.html to get started.
 It would be best to start by cloning an existing ASF project's website and 
 adapting it for Lucene.Net.  Some examples: 
 https://svn.apache.org/repos/asf/lucene/pylucene/site/ and 
 https://svn.apache.org/repos/asf/lucene/java/site/

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (LUCENENET-379) Clean up Lucene.Net website

2011-02-15 Thread Christopher Currens (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995120#comment-12995120
 ] 

Christopher Currens commented on LUCENENET-379:
---

I would be happy to move away from the old logo.  The project's goals have 
certainly changed from the previous project, and I think it deserves a new 
look.  Not to mention that in the old Lucene.Net logo, the .net part looks funky.

 Clean up Lucene.Net website
 ---

 Key: LUCENENET-379
 URL: https://issues.apache.org/jira/browse/LUCENENET-379
 Project: Lucene.Net
  Issue Type: Task
Reporter: George Aroush
 Attachments: Lucene.zip, New Logo Idea.jpg, asfcms.zip, asfcms_1.patch


 The existing Lucene.Net home page at http://lucene.apache.org/lucene.net/ is 
 still based on the out-of-date incubation design.  This JIRA task is to 
 bring it up to date with other ASF projects' web pages.
 The existing website is here: 
 https://svn.apache.org/repos/asf/lucene/lucene.net/site/
 See http://www.apache.org/dev/project-site.html to get started.
 It would be best to start by cloning an existing ASF project's website and 
 adapting it for Lucene.Net.  Some examples: 
 https://svn.apache.org/repos/asf/lucene/pylucene/site/ and 
 https://svn.apache.org/repos/asf/lucene/java/site/





Re: Problem loading jcc from java : undefined symbol: PyExc_IOError

2011-02-15 Thread Roman Chyla
On Tue, Feb 15, 2011 at 4:22 AM, Andi Vajda va...@apache.org wrote:

 On Tue, 15 Feb 2011, Roman Chyla wrote:

 from:
 http://realmike.org/blog/2010/07/18/python-extensions-in-cpp-using-swig/

 Q. "Fatal Python error: Interpreter not initialized (version mismatch?)"

 A. This error occurs when the version of the Python interpreter for
 which the extension module has been built is different from the
 version of the interpreter that attempts to import the module.

 Is there a way to find out which Python interpreter version is inside
 JCC? Also, is it somehow possible that the Java process that loads the jcc
 library is picking up the default Python (2.4) instead of Python
 (2.5)? PATH is set to python2.5.

 There is no Python interpreter inside jcc. It's dynamically linked.
 To know which version of the shared library is looked for and expected, run
 the 'ldd' utility against the various shared libraries involved.
 That version is selected at build time, when you run 'python setup.py ...';
 that version of Python determines the version of libpython.so used.
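A minimal sketch of the check described here: scan ldd output for a libpython line to learn which interpreter version an extension was linked against. The sample output string below is hypothetical.

```python
import re

def linked_libpython(ldd_output: str):
    """Return the Python version (e.g. '2.5') that a shared library is
    dynamically linked against, or None if no libpython line appears."""
    match = re.search(r"libpython(\d+\.\d+)\.so", ldd_output)
    return match.group(1) if match else None

# Hypothetical ldd output line for a normally linked extension:
sample = "libpython2.5.so.1.0 => /usr/lib64/libpython2.5.so.1.0 (0x...)"
print(linked_libpython(sample))  # -> 2.5
```

On the ldd output below, this would return None, matching the observation that libjcc.so shows no libpython dependency at all.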

This is probably the problem (as you said before); the libjcc.so
shows no python:

bash-3.2$ ldd build/lib.linux-x86_64-2.5/libjcc.so
linux-vdso.so.1 => (0x7fff7affc000)
/$LIB/snoopy.so => /lib64/snoopy.so (0x2b8ed0e74000)
libjava.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/libjava.so (0x2b8ed1076000)
libjvm.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so (0x2b8ed11a5000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b8ed1c3f000)
libm.so.6 => /lib64/libm.so.6 (0x2b8ed1f3f000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b8ed21c2000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2b8ed23cf000)
libc.so.6 => /lib64/libc.so.6 (0x2b8ed25eb000)
libdl.so.2 => /lib64/libdl.so.2 (0x2b8ed2943000)
libverify.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/libverify.so (0x2b8ed2b47000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x2b8ed2c57000)
/lib64/ld-linux-x86-64.so.2 (0x2b8ed08c9000)

And I think the python2.4 (the default on the system) is being
loaded -- but how to force loading of python2.5 (if that is possible
at all) I don't know. Compilation is definitely done with -lpython2.5.

Cheers,

  roman


 Andi..


 Cheers,

  roman


 On Tue, Feb 15, 2011 at 2:40 AM, Roman Chyla roman.ch...@gmail.com
 wrote:

 On Tue, Feb 15, 2011 at 1:32 AM, Andi Vajda va...@apache.org wrote:

 On Tue, 15 Feb 2011, Roman Chyla wrote:

 The python embedded in Java works really well on MacOsX and also
 Ubuntu. But I am trying hard to make it work also on Scientific Linux
 (SLC5) with *statically* built Python. The python is a build from
 ActiveState.

 You mean you're going to try to dynamically load libpython.a into a JVM?
 I have no idea if this can work at all.

 I am very ignorant as far as the difference between statically and
 dynamically linked libraries goes - I just wanted to use JCC-wrapped
 code with this particular statically linked python.

 I got a little bit further, but just a little:

 after I changed -Xlinker --export-dynamic into -Xlinker
 -export-dynamic (and installed python into /opt...) I am getting a
 different error:

 SEVERE: org.apache.jcc.PythonException: No module named
 solrpie.java_bridge
 null
        at org.apache.jcc.PythonVM.instantiate(Native Method)
        at rca.python.jni.PythonVMBridge.start(Unknown Source)
        at rca.python.jni.PythonVMBridge.start(Unknown Source)
        at rca.python.jni.PythonVMBridge.start(Unknown Source)
        at rca.python.jni.SolrpieVM.getBridge(Unknown Source)


 My understanding is that the previous error has gone (and the python
 module time is loaded), because if I set PYTHONPATH incorrectly, I
 get:
 This message is IMHO coming from Python

 But when I correct the PYTHONPATH, I am getting only this:

 [java] Fatal Python error: Interpreter not initialized (version
 mismatch?)
 [java] Java Result: 134




 If my understanding of static builds is correct, I'd imagine the only
 way
 for this to work would be to statically compile the JVM (hotspot) and
 python
 together.

 oooups, that is way over my head


 But why all this?

 Because on the grid, we already had a statically linked python and it
 was working very well with pylucene (and after all, I managed to make
 it work also for solr and other packages)

 But if you think it is not possible, I should do something else :)
 It was fun trying; if you get an idea, please let me know.

 Thank you,

  Roman


 Andi..

 So far, I managed to build all the needed extensions (jcc, lucene,
 solr) and I can run them in python, but when I try to start the java
 app and use python, I get:

 SEVERE: org.apache.jcc.PythonException:


 

[jira] Commented: (SOLR-1395) Integrate Katta

2011-02-15 Thread tom liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994688#comment-12994688
 ] 

tom liu commented on SOLR-1395:
---

ISolrServer's config is set by the katta script. QueryCore's config will be set 
automatically.

The sub-proxy Solr is just a proxy, which does not process any request;
the sub-proxy dispatches the request to the querycore, and the querycore
processes the request and returns SolrDocLists.

But you get an exception about an object type that cannot be cast, so I think 
the querycore would be wrong.
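The sub-proxy arrangement described above (the proxy processes nothing itself, it only dispatches to a query core) can be sketched as follows; all class and method names are hypothetical, not the actual patch's API:

```python
class QueryCore:
    """Hypothetical query core: the component that actually processes a
    request and builds the response doc lists."""
    def handle(self, request):
        return {"responseHeader": {"status": 0},
                "docs": [],           # doc lists would be filled in here
                "request": request}

class SubProxy:
    """Hypothetical sub-proxy: does not process any request itself,
    it only dispatches to a query core and returns its result."""
    def __init__(self, core):
        self.core = core

    def handle(self, request):
        return self.core.handle(request)  # pure pass-through

proxy = SubProxy(QueryCore())
print(proxy.handle({"q": "*:*"})["responseHeader"])  # -> {'status': 0}
```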

 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Next

 Attachments: SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
 back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, 
 katta-solrcores.jpg, katta.node.properties, katta.zk.properties, 
 log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, 
 solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, 
 solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, 
 solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, 
 solr-1395-katta-0.6.2.patch, test-katta-core-0.6-dev.jar, 
 zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shard/SolrCore distribution and management
 * Zookeeper based failover
 * Indexes may be built using Hadoop




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-1969) Make MMapDirectory configurable in solrconfig.xml

2011-02-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-1969.
---

Resolution: Duplicate

MMapDirectory support was added in SOLR-2187.

 Make MMapDirectory configurable in solrconfig.xml
 -

 Key: SOLR-1969
 URL: https://issues.apache.org/jira/browse/SOLR-1969
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Stephen Bochinski
 Attachments: mmap_upd.patch, mmap_upd.patch, mmap_upd.txt, 
 mmap_upd.txt

   Original Estimate: 102.5h
  Remaining Estimate: 102.5h

 This makes it possible to enable MMapDirectory from the solrconfig.xml 
 file. There are also several configurations you can specify in the 
 solrconfig.xml file. You can enable or disable the unmapping of files which 
 have been closed by Solr; this is almost necessary for an index which is being 
 optimized. You also have the option to not mmap certain files, in which case 
 FSDirectory will be used to manage those particular files. This is 
 particularly useful if you are using FieldCache (SOLR-1961): having that 
 enabled makes it useless to memory-map the .fdt and .fdx files, considering 
 they are already in memory.
 The configurations are specified as follows:
 <directoryFactory class="solr.MMapDirectoryFactory">
   <str name="unmap">true</str>
   <lst name="filetypes">
     <bool name="fdt">false</bool>
     <bool name="fdx">false</bool>
   </lst>
 </directoryFactory>
 This would enable unmapping of closed files and would not memory-map files 
 ending with .fdt and .fdx.




[jira] Commented: (LUCENE-1391) Token type and flags values get lost when using ShingleMatrixFilter

2011-02-15 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994700#comment-12994700
 ] 

Uwe Schindler commented on LUCENE-1391:
---

As nobody seems to be interested in or understands this filter and wants to 
maintain it, I will deprecate it in the 3.x branch and remove it in trunk. It's 
only deprecated, so we can easily un-deprecate it after the release of 3.1 
if somebody rewrites it to be more generic and to work with attributes.

 Token type and flags values get lost when using ShingleMatrixFilter
 ---

 Key: LUCENE-1391
 URL: https://issues.apache.org/jira/browse/LUCENE-1391
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.4, 2.9, 3.0
Reporter: Wouter Heijke
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-1391.patch


 While using the new ShingleMatrixFilter I noticed that a token's type and 
 flags get lost. ShingleFilter does respect these values, like the other 
 filters I know.




[jira] Updated: (LUCENE-2920) Deprecate and remove ShingleMatrixFilter

2011-02-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2920:
--

Fix Version/s: 4.0
   3.1

 Deprecate and remove ShingleMatrixFilter
 

 Key: LUCENE-2920
 URL: https://issues.apache.org/jira/browse/LUCENE-2920
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0


 Spin-off from LUCENE-1391: This filter is unmaintained and no longer 
 up-to-date, has bugs nobody understands, and does not work with attributes.
 This issue deprecates it as of Lucene 3.1 and removes it from trunk.




[jira] Created: (LUCENE-2920) Deprecate and remove ShingleMatrixFilter

2011-02-15 Thread Uwe Schindler (JIRA)
Deprecate and remove ShingleMatrixFilter


 Key: LUCENE-2920
 URL: https://issues.apache.org/jira/browse/LUCENE-2920
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler


Spin-off from LUCENE-1391: This filter is unmaintained and no longer 
up-to-date, has bugs nobody understands, and does not work with attributes.

This issue deprecates it as of Lucene 3.1 and removes it from trunk.




[jira] Updated: (LUCENE-2920) Deprecate and remove ShingleMatrixFilter

2011-02-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2920:
--

Component/s: contrib/analyzers

 Deprecate and remove ShingleMatrixFilter
 

 Key: LUCENE-2920
 URL: https://issues.apache.org/jira/browse/LUCENE-2920
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0


 Spin-off from LUCENE-1391: This filter is unmaintained and no longer 
 up-to-date, has bugs nobody understands, and does not work with attributes.
 This issue deprecates it as of Lucene 3.1 and removes it from trunk.




[jira] Closed: (LUCENE-1391) Token type and flags values get lost when using ShingleMatrixFilter

2011-02-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-1391.
-

Resolution: Won't Fix

See LUCENE-2920.

 Token type and flags values get lost when using ShingleMatrixFilter
 ---

 Key: LUCENE-1391
 URL: https://issues.apache.org/jira/browse/LUCENE-1391
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.4, 2.9, 3.0
Reporter: Wouter Heijke
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-1391.patch


 While using the new ShingleMatrixFilter I noticed that a token's type and 
 flags get lost. ShingleFilter does respect these values, like the other 
 filters I know.




[jira] Commented: (SOLR-1709) Distributed Date Faceting

2011-02-15 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994716#comment-12994716
 ] 

Peter Sturge commented on SOLR-1709:


Hi David,

Thank you thank you thank you for working on this and providing tests - your 
efforts are very much appreciated!

For deprecation of facet.date, I suspect it probably shouldn't be deprecated 
until a fully-fledged replacement is ready, ported, and committed, but if 
SOLR-1240 can functionally slot in (including the 'NOW' stuff in SOLR-1729), 
that's great.

Many thanks,
Peter


 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetComponent.java, FacetComponent.java, 
 ResponseBuilder.java, SOLR-1709_distributed_date_faceting_v3x.patch, 
 solr-1.4.0-solr-1709.patch


 This patch is for adding support for date facets when using distributed 
 searches.
 Date faceting across multiple machines exposes some time-based issues that 
 anyone interested in this behaviour should be aware of:
 Any time and/or time-zone differences are not accounted for in the patch 
 (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
 'instant-in-time', unless all shards are time-synced to the exact same time).
 The implementation uses the first encountered shard's facet_dates as the 
 basis for subsequent shards' data to be merged in.
 This means that if subsequent shards' facet_dates are skewed in relation to 
 the first by 1 'gap', these 'earlier' or 'later' facets will not be merged 
 in.
 There are several reasons for this:
   * Performance: It's faster to check facet_date lists against a single map's 
 data, rather than against each other, particularly if there are many shards
   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
 time range larger than that which was requested
 (e.g. a request for one hour's worth of facets could bring back 2, 3 
 or more hours of data)
 This could be dealt with if timezone and skew information were added, and 
 the dates were normalized.
 One possibility for adding such support is to [optionally] add 'timezone' and 
 'now' parameters to the 'facet_dates' map. This would tell requesters what 
 time and TZ the remote server thinks it is, and so multiple shards' time data 
 can be normalized.
 The patch affects 2 files in the Solr core:
   org.apache.solr.handler.component.FacetComponent.java
   org.apache.solr.handler.component.ResponseBuilder.java
 The main changes are in FacetComponent - ResponseBuilder is just to hold the 
 completed SimpleOrderedMap until the finishStage.
 One possible enhancement is to perhaps make this an optional parameter, but 
 really, if facet.date parameters are specified, it is assumed they are 
 desired.
 Comments & suggestions welcome.
 As a favour to ask, if anyone could take my 2 source files and create a PATCH 
 file from it, it would be greatly appreciated, as I'm having a bit of trouble 
 with svn (don't shoot me, but my environment is a Redmond-based os company).
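The merge strategy described above (the first encountered shard's facet_dates as the basis, with later shards' skewed keys dropped) can be sketched as follows, with the response structure simplified to plain date-to-count maps:

```python
def merge_facet_dates(shard_responses):
    """Sketch of distributed date-facet merging. Assumes each response is
    a dict mapping a date key to a count; this is a simplification of the
    actual SimpleOrderedMap structure."""
    # Basis: the first encountered shard's facet_dates.
    merged = dict(shard_responses[0])
    for response in shard_responses[1:]:
        for date_key, count in response.items():
            # Only merge keys the basis already has; facet_dates skewed
            # by a 'gap' relative to the first shard are not merged in.
            if date_key in merged:
                merged[date_key] += count
    return merged
```

This is also why the merged time range never grows beyond the first shard's range, at the cost of dropping 'earlier' or 'later' facets from skewed shards.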




[jira] Resolved: (LUCENE-2920) Deprecate and remove ShingleMatrixFilter

2011-02-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2920.
---

Resolution: Fixed

Deprecated 3.x revision: 1070818
Removed trunk revision: 1070821

 Deprecate and remove ShingleMatrixFilter
 

 Key: LUCENE-2920
 URL: https://issues.apache.org/jira/browse/LUCENE-2920
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0


 Spin-off from LUCENE-1391: This filter is unmaintained and no longer 
 up-to-date, has bugs nobody understands, and does not work with attributes.
 This issue deprecates it as of Lucene 3.1 and removes it from trunk.




inverted index pruning

2011-02-15 Thread Li Li
hi all,
I recently read the paper "Pruning Policies for Two-Tiered Inverted
Index with Correctness Guarantee". Its idea is interesting, and I
have some questions I'd like to share with you.
The idea is to prune unlikely documents for certain terms, e.g.:
term1  d1  d3  d6  |  d9  d7  d8
term2  d1  d6  d8  |  d3  d4  d5
We have 2 terms here -- term1 and term2 -- and we perform an AND query,
searching for documents that contain both terms.
Suppose our score function considers 2 types of scores: a document static
score and a term-related document score. For simplicity, the static score
is the page_rank of a document, the term-related score is tf*idf, and the
final score is a linear combination such as page_rank + tf*idf.
The pruning criterion is: for each term we only keep the documents whose
page_rank is in the top N of all the docList for this term and whose
tf*idf is also in the top N.
Take the above example: d1, d3, d6's page_rank is in the top 3 of the 5
documents containing term1, and d1's tf*idf is also in the top 3 of those
5 documents.
   Then we want to get the top 3 documents.
   We evaluate d1, d3, d6, d8 because they are in the pruned docLists.
   d1 and d6's information is complete, so we can calculate the
accurate scores of d1 and d6.
   d3 and d8's information is incomplete; we can only calculate upper
bounds for them.
   We select the top 3 documents by score. Suppose the result is d6, d1, d8, d3.
   Because d8 and d3 only have upper-bound scores, we cannot know the real
scores of d8 and d3. So the search fails and we have to search the
un-pruned index for the answer.
   But if we want to get the top 2 documents, we know d1 and d6 are
the answer, because d8's upper bound and the other documents' scores are
less than theirs.

   The experiments in this paper show that we can use the pruned
index (30% the size of the full index) to answer 90%+ of queries.
   This result is exciting, but we have a problem using it in Lucene:
for an AND query, Lucene uses skip lists to speed up queries, but this
pruning algorithm cannot use them on the pruned index.
   We can check this with the above example: with a skip list we may skip
d3, but d3's upper-bound score may be larger than d1's, and if we don't
score d3, we cannot know that.
   So although we only need to score the docLists in the pruned index
(30%), it may be slower than an AND query on the full index, where we can
use skip lists to skip many docs.
   Does anyone have a good idea for this problem? If we can solve it,
we can improve performance well.
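The two-tier procedure above can be sketched as follows. This is a simplified illustration under assumed data structures (term -> doc -> score maps); the paper's actual correctness guarantee involves more conditions than this check:

```python
def pruned_topk(pruned, full, k):
    """Try to answer a top-k AND query from the pruned tier; fall back to
    the full tier when correctness cannot be guaranteed.
    pruned/full: dict term -> dict doc -> score contribution."""
    terms = list(full)
    # Docs matching the AND query (present in every term's full list).
    matching = set.intersection(*(set(full[t]) for t in terms))
    exact, upper = {}, {}
    for d in matching:
        if all(d in pruned[t] for t in terms):
            # Complete information in the pruned tier: accurate score.
            exact[d] = sum(pruned[t][d] for t in terms)
        elif any(d in pruned[t] for t in terms):
            # Incomplete: bound missing entries by the term's max score.
            upper[d] = sum(pruned[t].get(d, max(full[t].values()))
                           for t in terms)
    top = sorted(exact, key=exact.get, reverse=True)[:k]
    if len(top) == k and (not upper or exact[top[-1]] >= max(upper.values())):
        return top  # guaranteed correct from the pruned tier alone
    # Upper bounds are inconclusive: re-run on the un-pruned index.
    scores = {d: sum(full[t][d] for t in terms) for d in matching}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Note the skip-list tension described above: every doc with incomplete information must still be scored for its upper bound, so the pruned tier cannot skip candidates the way a full-index AND query can.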




[jira] Commented: (SOLR-1395) Integrate Katta

2011-02-15 Thread JohnWu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994757#comment-12994757
 ] 

JohnWu commented on SOLR-1395:
--

tomliu:
   So the solrhome of ISolrServer needs to be configured in multi-core style,
   in solr.xml?

<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="queryCore" instanceDir="queryCore"/>
  </cores>
</solr>

   But how do we set the handler for each role of the katta slave?

   Can you show the solr home folder hierarchy and the config content of a
katta slave node?


 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Next

 Attachments: SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
 back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, 
 katta-solrcores.jpg, katta.node.properties, katta.zk.properties, 
 log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, 
 solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, 
 solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, 
 solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, 
 solr-1395-katta-0.6.2.patch, test-katta-core-0.6-dev.jar, 
 zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shard/SolrCore distribution and management
 * Zookeeper based failover
 * Indexes may be built using Hadoop




[jira] Commented: (LUCENE-2894) Use of google-code-prettify for Lucene/Solr Javadoc

2011-02-15 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994759#comment-12994759
 ] 

Koji Sekiguchi commented on LUCENE-2894:


{quote}
but Solr javadoc on hudson looks not good:

https://hudson.apache.org/hudson/job/Solr-trunk/javadoc/org/apache/solr/handler/component/TermsComponent.html
{quote}

The problem was gone.

 Use of google-code-prettify for Lucene/Solr Javadoc
 ---

 Key: LUCENE-2894
 URL: https://issues.apache.org/jira/browse/LUCENE-2894
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Javadocs
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2894.patch, LUCENE-2894.patch, LUCENE-2894.patch, 
 LUCENE-2894.patch


 My company, RONDHUIT, uses google-code-prettify (Apache License 2.0) in 
 Javadoc for syntax highlighting:
 http://www.rondhuit-demo.com/RCSS/api/com/rondhuit/solr/analysis/JaReadingSynonymFilterFactory.html
 I think we can use it for the Lucene javadoc (Java sample code in overview.html 
 etc.) and the Solr javadoc (Analyzer Factories etc.) to improve or simplify 
 our lives.




[jira] Commented: (SOLR-2272) Join

2011-02-15 Thread Bojan Smid (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994762#comment-12994762
 ] 

Bojan Smid commented on SOLR-2272:
--

Very nice patch, Yonik. However, it doesn't apply to current trunk any more. 
Does anyone, by any chance, have a fresh version of this patch?

 Join
 

 Key: SOLR-2272
 URL: https://issues.apache.org/jira/browse/SOLR-2272
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: SOLR-2272.patch


 Limited join functionality for Solr, mapping one set of IDs matching a query 
 to another set of IDs, based on the indexed tokens of the fields.
 Example:
 fq={!join from=parent_ptr to=parent_id}child_doc:query
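The mapping described can be sketched as follows (document structure and field names taken from the example; the function itself is an illustration, not the patch's implementation): collect the `from` field values of docs matching the inner query, then return docs whose `to` field value is in that set.

```python
def join(matching_docs, all_docs, from_field, to_field):
    """Sketch of {!join from=... to=...}: map docs matching the inner
    query to docs whose to_field token equals some from_field token."""
    keys = {doc[from_field] for doc in matching_docs if from_field in doc}
    return [doc["id"] for doc in all_docs if doc.get(to_field) in keys]

# Child doc matched by the inner query child_doc:query:
children = [{"id": "c1", "parent_ptr": "p2"}]
index = [{"id": "p1", "parent_id": "p1"},
         {"id": "p2", "parent_id": "p2"},
         {"id": "c1", "parent_ptr": "p2"}]
print(join(children, index, "parent_ptr", "parent_id"))  # -> ['p2']
```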




[jira] Closed: (SOLR-2293) SolrCloud distributed indexing

2011-02-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl closed SOLR-2293.
-

Resolution: Duplicate

Use SOLR-2358

 SolrCloud distributed indexing
 --

 Key: SOLR-2293
 URL: https://issues.apache.org/jira/browse/SOLR-2293
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Jan Høydahl

 Add SolrCloud support for distributed indexing, as described in 
 http://wiki.apache.org/solr/DistributedSearch#Distributed_Indexing and the 
 Support user specified partitioning paragraph of 
 http://wiki.apache.org/solr/SolrCloud#High_level_design_goals
 Currently, the client needs to decide which shard indexer to talk to for each 
 document. Common partitioning strategies include hash-based, date-based, and 
 custom.
 Solr should have the capability of accepting a document update on any of the 
 nodes in a cluster, and of performing partitioning and distribution of updates 
 to the correct shard, based on the current ZK config. The 
 ShardDistributionPolicy should be pluggable, with the most common policies 
 provided out of the box.
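A hash-based policy, the first of the strategies mentioned above, can be sketched like this (the function name and use of MD5 are illustrative assumptions, not the actual ShardDistributionPolicy API):

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Hash-based partitioning sketch: route each document to a shard by
    a stable hash of its unique key, so any node can accept the update
    and forward it to the correct shard."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

A date-based or custom policy would replace `shard_for` while keeping the same contract: deterministic, and dependent only on the document and the current cluster config.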




[jira] Commented: (SOLR-2358) Distributing Indexing

2011-02-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994766#comment-12994766
 ] 

Jan Høydahl commented on SOLR-2358:
---

See SOLR-2293 for some thoughts.

Since this functionality is core to Solr and should always be present, it would 
be natural to either build it into the DirectUpdateHandler2 or to add this 
processor to the set of default UpdateProcessors that are executed if no 
update.processor parameter is specified.
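The second option, a distributing processor in the default chain, might look like this sketch; all class and method names are hypothetical, not Solr's actual UpdateRequestProcessor API:

```python
class UpdateProcessor:
    """Minimal processor-chain sketch: each processor does its work, then
    delegates to the next processor in the chain."""
    def __init__(self, nxt=None):
        self.next = nxt

    def process_add(self, doc):
        if self.next:
            self.next.process_add(doc)

class DistributingProcessor(UpdateProcessor):
    """Routes each doc to a shard before passing it down the chain; if it
    sits in the default chain, distribution is always present without an
    update.processor parameter."""
    def __init__(self, route, nxt=None):
        super().__init__(nxt)
        self.route = route
        self.routed = []  # (shard, doc id) pairs, recorded for illustration

    def process_add(self, doc):
        self.routed.append((self.route(doc), doc["id"]))
        super().process_add(doc)

chain = DistributingProcessor(route=lambda d: hash(d["id"]) % 2)
chain.process_add({"id": "doc-1"})
```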

 Distributing Indexing
 -

 Key: SOLR-2358
 URL: https://issues.apache.org/jira/browse/SOLR-2358
 Project: Solr
  Issue Type: New Feature
Reporter: William Mayor
Priority: Minor
 Attachments: SOLR-2358.patch


 The first steps towards creating distributed indexing functionality in Solr




[jira] Commented: (LUCENE-2908) clean up serialization in the codebase

2011-02-15 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994769#comment-12994769
 ] 

Earwin Burrfoot commented on LUCENE-2908:
-

Oh, damn :)
On my project, we specifically use java-serialization to pass configured 
Queries/Filters between cluster nodes, as it saves us HEAPS of 
wrapping/unwrapping them into some parallel serializable classes.
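The setup Earwin describes, shipping configured query objects between cluster nodes via built-in Java serialization instead of maintaining parallel serializable DTOs, can be sketched with a stand-in class. `TermFilter` here is a hypothetical placeholder; in the Lucene 3.x line many Query/Filter classes implemented `java.io.Serializable` directly.

```java
import java.io.*;

// Hypothetical stand-in for a configured query/filter object.
class TermFilter implements Serializable {
    private static final long serialVersionUID = 1L;
    final String field;
    final String value;
    TermFilter(String field, String value) { this.field = field; this.value = value; }
}

public class SerializationDemo {
    // Serialize an object graph to bytes, as a node would before sending it.
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Deserialize on the receiving node; no parallel DTO classes needed.
    static Object fromBytes(byte[] b) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(b))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TermFilter sent = new TermFilter("title", "lucene");
        TermFilter received = (TermFilter) fromBytes(toBytes(sent));
        System.out.println(received.field + ":" + received.value); // prints title:lucene
    }
}
```

This is exactly the convenience that removing Serializable from the codebase takes away, and why downstream projects relying on it would need to move to their own wire format.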

 clean up serialization in the codebase
 --

 Key: LUCENE-2908
 URL: https://issues.apache.org/jira/browse/LUCENE-2908
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2908.patch


 We removed contrib/remote, but forgot to clean up serialization hell 
 everywhere.
 This is no longer needed, never really worked (e.g. across versions), and 
 slows development (e.g. I wasted a long time debugging stupid serialization of 
 Similarity.idfExplain when trying to make a patch for the scoring system).




[jira] Commented: (SOLR-756) Make DisjunctionMaxQueryParser generally useful by supporting all query types.

2011-02-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994772#comment-12994772
 ] 

Jan Høydahl commented on SOLR-756:
--

I think this issue can be closed as it duplicates SOLR-1553, not?

 Make DisjunctionMaxQueryParser generally useful by supporting all query types.
 --

 Key: SOLR-756
 URL: https://issues.apache.org/jira/browse/SOLR-756
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: David Smiley
 Fix For: Next

 Attachments: SolrPluginUtilsDisMax.patch


 This is an enhancement to the DisjunctionMaxQueryParser to work on all the 
 query variants such as wildcard, prefix, and fuzzy queries, and to support 
 working in AND scenarios that are not processed by the min-should-match 
 DisMax QParser. This was not in Solr already because DisMax was only used for 
 a very limited syntax that didn't use those features. In my opinion, this 
 makes a more suitable base parser for general use because unlike the 
 Lucene/Solr parser, this one supports multiple default fields, whereas other 
 ones (say Yonik's {!prefix} one, for example) can't do dismax. The notion of 
 a single default field is antiquated and a technical under-the-hood detail of 
 Lucene that I think Solr should shield the user from by on-the-fly using a 
 DisMax when multiple fields are used. 
 (patch to be attached soon)
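For reference, the disjunction-max combination this parser builds over multiple fields scores a document by its best-matching field plus a tie-breaker fraction of the remaining fields' scores. A minimal stdlib-only sketch of that formula (the field scores are invented for illustration):

```java
import java.util.Arrays;

// Sketch of disjunction-max scoring across multiple default fields:
// score = max(fieldScores) + tieBreaker * (sum of the other field scores).
public class DisMaxScore {
    static double disMax(double[] fieldScores, double tieBreaker) {
        double max = Arrays.stream(fieldScores).max().orElse(0.0);
        double sum = Arrays.stream(fieldScores).sum();
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        double[] scores = {0.8, 0.3, 0.1};       // e.g. title, body, tags
        System.out.println(disMax(scores, 0.0)); // prints 0.8 (pure max)
        System.out.println(disMax(scores, 0.1)); // max plus a tie-breaker bonus
    }
}
```

With tieBreaker=0 only the best field counts; a small positive value rewards documents that match in several fields, which is what makes the multi-default-field behavior useful.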




[jira] Created: (SOLR-2363) Rename the dismax request handler

2011-02-15 Thread JIRA
Rename the dismax request handler
---

 Key: SOLR-2363
 URL: https://issues.apache.org/jira/browse/SOLR-2363
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Reporter: Jan Høydahl


It is misleading that one of the requestHandlers in the example schema is named 
the same as the queryParser "dismax". It creates confusion as to whether to use 
defType=dismax or qt=dismax. It would be better if the example requestHandler 
was named e.g. "dismaxexample".




Re: inverted index pruning

2011-02-15 Thread Andrzej Bialecki

On 2/15/11 11:57 AM, Li Li wrote:

hi all,
 I recently read the paper "Pruning Policies for Two-Tiered Inverted
Index with Correctness Guarantee". Its idea is interesting and I have
some questions I'd like to share with you.


Please take a look at LUCENE-1812, LUCENE-2632 and my presentation from 
Apache EuroCon 2010 in Prague, "Munching & Crunching".



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





[jira] Commented: (SOLR-1581) Facet by Function

2011-02-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994774#comment-12994774
 ] 

Grant Ingersoll commented on SOLR-1581:
---

I would agree it relates to SOLR-1240.  In fact, API wise, I think we could 
just add facet.range.function= (or some abbreviation like facet.range.fn).  The 
key here is that we don't want to have to run the query multiple times like one 
has to do w/ frange
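Grant's "quantize into buckets" idea can be sketched in plain Java: evaluate a function value per document, then bucket the values by a fixed width and count documents per bucket. Class and method names, and the sample data, are invented for illustration; this is not Solr's implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "facet by function": quantize per-document function
// values into fixed-width range buckets and count documents per bucket.
public class FunctionFacet {
    /** Map each value to the lower bound of its bucket and count occurrences. */
    static Map<Double, Integer> facetByFunction(double[] values, double bucketWidth) {
        Map<Double, Integer> counts = new TreeMap<>();
        for (double v : values) {
            double bucketStart = Math.floor(v / bucketWidth) * bucketWidth;
            counts.merge(bucketStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // e.g. values of a distance function over six documents
        double[] distances = {0.5, 1.2, 1.9, 2.4, 0.8, 2.6};
        System.out.println(facetByFunction(distances, 1.0));
        // buckets: [0,1) -> 2 docs, [1,2) -> 2 docs, [2,3) -> 2 docs
    }
}
```

The point of doing this in one pass over the function values, rather than issuing one frange query per bucket, is exactly the "don't run the query multiple times" concern above.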

 Facet by Function
 -

 Key: SOLR-1581
 URL: https://issues.apache.org/jira/browse/SOLR-1581
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: Next


 It would be really great if we could execute a function and quantize it into 
 buckets that could then be returned as facets.




[jira] Updated: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work

2011-02-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2348:
--

Fix Version/s: (was: 3.1)
   3.2

moving to 3.2

 No error reported when using a FieldCached backed ValueSource for a field 
 Solr knows won't work
 ---

 Key: SOLR-2348
 URL: https://issues.apache.org/jira/browse/SOLR-2348
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.2, 4.0


 For the same reasons outlined in SOLR-2339, Solr FieldTypes that return 
 FieldCache-backed ValueSources should explicitly check for situations where 
 Solr knows the FieldCache is meaningless.




[jira] Created: (LUCENE-2921) Now that we track the code version at the segment level, we can stop tracking it also in each file level

2011-02-15 Thread Shai Erera (JIRA)
Now that we track the code version at the segment level, we can stop tracking 
it also in each file level


 Key: LUCENE-2921
 URL: https://issues.apache.org/jira/browse/LUCENE-2921
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
 Fix For: 3.2, 4.0


Now that we track the code version that created the segment at the segment 
level, we can stop tracking versions in each file. This has several major 
benefits:
# Today the constants used to track versions are confusing - they do not 
state which version they apply from, so it's harder to determine which 
formats we can stop supporting when working on the next major release.
# Those format numbers are usually negative, but in some cases positive (an 
inconsistency) -- we need to remember to decrement the negative ones, which 
I always find confusing.
# It will remove the format tracking from all the *Writers, and the *Reader 
will receive the code format (String) and work w/ the appropriate constant 
(e.g. Constants.LUCENE_30). Centralizing version tracking to SegmentInfo is an 
advantage IMO.

It's not urgent that we do this for 3.1 (though it requires an index format 
change), because starting from 3.1 all segments track their version number 
anyway (or are migrated to track it), so we can safely release it in a 
follow-on 3.x release.




[jira] Commented: (SOLR-1553) extended dismax query parser

2011-02-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994781#comment-12994781
 ] 

Robert Muir commented on SOLR-1553:
---

I marked this experimental in trunk.

I'll keep the issue open in 3.1 for a few more days as discussed, then i'm 
moving it out.

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 1.5, 3.1, 4.0

 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, 
 edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, 
 edismax.userFields.patch


 An improved user-facing query parser based on dismax




[jira] Resolved: (SOLR-2363) Rename the dismax request handler

2011-02-15 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley resolved SOLR-2363.
-

Resolution: Duplicate

The DismaxRequestHandler and StandardRequestHandler are both deprecated and 
replaced with SearchHandler.






 Rename the dismax request handler
 ---

 Key: SOLR-2363
 URL: https://issues.apache.org/jira/browse/SOLR-2363
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: dismax, example-schema

 It is misleading that one of the requestHandlers in the example schema is 
 named the same as the queryParser dismax. It creates confusion as to 
 whether the use of defType=dismax vs qt=dismax. It would be better if the 
 example requestHandler was named e.g. dismaxexample




Re: [jira] Created: (SOLR-2363) Rename the dismax request handler

2011-02-15 Thread Erick Erickson
+1

2011/2/15 Jan Høydahl (JIRA) j...@apache.org:
 Rename the dismax request handler
 ---

                 Key: SOLR-2363
                 URL: https://issues.apache.org/jira/browse/SOLR-2363
             Project: Solr
          Issue Type: Bug
          Components: Schema and Analysis
            Reporter: Jan Høydahl


 It is misleading that one of the requestHandlers in the example schema is 
 named the same as the queryParser dismax. It creates confusion as to 
 whether the use of defType=dismax vs qt=dismax. It would be better if the 
 example requestHandler was named e.g. dismaxexample

 --
 This message is automatically generated by JIRA.
 -
 For more information on JIRA, see: http://www.atlassian.com/software/jira



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org






[jira] Reopened: (SOLR-2363) Rename the dismax request handler

2011-02-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reopened SOLR-2363:
---


Reopening. This issue is not talking about the old DisMaxRequestHandler but the 
example SearchHandler config named "dismax".

We should probably start using the term "RequestHandler instance" or similar 
for these entries.

 Rename the dismax request handler
 ---

 Key: SOLR-2363
 URL: https://issues.apache.org/jira/browse/SOLR-2363
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: dismax, example-schema
 Attachments: SOLR-2363.patch


 It is misleading that one of the requestHandlers in the example schema is 
 named the same as the queryParser dismax. It creates confusion as to 
 whether the use of defType=dismax vs qt=dismax. It would be better if the 
 example requestHandler was named e.g. dismaxexample




[jira] Updated: (SOLR-2363) Rename the dismax request handler

2011-02-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2363:
--

Attachment: SOLR-2363.patch

This patch renames the example requestHandler to "dismaxexample", updates the 
outdated comment with a proper reference to DisMaxQParser, and switches to 
edismax as the default.

 Rename the dismax request handler
 ---

 Key: SOLR-2363
 URL: https://issues.apache.org/jira/browse/SOLR-2363
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: dismax, example-schema
 Attachments: SOLR-2363.patch


 It is misleading that one of the requestHandlers in the example schema is 
 named the same as the queryParser dismax. It creates confusion as to 
 whether the use of defType=dismax vs qt=dismax. It would be better if the 
 example requestHandler was named e.g. dismaxexample




[jira] Updated: (SOLR-2363) Rename the example dismax request handler instance

2011-02-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2363:
--

Summary: Rename the example dismax request handler instance  (was: Rename 
the dismax request handler)

Just renaming the issue to reflect that it's about a requestHandler instance.

 Rename the example dismax request handler instance
 

 Key: SOLR-2363
 URL: https://issues.apache.org/jira/browse/SOLR-2363
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: dismax, example-schema
 Attachments: SOLR-2363.patch


 It is misleading that one of the requestHandlers in the example schema is 
 named the same as the queryParser dismax. It creates confusion as to 
 whether the use of defType=dismax vs qt=dismax. It would be better if the 
 example requestHandler was named e.g. dismaxexample




[jira] Commented: (SOLR-2363) Rename the example dismax request handler instance

2011-02-15 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994793#comment-12994793
 ] 

Ryan McKinley commented on SOLR-2363:
-

ah - my bad.

what about something more descriptive?  dismax is kinda cryptic.  maybe 
'escaped', 'safe', or just 'query'

Though i'm not convinced it really needs changing -- we would also need to 
update all the documentation that refers to ?qt=dismax

 Rename the example dismax request handler instance
 

 Key: SOLR-2363
 URL: https://issues.apache.org/jira/browse/SOLR-2363
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: dismax, example-schema
 Attachments: SOLR-2363.patch


 It is misleading that one of the requestHandlers in the example schema is 
 named the same as the queryParser dismax. It creates confusion as to 
 whether the use of defType=dismax vs qt=dismax. It would be better if the 
 example requestHandler was named e.g. dismaxexample




[jira] Updated: (SOLR-2351) Allow the MoreLikeThis component to accept filters and use the already parsed query from previous stages (if applicable) as seed.

2011-02-15 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-2351:
---

Affects Version/s: (was: 1.5)
Fix Version/s: (was: 1.5)
   (was: 1.3)
   Next

 Allow the MoreLikeThis component to accept filters and use the already parsed 
 query from previous stages (if applicable) as seed.
 -

 Key: SOLR-2351
 URL: https://issues.apache.org/jira/browse/SOLR-2351
 Project: Solr
  Issue Type: Improvement
  Components: MoreLikeThis
Reporter: Amit Nithian
Priority: Minor
 Fix For: Next

 Attachments: mlt.patch


 Currently the MLT component doesn't accept filter queries specified on the 
 URL which my application needed (I needed to restrict similar results by a 
 lat/long bounding box). This patch also attempts to solve the issue of 
 allowing the boost functions of the dismax to be used in the MLT component by 
 using the query object created by the QueryComponent to OR with the query 
 created by the MLT as part of the final query. In a blank dismax query with 
 no query/phrase clauses, this works although a separate BF definition/parsing 
 would be ideal.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1191) NullPointerException in delta import

2011-02-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994795#comment-12994795
 ] 

Yonik Seeley commented on SOLR-1191:


If someone could whip up a test for this, we could get this fix into the 
upcoming 3.1 release.

 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
  Environment: OS: Windows & Linux.
  Java: 1.6
  DB: MySQL & SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1191.patch


 Seeing few of these NullPointerException during delta imports. Once this 
 happens delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem and delta 
 import start working again.
  Here is the log just before & after the exception
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
  params={command=delta-import&optimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: user
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987716 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 7
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: user rows obtained : 46
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: user rows obtained : 0
 05/27 11:59:29 86987873 INFO  

[jira] Commented: (SOLR-2245) MailEntityProcessor Update

2011-02-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994797#comment-12994797
 ] 

Yonik Seeley commented on SOLR-2245:


Thanks Peter,
If we can get someone who knows more DIH stuff to add some tests, we can get 
this committed!

 MailEntityProcessor Update
 --

 Key: SOLR-2245
 URL: https://issues.apache.org/jira/browse/SOLR-2245
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4, 1.4.1
Reporter: Peter Sturge
Priority: Minor
 Fix For: 1.4.2

 Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip


 This patch addresses a number of issues in the MailEntityProcessor 
 contrib-extras module.
 The changes are outlined here:
 * Added an 'includeContent' entity attribute to allow specifying content to 
 be included independently of processing attachments
  e.g. <entity includeContent="true" processAttachments="false" . . . /> 
 would include message content, but not attachment content
 * Added a synonym called 'processAttachments', which is synonymous to the 
 mis-spelled (and singular) 'processAttachement' property. This property 
 functions the same as processAttachement. Default= 'true' - if either is 
 false, then attachments are not processed. Note that only one of these should 
 really be specified in a given entity tag.
 * Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is 
 unread, not deleted etc.), there is still a property value stored in the 
 'flags' field (the value is the string none)
 Note: there is a potential backward compat issue with FLAGS.NONE for clients 
 that expect the absence of the 'flags' field to mean 'Not read'. I'm 
 calculating this would be extremely rare, and is inadvisable in any case as 
 user flags can be arbitrarily set, so fixing it up now will ensure future 
 client access will be consistent.
 * The folder name of an email is now included as a field called 'folder' 
 (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing 
 processing
 * The addPartToDocument() method that processes attachments is significantly 
 re-written, as there looked to be no real way the existing code would ever 
 actually process attachment content and add it to the row data
 Tested on the 3.x trunk with a number of popular imap servers.




[jira] Commented: (SOLR-1191) NullPointerException in delta import

2011-02-15 Thread Gunnlaugur Thor Briem (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994803#comment-12994803
 ] 

Gunnlaugur Thor Briem commented on SOLR-1191:
-

I'll make one later today or tomorrow.

 NullPointerException in delta import
 

 Key: SOLR-1191
 URL: https://issues.apache.org/jira/browse/SOLR-1191
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4
  Environment: OS: Windows & Linux.
  Java: 1.6
  DB: MySQL & SQL Server 
Reporter: Ali Syed
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1191.patch


 Seeing few of these NullPointerException during delta imports. Once this 
 happens delta import stops working and keeps giving the same error.
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:622)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:240)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
 Running delta import for a particular entity fixes the problem and delta 
 import start working again.
  Here is the log just before & after the exception
 05/27 11:59:29 86987686 INFO  btpool0-538 org.apache.solr.core.SolrCore  - 
 [localhost] webapp=/solr path=/dataimport 
  params={command=delta-import&optimize=false} status=0 QTime=0
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DataImporter  - Starting Delta Import
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.SolrWriter  - Read dataimport.properties
 05/27 11:59:29 86987687 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: content
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987690 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: content rows obtained : 0
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: content
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: job
 05/27 11:59:29 86987692 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity job with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987704 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 12
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: job rows obtained : 0
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed parentDeltaQuery 
 for Entity: job
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Delta Import completed 
 successfully
 05/27 11:59:29 86987707 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Starting delta collection.
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Running ModifiedRowKey() for 
 Entity: user
 05/27 11:59:29 86987709 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Creating a connection 
 for entity user with URL: jdbc:sqlserver://localhost;databaseName=TestDB
 05/27 11:59:29 86987716 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.JdbcDataSource  - Time taken for 
 getConnection(): 7
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed ModifiedRowKey for 
 Entity: user rows obtained : 46
 05/27 11:59:29 86987873 INFO  Thread-4162 
 org.apache.solr.handler.dataimport.DocBuilder  - Completed DeletedRowKey for 
 Entity: user rows obtained : 0
 05/27 11:59:29 86987873 INFO  Thread-4162 
 

Re: [jira] Commented: (SOLR-2363) Rename the example dismax request handler instance

2011-02-15 Thread Bill Bell
Since the qt=dismax has specific qt fields, I would suggest we have a
qt=dismax that is plain vanilla, and one that is called qt=dismaxexample
with the fields.

On 2/15/11 7:01 AM, Ryan McKinley (JIRA) j...@apache.org wrote:


[ 
https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994793#comment-12994793 ]

Ryan McKinley commented on SOLR-2363:
-

ah - my bad.

what about something more descriptive?  dismax is kinda cryptic.  maybe
'escaped', 'safe', or just 'query'

Though i'm not convinced it really needs changing -- we would also need
to update all the documentation that refers to ?qt=dismax

 Rename the example dismax request handler instance
 

 Key: SOLR-2363
 URL: https://issues.apache.org/jira/browse/SOLR-2363
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: dismax, example-schema
 Attachments: SOLR-2363.patch


 It is misleading that one of the requestHandlers in the example schema
is named the same as the queryParser dismax. It creates confusion as
to whether the use of defType=dismax vs qt=dismax. It would be better if
the example requestHandler was named e.g. dismaxexample

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2363) Rename the example dismax request handler instance

2011-02-15 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994817#comment-12994817
 ] 

Erick Erickson commented on SOLR-2363:
--

bq: Though i'm not convinced it really needs changing – we would also need to 
update all the documentation that refers to ?qt=dismax

I agree with Jan on this one. I distinctly remember having this confusion, and 
I've seen it go round multiple times on the user's list. Interestingly, I can't 
find anything on the Wiki where qt=dismax is in the text.

bq:  I would suggest we have a qt=dismax that is plain vanilla,

Please no! The whole point is to avoid the confusion over qt=dismax and 
defType=dismax. 



 Rename the example dismax request handler instance
 

 Key: SOLR-2363
 URL: https://issues.apache.org/jira/browse/SOLR-2363
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: dismax, example-schema
 Attachments: SOLR-2363.patch


 It is misleading that one of the requestHandlers in the example schema is 
 named the same as the queryParser dismax. It creates confusion as to 
 whether the use of defType=dismax vs qt=dismax. It would be better if the 
 example requestHandler was named e.g. dismaxexample

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1581) Facet by Function

2011-02-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994828#comment-12994828
 ] 

Grant Ingersoll commented on SOLR-1581:
---

actually, in looking at this, we don't need facet.range.function, we just need 
facet.range to take functions

 Facet by Function
 -

 Key: SOLR-1581
 URL: https://issues.apache.org/jira/browse/SOLR-1581
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: Next


 It would be really great if we could execute a function and quantize it into 
 buckets that could then be returned as facets.
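The quantize-a-function-into-facet-buckets idea above can be sketched quickly; this is a hypothetical illustration of the behavior being requested, not Solr's API (function and parameter names are invented):

```python
# Hypothetical sketch of "facet by function": apply a function to each
# document's value, quantize the results into fixed-width buckets over
# [start, end), and count hits per bucket.

def facet_by_function(values, func, start, end, gap):
    """Quantize func(value) into [start, end) buckets of width `gap`."""
    buckets = {b: 0 for b in range(start, end, gap)}
    for v in values:
        x = func(v)
        if start <= x < end:
            bucket = start + ((x - start) // gap) * gap
            buckets[bucket] += 1  # out-of-range values are simply dropped
    return buckets

counts = facet_by_function([1.0, 2.5, 7.2, 9.9], lambda v: v * 2, 0, 20, 5)
print(counts)  # one hit lands in each of the four buckets
```

This is essentially what facet.range would do if it accepted a function instead of a plain field, which is what the comment above suggests.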

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Please mark distributed date faceting for 3.1

2011-02-15 Thread Smiley, David W.
Distributed date faceting now has a patch and is tested:
https://issues.apache.org/jira/browse/SOLR-1709
I'm posting to the dev list because I want a committer to mark this for 3.1.  I 
don't want to assume any of you guys see the comment activity.

~ David



[jira] Commented: (SOLR-2363) Rename the example dismax request handler instance

2011-02-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994837#comment-12994837
 ] 

Jan Høydahl commented on SOLR-2363:
---

Also we must remember that it's only (supposed to be) an EXAMPLE schema. It's 
where most people start learning about Solr, request handlers and the like - 
thus it should not be confusing, but rather have super-clear comments helping 
the user get going.

Also, per definition, changing this in the example schema will not break 
anything anywhere :)

qt=robust or qt=userfriendly or qt=onesearchbox could be other alternatives?

 Rename the example dismax request handler instance
 

 Key: SOLR-2363
 URL: https://issues.apache.org/jira/browse/SOLR-2363
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: dismax, example-schema
 Attachments: SOLR-2363.patch


 It is misleading that one of the requestHandlers in the example schema is 
 named the same as the queryParser dismax. It creates confusion as to 
 whether the use of defType=dismax vs qt=dismax. It would be better if the 
 example requestHandler was named e.g. dismaxexample

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Please mark distributed date faceting for 3.1

2011-02-15 Thread Robert Muir
On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W. dsmi...@mitre.org wrote:
 Distributed date faceting now has a patch and is tested:

 https://issues.apache.org/jira/browse/SOLR-1709

 I'm posting to the dev list because I want a committer to mark this for
 3.1.  I don't want to assume any of you guys see the comment activity.

Thanks very much for adding a test!

But, can't we just do this for 3.2 instead? I don't like the idea of
rushing features into 3.1 at the last minute because we are nearing a
release (0 open lucene issues, 2 open solr ones).

Right now the 3.x branch is feature-frozen for 3.1

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2105) RequestHandler param update.processor is confusing

2011-02-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994841#comment-12994841
 ] 

Jan Høydahl commented on SOLR-2105:
---

5-minute fix candidate for 3.1

Anyone vote for including this name change fix in the 3.1 release?
Custom update chains are very little in use out there so it's easier to change 
the name of the parameter now than later. Marking this change clearly in 
CHANGES.TXT should let anyone be able to catch up. A softer option is to leave 
the old param in there but deprecated.

 RequestHandler param update.processor is confusing
 --

 Key: SOLR-2105
 URL: https://issues.apache.org/jira/browse/SOLR-2105
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4.1
Reporter: Jan Høydahl
Priority: Minor
 Attachments: SOLR-2105.patch


 Today we reference a custom updateRequestProcessorChain using the update 
 request parameter update.processor.
 See 
 http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
 This is confusing, since what we are really referencing is not an 
 UpdateProcessor, but an updateRequestProcessorChain.
 I propose that update.processor is renamed as update.chain or similar

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Problem loading jcc from java : undefined symbol: PyExc_IOError

2011-02-15 Thread Roman Chyla
In the end, I compiled a new python with the necessary modules, and
that works just fine.
But it was an interesting experience. Thank you Andi, your help is always great.

Cheers,

  roman

On Tue, Feb 15, 2011 at 9:22 AM, Roman Chyla roman.ch...@gmail.com wrote:
 On Tue, Feb 15, 2011 at 4:22 AM, Andi Vajda va...@apache.org wrote:

 On Tue, 15 Feb 2011, Roman Chyla wrote:

 from:
 http://realmike.org/blog/2010/07/18/python-extensions-in-cpp-using-swig/

 Q. "Fatal Python error: Interpreter not initialized (version mismatch?)"

 A. This error occurs when the version of the Python interpreter for
 which the extension module has been built is different from the
 version of the interpreter that attempts to import the module.

 Is there a way to find out which python interpreter version is inside
 JCC? Also, Is it somehow possible that the java process that load jcc
 library will be picking the default python (2.4) instead of the python
 (2.5)? PATH is set to python2.5.

 There is no Python interpreter inside jcc. It's dynamically linked.
 To know which version of the shared library is looked for and expected, use
 the 'ldd' utility against the various shared libraries involved to tell you.
 That version is selected at build time, when you run 'python setup.py ...'
 That version of python determines the version of libpython.so used.

 This will be probably the problem (as you said before), the libjcc.so
 shows no python -

 bash-3.2$ ldd build/lib.linux-x86_64-2.5/libjcc.so
        linux-vdso.so.1 =>  (0x7fff7affc000)
        /$LIB/snoopy.so => /lib64/snoopy.so (0x2b8ed0e74000)
        libjava.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/libjava.so (0x2b8ed1076000)
        libjvm.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/server/libjvm.so (0x2b8ed11a5000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b8ed1c3f000)
        libm.so.6 => /lib64/libm.so.6 (0x2b8ed1f3f000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b8ed21c2000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x2b8ed23cf000)
        libc.so.6 => /lib64/libc.so.6 (0x2b8ed25eb000)
        libdl.so.2 => /lib64/libdl.so.2 (0x2b8ed2943000)
        libverify.so => /afs/cern.ch/user/r/rchyla/public/jdk1.6.0_18/jre/lib/amd64/libverify.so (0x2b8ed2b47000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x2b8ed2c57000)
        /lib64/ld-linux-x86-64.so.2 (0x2b8ed08c9000)
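Andi's suggestion above, checking which libpython a shared object expects with ldd, is easy to script. A minimal sketch; the helper and the sample output are illustrative (in the real listing above, no libpython entry appears at all, which is the problem being discussed):

```python
# Sketch: find which libpython (if any) a shared object is linked against,
# by parsing `ldd` output captured as text.

def parse_libpython(ldd_output):
    """Return the libpython soname from ldd output, or None if absent."""
    for line in ldd_output.splitlines():
        if "libpython" in line:
            return line.strip().split()[0]  # e.g. 'libpython2.5.so.1.0'
    return None

# Illustrative sample resembling ldd output where libpython IS linked:
sample = """\
        libjvm.so => /usr/java/jre/lib/amd64/server/libjvm.so
        libpython2.5.so.1.0 => /opt/python2.5/lib/libpython2.5.so.1.0
        libc.so.6 => /lib64/libc.so.6
"""
print(parse_libpython(sample))  # libpython2.5.so.1.0
```

If this returns None for libjcc.so (as in the listing above), the extension was linked against a static libpython, which matches the symptom in this thread.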

 And I think, the python2.4 (the default  on the system) is being
 loaded -- but how to force loading of python2.5 (if that was possible
 at all) I don't know. Compilation is definitely done with -lpython2.5

 Cheers,

  roman


 Andi..


 Cheers,

  roman


 On Tue, Feb 15, 2011 at 2:40 AM, Roman Chyla roman.ch...@gmail.com
 wrote:

 On Tue, Feb 15, 2011 at 1:32 AM, Andi Vajda va...@apache.org wrote:

 On Tue, 15 Feb 2011, Roman Chyla wrote:

 The python embedded in Java works really well on MacOsX and also
 Ubuntu. But I am trying hard to make it work also on Scientific Linux
 (SLC5) with *statically* built Python. The python is a build from
 ActiveState.

 You mean you're going to try to dynamically load libpython.a into a JVM
 ?
 I have no idea if this can work at all.

 I am very ignorant as far as the difference between statically and
 dynamically linked libraries go - I just wanted to use JCC wrapped
 code with this particular statically linked python

 I got little bit further, but just little:

 after I changed -Xlinker --export-dynamic into -Xlinker
 -export-dynamic (and installed python into /opt...) I am getting a
 different error:

 SEVERE: org.apache.jcc.PythonException: No module named
 solrpie.java_bridge
 null
        at org.apache.jcc.PythonVM.instantiate(Native Method)
        at rca.python.jni.PythonVMBridge.start(Unknown Source)
        at rca.python.jni.PythonVMBridge.start(Unknown Source)
        at rca.python.jni.PythonVMBridge.start(Unknown Source)
        at rca.python.jni.SolrpieVM.getBridge(Unknown Source)


 My understanding is that the previous error has gone (and the python
 module time is loaded), because if I set PYTHONPATH incorrectly, I
 get:
 This message is IMHO coming from Python

 But when I correct the PYTHONPATH, I am getting only this:

 [java] Fatal Python error: Interpreter not initialized (version
 mismatch?)
 [java] Java Result: 134




 If my understanding of static builds is correct, I'd imagine the only
 way
 for this to work would be to statically compile the JVM (hotspot) and
 python
 together.

 oooups, that is way over my head


 But why all this ?

 Because on the grid, we already had a statically linked python and it
 was working very well with pylucene (and after all, I managed to make
 it work also for solr and other packages)

 But if you think that it is not possible, I should do something else :)
 But it was fun trying, if you get some idea, please let me know.

 Thank you,

  Roman


 Andi..

 So far, I managed to build all the 

Fwd: Any contribs available for Range field type?

2011-02-15 Thread mike anderson
-- Forwarded message --
From: kenf_nc ken.fos...@realestate.com
Date: Tue, Feb 15, 2011 at 10:49 AM
Subject: Re: Any contribs available for Range field type?
To: solr-u...@lucene.apache.org



I've tried several times to get an active account on
solr-...@lucene.apache.org and the mailing list won't send me a confirmation
email, and therefore won't let me post because I'm not confirmed. Could I
get someone that is a member of Solr-Dev to post either my original request
in this thread, or a link to this thread on the Dev mailing list? I really
was hoping for more response than this to this question. This would be a
terrifically useful field type to just about any solr index.

Thanks,
Ken
--
View this message in context:
http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2502203.html
Sent from the Solr - User mailing list archive at Nabble.com.


Fwd: Any contribs available for Range field type?

2011-02-15 Thread mike anderson
-- Forwarded message --
From: kenf_nc ken.fos...@realestate.com
Date: Fri, Feb 11, 2011 at 8:49 AM
Subject: Any contribs available for Range field type?
To: solr-u...@lucene.apache.org



I have a huge need for a new field type. It would be a Poly field, similar
to
Point or Payload. It would take 2 data elements and a search would return a
hit if the search term fell within the range of the elements. For example
let's say I have a document representing an Employment record. I may want to
create a field for years_of_service where it would take values 1999,2004.
Then in a query q=years_of_service:2001 would be a hit,
q=years_of_service:2010 would not. The field would need to take a data type
attribute as a parameter. I may need to do integer ranges, float/double
ranges, date ranges. I don't see the need now, but heck maybe even a string
range. This would be useful for things like Event dates. An event often
occurs between several days (or hours) but the query is something like what
events are happening today. If I did q=event_date:NOW (or similar) it
should hit all documents where event_date has a range that in inclusive of
today. Another example would be product category document. A specific
automobile may have a fixed price, but a category of auto (2010 BMW 3-series
for example) would have a price range.

I hope you get the point. My question (finally) is, does anyone know of an
existing contribution to the public domain that already does this? I'm more
of a .Net/C# developer than a Java developer. I know my way around Java, but
don't really have the right tools to build/test/etc. So was hoping to borrow
rather than build if I could.
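Until such a poly field exists, Ken's range semantics can be approximated with two numeric fields per range (field names here are illustrative): a point query q hits a document when start <= q <= end, i.e. the Solr query years_of_service_start:[* TO 2001] AND years_of_service_end:[2001 TO *]. A sketch of the containment logic:

```python
# Model a "range field" as a (start, end) pair per document; a point
# query hits when it falls inside the stored range, inclusive.

def range_contains(start, end, q):
    return start <= q <= end

docs = {"emp1": (1999, 2004), "emp2": (2006, 2010)}
hits = [d for d, (lo, hi) in docs.items() if range_contains(lo, hi, 2001)]
print(hits)  # ['emp1']
```

The same two-field trick covers the event-date example: store event_start and event_end, then query for documents whose range brackets NOW.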

Thanks,
Ken
--
View this message in context:
http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2473601.html
Sent from the Solr - User mailing list archive at Nabble.com.


[jira] Commented: (SOLR-2245) MailEntityProcessor Update

2011-02-15 Thread Peter Sturge (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994847#comment-12994847
 ] 

Peter Sturge commented on SOLR-2245:


I've been meaning to get back to this, as I have made some local updates to 
this that help performance.
Could you give me some feedback on these 2 questions please - it would be 
really useful:
  * Is there a committer's standard or similar spec that describes what tests 
should be included, and if so, could you point me to it please?
  I can then make sure I include appropriate tests
  * Is there a time-frame for committing for this or next release?
  I have a product release of my own coming up at the beginning of March, so if I know 
  I have a product release of my own coming fup or beg-March, so if I know 
the time-scales, I can plan accordingly.

Thanks!
Peter


 MailEntityProcessor Update
 --

 Key: SOLR-2245
 URL: https://issues.apache.org/jira/browse/SOLR-2245
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4, 1.4.1
Reporter: Peter Sturge
Priority: Minor
 Fix For: 1.4.2

 Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip


 This patch addresses a number of issues in the MailEntityProcessor 
 contrib-extras module.
 The changes are outlined here:
 * Added an 'includeContent' entity attribute to allow specifying content to 
 be included independently of processing attachments
  e.g. <entity includeContent="true" processAttachments="false" . . . /> 
 would include message content, but not attachment content
 * Added a synonym called 'processAttachments', which is synonymous to the 
 mis-spelled (and singular) 'processAttachement' property. This property 
 functions the same as processAttachement. Default= 'true' - if either is 
 false, then attachments are not processed. Note that only one of these should 
 really be specified in a given entity tag.
 * Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is 
 unread, not deleted etc.), there is still a property value stored in the 
 'flags' field (the value is the string none)
 Note: there is a potential backward compat issue with FLAGS.NONE for clients 
 that expect the absence of the 'flags' field to mean 'Not read'. I'm 
 calculating this would be extremely rare, and is inadvisable in any case as 
 user flags can be arbitrarily set, so fixing it up now will ensure future 
 client access will be consistent.
 * The folder name of an email is now included as a field called 'folder' 
 (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing 
 processing
 * The addPartToDocument() method that processes attachments is significantly 
 re-written, as there looked to be no real way the existing code would ever 
 actually process attachment content and add it to the row data
 Tested on the 3.x trunk with a number of popular imap servers.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Release 3.2 (was Re: Please mark distributed date faceting for 3.1)

2011-02-15 Thread DM Smith
Can we see more frequent releases? Can we look forward to a 3.2 release 
in a few months? Say May 15? That'd be a quarterly release cycle.
(Personally, I'd like to see Robert's improvement to the handling of 
Chinese as soon as possible.)

-- DM

On 02/15/2011 10:24 AM, Robert Muir wrote:

On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W.dsmi...@mitre.org  wrote:

Distributed date faceting now has a patch and is tested:

https://issues.apache.org/jira/browse/SOLR-1709

I'm posting to the dev list because I want a committer to mark this for
3.1.  I don't want to assume any of you guys see the comment activity.

Thanks very much for adding a test!

But, can't we just do this for 3.2 instead? I don't like the idea of
rushing features into 3.1 at the last minute because we are nearing a
release (0 open lucene issues, 2 open solr ones).

Right now the 3.x branch is feature-frozen for 3.1

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1709) Distributed Date Faceting

2011-02-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994853#comment-12994853
 ] 

Bill Bell commented on SOLR-1709:
-

+1 vote for 3.1

 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: FacetComponent.java, FacetComponent.java, 
 ResponseBuilder.java, SOLR-1709_distributed_date_faceting_v3x.patch, 
 solr-1.4.0-solr-1709.patch


 This patch is for adding support for date facets when using distributed 
 searches.
 Date faceting across multiple machines exposes some time-based issues that 
 anyone interested in this behaviour should be aware of:
 Any time and/or time-zone differences are not accounted for in the patch 
 (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
 'instant-in-time', unless all shards are time-synced to the exact same time).
 The implementation uses the first encountered shard's facet_dates as the 
 basis for subsequent shards' data to be merged in.
 This means that if subsequent shards' facet_dates are skewed in relation to 
 the first by 1 'gap', these 'earlier' or 'later' facets will not be merged 
 in.
 There are several reasons for this:
   * Performance: It's faster to check facet_date lists against a single map's 
 data, rather than against each other, particularly if there are many shards
   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
 time range larger than that which was requested
 (e.g. a request for one hour's worth of facets could bring back 2, 3 
 or more hours of data)
 This could be dealt with if timezone and skew information was added, and 
 the dates were normalized.
 One possibility for adding such support is to [optionally] add 'timezone' and 
 'now' parameters to the 'facet_dates' map. This would tell requesters what 
 time and TZ the remote server thinks it is, and so multiple shards' time data 
 can be normalized.
 The patch affects 2 files in the Solr core:
   org.apache.solr.handler.component.FacetComponent.java
   org.apache.solr.handler.component.ResponseBuilder.java
 The main changes are in FacetComponent - ResponseBuilder is just to hold the 
 completed SimpleOrderedMap until the finishStage.
 One possible enhancement is to perhaps make this an optional parameter, but 
 really, if facet.date parameters are specified, it is assumed they are 
 desired.
  Comments & suggestions welcome.
 As a favour to ask, if anyone could take my 2 source files and create a PATCH 
 file from it, it would be greatly appreciated, as I'm having a bit of trouble 
 with svn (don't shoot me, but my environment is a Redmond-based os company).
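The merge strategy described in the patch notes above, using the first shard's facet_dates as the basis and dropping skewed buckets, can be sketched as follows (bucket keys and counts are illustrative, not the patch's actual data structures):

```python
# Sketch of the distributed date-facet merge: the first shard's buckets
# are the basis; later shards only contribute counts for buckets they
# share with it. "Earlier"/"later" skewed buckets are dropped, as noted.

def merge_facet_dates(shard_responses):
    merged = dict(shard_responses[0])          # first shard is the basis
    for shard in shard_responses[1:]:
        for bucket, count in shard.items():
            if bucket in merged:               # ignore out-of-basis buckets
                merged[bucket] += count
    return merged

shards = [
    {"2011-02-15T10:00:00Z": 3, "2011-02-15T11:00:00Z": 1},
    {"2011-02-15T11:00:00Z": 2, "2011-02-15T12:00:00Z": 5},  # skewed bucket dropped
]
print(merge_facet_dates(shards))
```

This makes the performance/correctness trade-off in the description concrete: checking each shard against one basis map is a single dict lookup per bucket, at the cost of silently discarding buckets outside the first shard's range.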

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] Commented: (SOLR-2245) MailEntityProcessor Update

2011-02-15 Thread Bill Bell
3.1 may be too late

Bill Bell
Sent from mobile


On Feb 15, 2011, at 8:52 AM, Peter Sturge (JIRA) j...@apache.org wrote:

 
[ 
 https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994847#comment-12994847
  ] 
 
 Peter Sturge commented on SOLR-2245:
 
 
 I've been meaning to get back to this, as I have made some local updates to 
 this that help performance.
 Could you give me some feedback on these 2 questions please - it would be 
 really useful:
  * Is there a committer's standard or similar spec that describes what 
 tests should be included, and if so, could you point me to it please?
  I can then make sure I include appropriate tests
  * Is there a time-frame for committing for this or next release?
  I have a product release of my own coming up at the beginning of March, so if I know 
 the time-scales, I can plan accordingly.
 
 Thanks!
 Peter
 
 
 MailEntityProcessor Update
 --
 
Key: SOLR-2245
URL: https://issues.apache.org/jira/browse/SOLR-2245
Project: Solr
 Issue Type: Improvement
 Components: contrib - DataImportHandler
   Affects Versions: 1.4, 1.4.1
   Reporter: Peter Sturge
   Priority: Minor
Fix For: 1.4.2
 
Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip
 
 
 This patch addresses a number of issues in the MailEntityProcessor 
 contrib-extras module.
 The changes are outlined here:
 * Added an 'includeContent' entity attribute to allow specifying content to 
 be included independently of processing attachments
  e.g. <entity includeContent="true" processAttachments="false" . . . /> 
 would include message content, but not attachment content
 * Added a synonym called 'processAttachments', which is synonymous to the 
 mis-spelled (and singular) 'processAttachement' property. This property 
 functions the same as processAttachement. Default= 'true' - if either is 
 false, then attachments are not processed. Note that only one of these 
 should really be specified in a given entity tag.
 * Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is 
 unread, not deleted etc.), there is still a property value stored in the 
 'flags' field (the value is the string none)
 Note: there is a potential backward compat issue with FLAGS.NONE for clients 
 that expect the absence of the 'flags' field to mean 'Not read'. I'm 
  calculating this would be extremely rare, and is inadvisable in any case as 
 user flags can be arbitrarily set, so fixing it up now will ensure future 
 client access will be consistent.
 * The folder name of an email is now included as a field called 'folder' 
 (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing 
 processing
 * The addPartToDocument() method that processes attachments is significantly 
 re-written, as there looked to be no real way the existing code would ever 
 actually process attachment content and add it to the row data
 Tested on the 3.x trunk with a number of popular imap servers.
 
 -- 
 This message is automatically generated by JIRA.
 -
 For more information on JIRA, see: http://www.atlassian.com/software/jira
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)

2011-02-15 Thread Bill Bell
I would love to see a release every 3 to 6 months too

Bill Bell
Sent from mobile


On Feb 15, 2011, at 8:55 AM, DM Smith dmsmith...@gmail.com wrote:

 Can we see more frequent releases? Can we look forward to a 3.2 release in a 
 few months? Say May 15? That'd be a quarterly release cycle.
 (Personally, I'd like to see Robert's improvement to the handling of Chinese 
 as soon as possible.)
 -- DM
 
 On 02/15/2011 10:24 AM, Robert Muir wrote:
 On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W.dsmi...@mitre.org  wrote:
 Distributed date faceting now has a patch and is tested:
 
 https://issues.apache.org/jira/browse/SOLR-1709
 
  I'm posting to the dev list because I want a committer to mark this for
  3.1.  I don't want to assume any of you guys see the comment activity.
 Thanks very much for adding a test!
 
 But, can't we just do this for 3.2 instead? I don't like the idea of
 rushing features into 3.1 at the last minute because we are nearing a
 release (0 open lucene issues, 2 open solr ones).
 
 Right now the 3.x branch is feature-frozen for 3.1
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2922) Optimize BlockTermsReader.seek

2011-02-15 Thread Michael McCandless (JIRA)
Optimize BlockTermsReader.seek
--

 Key: LUCENE-2922
 URL: https://issues.apache.org/jira/browse/LUCENE-2922
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 4.0


When we seek, we first consult the terms index to find the right block
of 32 (default) terms that may hold the target term.  Then, we scan
that block looking for an exact match.

The scanning just uses next() and then compares the full term, but
this is actually rather wasteful.  First off, since all terms in the
block share a common prefix, we should compare the target against that
common prefix once, and then only compare the new suffix of each
term.  Second, since the term suffixes have already been read up front
into a byte[], we should do a no-copy comparison (vs today, where we
first read a copy into the local BytesRef and then compare).

With this opto, I removed the ability for BlockTermsWriter/Reader to
support arbitrary term sort order -- it's now hardwired to
BytesRef.utf8SortedAsUnicode.
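The common-prefix optimization described above can be sketched in a few lines; the block layout here is a simplified illustration, not BlockTermsReader's actual on-disk format:

```python
# Sketch of the seek optimization: all terms in a block share a common
# prefix, so check the target against that prefix once, then scan the
# block comparing only each term's stored suffix (no per-term copies).

def seek_in_block(prefix, suffixes, target):
    """Return the index of target within the block, or -1 if absent."""
    if not target.startswith(prefix):        # one prefix check per block
        return -1
    rest = target[len(prefix):]
    for i, suffix in enumerate(suffixes):    # compare suffixes only
        if suffix == rest:
            return i
    return -1

# Block of terms unit, unite, united, unites stored as prefix + suffixes:
print(seek_in_block("unit", ["", "e", "ed", "es"], "united"))  # 2
```

The speedup for FuzzyQuery in the benchmarks below follows from this: fuzzy enumeration seeks many nearby terms, so skipping the shared prefix on every in-block comparison adds up.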


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2922) Optimize BlockTermsReader.seek

2011-02-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2922:
---

Attachment: LUCENE-2922.patch

Patch.

 Optimize BlockTermsReader.seek
 --

 Key: LUCENE-2922
 URL: https://issues.apache.org/jira/browse/LUCENE-2922
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2922.patch


 When we seek, we first consult the terms index to find the right block
 of 32 (default) terms that may hold the target term.  Then, we scan
 that block looking for an exact match.
 The scanning just uses next() and then compares the full term, but
 this is actually rather wasteful.  First off, since all terms in the
 block share a common prefix, we should compare the target against that
 common prefix once, and then only compare the new suffix of each
 term.  Second, since the term suffixes have already been read up front
 into a byte[], we should do a no-copy comparison (vs today, where we
 first read a copy into the local BytesRef and then compare).
 With this opto, I removed the ability for BlockTermsWriter/Reader to
 support arbitrary term sort order -- it's now hardwired to
 BytesRef.utf8SortedAsUnicode.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2922) Optimize BlockTermsReader.seek

2011-02-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994865#comment-12994865
 ] 

Michael McCandless commented on LUCENE-2922:


The opto is a big win for FuzzyQuery (and, automaton respeller):


||Query||QPS base||QPS opto||Pct diff||
|united states|13.92|13.81|{color:red}-0.8%{color}|
|+united +states|20.59|20.55|{color:red}-0.2%{color}|
|united states|20.06|20.03|{color:red}-0.1%{color}|
|states|56.67|56.68|{color:green}0.0%{color}|
|united states~3|9.55|9.55|{color:green}0.0%{color}|
|uni*|17.67|17.71|{color:green}0.2%{color}|
|spanNear([unit, state], 10, true)|65.84|66.03|{color:green}0.3%{color}|
|unit*|31.50|31.62|{color:green}0.4%{color}|
|timesecnum:[1 TO 6]|10.88|10.93|{color:green}0.4%{color}|
|un*d|19.64|19.74|{color:green}0.5%{color}|
|title:.*[Uu]nited.*|1.48|1.49|{color:green}0.9%{color}|
|u*d|8.52|8.63|{color:green}1.3%{color}|
|+nebraska +states|230.99|235.15|{color:green}1.8%{color}|
|spanFirst(unit, 5)|289.74|300.65|{color:green}3.8%{color}|
|united~0.75|18.01|19.26|{color:green}7.0%{color}|
|unit~0.7|36.39|40.33|{color:green}10.8%{color}|
|united~0.6|14.15|15.73|{color:green}11.1%{color}|
|unit~0.5|24.99|29.82|{color:green}19.3%{color}|


 Optimize BlockTermsReader.seek
 --

 Key: LUCENE-2922
 URL: https://issues.apache.org/jira/browse/LUCENE-2922
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2922.patch


 When we seek, we first consult the terms index to find the right block
 of 32 (default) terms that may hold the target term.  Then, we scan
 that block looking for an exact match.
 The scanning just uses next() and then compares the full term, but
 this is actually rather wasteful.  First off, since all terms in the
 block share a common prefix, we should compare the target against that
 common prefix once, and then only compare the new suffix of each
 term.  Second, since the term suffixes have already been read up front
 into a byte[], we should do a no-copy comparison (vs today, where we
 first read a copy into the local BytesRef and then compare).
 With this opto, I removed the ability for BlockTermsWriter/Reader to
 support arbitrary term sort order -- it's now hardwired to
 BytesRef.utf8SortedAsUnicode.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)

2011-02-15 Thread Mark Miller
More contributors contributin' will help us get there!

Release work is not glorious. Release work is not fun (most of it). Release 
discussions involve...*cough*...Maven...

Been there. Many hands make light work or something though.

Many want more releases - few have more time to give - that's my impression. 
Open Source - help scratch your itch is the best advice I can give.

- Mark

On Feb 15, 2011, at 11:04 AM, Bill Bell wrote:

 I would love to see a release every 3 to 6 months too
 
 Bill Bell
 Sent from mobile
 
 
 On Feb 15, 2011, at 8:55 AM, DM Smith dmsmith...@gmail.com wrote:
 
 Can we see more frequent releases? Can we look forward to a 3.2 release in a 
 few months? Say May 15? That'd be a quarterly release cycle.
 (Personally, I'd like to see Robert's improvement to the handling of Chinese 
 as soon as possible.)
 -- DM
 
 On 02/15/2011 10:24 AM, Robert Muir wrote:
 On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W.dsmi...@mitre.org  
 wrote:
 Distributed date faceting now has a patch and is tested:
 
 https://issues.apache.org/jira/browse/SOLR-1709
 
 I’m posting to the dev list because I want a committer to mark this for
 3.1.  I don’t want to assume any of you guys see the comment activity.
 Thanks very much for adding a test!
 
 But, can't we just do this for 3.2 instead? I don't like the idea of
 rushing features into 3.1 at the last minute because we are nearing a
 release (0 open lucene issues, 2 open solr ones).
 
 Right now the 3.x branch is feature-frozen for 3.1
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

- Mark Miller
lucidimagination.com





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: wind down for 3.1?

2011-02-15 Thread Mark Miller

On Feb 12, 2011, at 7:38 PM, David Smiley (@MITRE.org) wrote:

 I don't want to overstep my role in this conversation (not being a committer
 as much as I want to be),


My advice? Purge both of these ideas from your head.

We don't like to talk about this subject around here much, but rebel that I am: 

Mark Miller's guide to becoming a Committer -

The simple answer:

Act like a Committer.


The long answer:

Lucene/Solr is not developed by Committers IMO. It's developed by contributors. 
It's measured by its contributors.

Great contributors - great stewards - they will all become Committers over 
time. I don't think a lot of us really care about the time tables. Sometimes a 
name is nominated and some of us think - oh, I already thought he was a 
committer - or wow, it's about time.

What prompts the creation of a Committer is wide and varied. It might be as 
simple as someone is sick of committing all of your work. Committing others' 
work takes time - and the shouldering of some responsibility. Being a Committer 
is more work than being a contributor in this way. In a lot of ways, it's an 
added burden - it's not just the convenience of being able to commit straight 
to svn. That is not really a convenience if you ask me. 

But honestly, a committer has no true weight over a regular contributor in 
Apache land. A respected member of the community can easily have the same 
influence as a respected committer IMO. Only PMC members have binding votes 
when lines are drawn in the sand. But again - great contributors - great 
stewards - they will all become PMC members too. And I don't think most of us 
are too worried about the time table. Great contributors will continue to 
contribute regardless of that time table in my experience. And over time, 
things are brought into line as they should be.

When the nominee is ready - when he shows that he gets the Apache way - that he 
fits into the community - that he has demonstrated enough merit - that's point 
in time one.

When the nominator is ready - when he sees or is prompted to act - when he 
feels comfortable putting his name out there for someone - that's point in time 
two.

These two points don't always coincide, much as we would like them to.

Persistence - it's the key to so many things. Lucene/Solr is like a cat farm, 
if such a thing existed.

- Mark Miller
lucidimagination.com





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1711) Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java

2011-02-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994903#comment-12994903
 ] 

Yonik Seeley commented on SOLR-1711:


bq. What about moving the queue.put() inside the synchronized(runners) block to 
fix this?

On second thought, that looks like a pretty bad idea ;-)
Looks like a recipe for deadlock since the runners lock will be held if put 
then blocks. 
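The hazard can be sketched like this (hypothetical names, not Solr's actual
code): a blocking put() executed while holding the runners lock can wait forever
for a consumer that must itself acquire runners, whereas a non-blocking offer()
returns immediately and never parks under the lock.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch of put-vs-offer under a lock; not Solr's StreamingUpdateSolrServer. */
class PutUnderLockSketch {
    static final Object runners = new Object();

    /** Unsafe shape: may block forever while holding `runners` if the queue is full. */
    static void unsafeEnqueue(BlockingQueue<String> q, String item) throws InterruptedException {
        synchronized (runners) {
            q.put(item); // a consumer that needs `runners` can never drain -> deadlock
        }
    }

    /** Safer shape: offer() fails fast instead of parking under the lock. */
    static boolean safeEnqueue(BlockingQueue<String> q, String item) {
        synchronized (runners) {
            return q.offer(item); // false immediately when full; no blocking under the lock
        }
    }

    /** Demo on a capacity-1 queue: first offer succeeds, second reports "full". */
    static boolean[] demo() {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(1);
        return new boolean[] { safeEnqueue(q, "a"), safeEnqueue(q, "b") };
    }
}
```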

 Race condition in 
 org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
 --

 Key: SOLR-1711
 URL: https://issues.apache.org/jira/browse/SOLR-1711
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4, 1.5
Reporter: Attila Babo
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 1.4.1, 1.5, 3.1, 4.0

 Attachments: StreamingUpdateSolrServer.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 While inserting a large pile of documents using StreamingUpdateSolrServer, 
 there is a race condition in which all Runner instances stop processing while the 
 blocking queue is full. With a high-performance client this can happen 
 quite often, and there is no way to recover from it on the client side.
 In StreamingUpdateSolrServer there is a BlockingQueue called queue that stores 
 UpdateRequests, and up to threadCount worker threads 
 (StreamingUpdateSolrServer.Runner) read that queue and push requests to a 
 Solr instance. If at some point the BlockingQueue is empty, all workers stop 
 processing it and push the collected content to Solr, which can be a time 
 consuming process; sometimes all worker threads are waiting for Solr. If at 
 this time the client fills the BlockingQueue, all worker threads will 
 quit without processing any further and the main thread will block forever.
 There is a simple, well-tested patch attached to handle this situation.
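 A deliberately simplified, single-threaded sketch of the failure mode follows
 (hypothetical names, not the attached patch): runners exit the moment the queue
 is empty, so a later enqueue can find no runner left; one remedy is to re-check
 for a live runner after each enqueue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Single-threaded model of "all runners exited while the queue refilled". */
class RunnerGapSketch {
    final Queue<String> queue = new ArrayDeque<>();
    int liveRunners = 0;
    int processed = 0;

    /** A runner drains until empty, then exits -- the problematic exit condition. */
    void runOneRunner() {
        liveRunners++;
        while (!queue.isEmpty()) {
            queue.poll();
            processed++;
        }
        liveRunners--; // after this, a later enqueue finds no runner alive
    }

    /** Buggy producer: enqueue and hope some runner is still alive. */
    void enqueueBuggy(String doc) {
        queue.add(doc);
    }

    /** Fixed producer: if no runner is alive after enqueueing, start one. */
    void enqueueFixed(String doc) {
        queue.add(doc);
        if (liveRunners == 0) runOneRunner();
    }
}
```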

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)

2011-02-15 Thread DM Smith

Mark,
I understand what you are saying. In this case, there are two issues 
that are not making it into 3.1 because they landed too late. After the 
freeze. The contributions appear to be done. So, the itch at this point 
needs to be scratched by one or more committers, to commit the changes 
and to act as release manager.


It appears to me that the effort to commit the contributions is 
minimal, and that in this case the true cost is that of doing the release.


As to release discussions involving Maven: if the next release were in a 
couple of months and nothing had been contributed to make Maven better, 
why would it even need to be discussed? The last decision could still 
stand. I think it is the long time between releases that brings up the 
same intensity on the Maven discussion.


-- DM

On 02/15/2011 12:08 PM, Mark Miller wrote:

More contributors contributin' will help us get there!

Release work is not glorious. Release work is not fun (most of it). Release 
discussions involve...*cough*...Maven...

Been there. Many hands make light work or something though.

Many want more releases - few have more time to give - that's my impression. 
Open Source - help scratch your itch is the best advice I can give.

- Mark

On Feb 15, 2011, at 11:04 AM, Bill Bell wrote:


I would love to see a release every 3 to 6 months too

Bill Bell
Sent from mobile


On Feb 15, 2011, at 8:55 AM, DM Smithdmsmith...@gmail.com  wrote:


Can we see more frequent releases? Can we look forward to a 3.2 release in a few months? 
Say May 15? That'd be a quarterly release cycle.
(Personally, I'd like to see Robert's improvement to the handling of Chinese as 
soon as possible.)
-- DM

On 02/15/2011 10:24 AM, Robert Muir wrote:

On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W.dsmi...@mitre.org   wrote:

Distributed date faceting now has a patch and is tested:

https://issues.apache.org/jira/browse/SOLR-1709

I’m posting to the dev list because I want a committer to mark this for
3.1.  I don’t want to assume any of you guys see the comment activity.

Thanks very much for adding a test!

But, can't we just do this for 3.2 instead? I don't like the idea of
rushing features into 3.1 at the last minute because we are nearing a
release (0 open lucene issues, 2 open solr ones).

Right now the 3.x branch is feature-frozen for 3.1



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2105) RequestHandler param update.processor is confusing

2011-02-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994914#comment-12994914
 ] 

Mark Miller commented on SOLR-2105:
---

I like this change.

Can you leave update.processor in but deprecated? Perhaps print a log warning if 
it's detected.

Then we could make a hard change in 4.X perhaps?
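That back-compat shape might look roughly like this; resolveChain and the
parameter map are hypothetical stand-ins, not Solr's actual request API. The new
update.chain wins, the deprecated update.processor still works but logs a
warning, and the fallback branch is what a 4.x release could delete.

```java
import java.util.Map;

/** Sketch of deprecated-parameter fallback; hypothetical names throughout. */
class ParamFallbackSketch {

    /** Resolve the processor-chain name, preferring the new parameter. */
    static String resolveChain(Map<String, String> params) {
        String chain = params.get("update.chain");
        if (chain != null) return chain;
        String legacy = params.get("update.processor");
        if (legacy != null) {
            // deprecation warning, as suggested in the comment above
            System.err.println("WARN: 'update.processor' is deprecated; use 'update.chain'");
            return legacy;
        }
        return null; // neither given: caller falls back to the default chain
    }
}
```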

 RequestHandler param update.processor is confusing
 --

 Key: SOLR-2105
 URL: https://issues.apache.org/jira/browse/SOLR-2105
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4.1
Reporter: Jan Høydahl
Priority: Minor
 Attachments: SOLR-2105.patch


 Today we reference a custom updateRequestProcessorChain using the update 
 request parameter update.processor.
 See 
 http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
 This is confusing, since what we are really referencing is not an 
 UpdateProcessor but an updateRequestProcessorChain.
 I propose that update.processor be renamed to update.chain or similar.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-2249) ArrayIndexOutOfBoundsException thrown instead of useful FieldCache exception when too many terms

2011-02-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2249.


   Resolution: Fixed
Fix Version/s: (was: 4.0)
   3.1

strictly speaking, this has already been fixed in 3.1 - AIOOBE is no longer 
thrown when using field cache.

the related issues track the more specific tasks of dealing with the various 
uses of FieldCache in solr to throw errors when appropriate.

 ArrayIndexOutOfBoundsException thrown instead of useful FieldCache exception 
 when too many terms 
 -

 Key: SOLR-2249
 URL: https://issues.apache.org/jira/browse/SOLR-2249
 Project: Solr
  Issue Type: Bug
  Components: clients - php
Affects Versions: 1.4.1
 Environment: Windows 7
Reporter: Anees shoukat
Assignee: Hoss Man
 Fix For: 3.1


 when attempting to sort, or otherwise use the FieldCache on a field that has 
 more terms than documents, Solr currently propagates an AIOOBE 
 (ArrayIndexOutOfBoundsException) all the way to the user

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)

2011-02-15 Thread Mark Miller

On Feb 15, 2011, at 1:00 PM, DM Smith wrote:

 Mark,
 I understand what you are saying. In this case, there are two issues that are 
 not making it into 3.1 because they landed too late. After the freeze. The 
 contributions appear to be done. So, the itch at this point needs to be 
 scratched by one or more committers, to commit the changes and to act as 
 release manager.


But it's after the freeze? I'm not sure the contributions are 100% done either. 
Often these things need to be iterated on a bit once a committer takes a look. 
And are we sure we are happy with the level of the tests? If these are coming 
up as candidates after the freeze, I lean towards Robert's line of thinking...

By all means, shape them up, add tests, etc - that's the only hope they have - 
but I wouldn't expect them to get in. Many feel that nailing a release as soon 
as can be done is more important than last minute additions. If you can't find 
a sympathetic committer, sometimes, them's indeed the breaks. A feature freeze 
got a lazy consensus go-ahead - I'm not sure we want to consider much more than 
bugs at this point...but that's just me.

 
 It appears to me that the effort to commit the contributions is minimal, 
 and that in this case the true cost is that of doing the release.

Heh. I think looks can be deceiving sometimes. I'm not sure I'm willing to hold 
the responsibility of those commits right now. If someone else is, that's great 
... but I don't find them minimal enough for my taste I suppose ;) Depends on 
what areas you feel comfortable with I guess.

 
 As to release discussions involving maven: if the next release were in a 
 couple of months and nothing had been contributed to make maven better, why 
 would it even need to be discussed. The last decision could still stand. I 
 think it is the long time between releases that bring up the same intensity 
 on the maven discussion.

Heh - I wish things were that simple.

 
 -- DM
 
 On 02/15/2011 12:08 PM, Mark Miller wrote:
 More contributors contributin' will help us get there!
 
 Release work is not glorious. Release work is not fun (most of it). Release 
 discussions involve...*cough*...Maven...
 
 Been there. Many hands make light work or something though.
 
 Many want more releases - few have more time to give - that's my impression. 
 Open Source - help scratch your itch is the best advice I can give.
 
 - Mark
 
 On Feb 15, 2011, at 11:04 AM, Bill Bell wrote:
 
 I would love to see a release every 3 to 6 months too
 
 Bill Bell
 Sent from mobile
 
 
 On Feb 15, 2011, at 8:55 AM, DM Smithdmsmith...@gmail.com  wrote:
 
 Can we see more frequent releases? Can we look forward to a 3.2 release in 
 a few months? Say May 15? That'd be a quarterly release cycle.
 (Personally, I'd like to see Robert's improvement to the handling of 
 Chinese as soon as possible.)
 -- DM
 
 On 02/15/2011 10:24 AM, Robert Muir wrote:
 On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W.dsmi...@mitre.org   
 wrote:
 Distributed date faceting now has a patch and is tested:
 
 https://issues.apache.org/jira/browse/SOLR-1709
 
 I’m posting to the dev list because I want a committer to mark this for
 3.1.  I don’t want to assume any of you guys see the comment activity.
 Thanks very much for adding a test!
 
 But, can't we just do this for 3.2 instead? I don't like the idea of
 rushing features into 3.1 at the last minute because we are nearing a
 release (0 open lucene issues, 2 open solr ones).
 
 Right now the 3.x branch is feature-frozen for 3.1
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

- Mark Miller
lucidimagination.com





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Reopened: (SOLR-1711) Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java

2011-02-15 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reopened SOLR-1711:



 Race condition in 
 org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
 --

 Key: SOLR-1711
 URL: https://issues.apache.org/jira/browse/SOLR-1711
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4, 1.5
Reporter: Attila Babo
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 1.4.1, 1.5, 3.1, 4.0

 Attachments: StreamingUpdateSolrServer.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 While inserting a large pile of documents using StreamingUpdateSolrServer, 
 there is a race condition in which all Runner instances stop processing while the 
 blocking queue is full. With a high-performance client this can happen 
 quite often, and there is no way to recover from it on the client side.
 In StreamingUpdateSolrServer there is a BlockingQueue called queue that stores 
 UpdateRequests, and up to threadCount worker threads 
 (StreamingUpdateSolrServer.Runner) read that queue and push requests to a 
 Solr instance. If at some point the BlockingQueue is empty, all workers stop 
 processing it and push the collected content to Solr, which can be a time 
 consuming process; sometimes all worker threads are waiting for Solr. If at 
 this time the client fills the BlockingQueue, all worker threads will 
 quit without processing any further and the main thread will block forever.
 There is a simple, well-tested patch attached to handle this situation.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)

2011-02-15 Thread Robert Muir
On Tue, Feb 15, 2011 at 1:33 PM, Mark Miller markrmil...@gmail.com wrote:

 It appears to me that the effort to commit the contributions is minimal, 
 and that in this case the true cost is that of doing the release.

 Heh. I think looks can be deceiving sometimes. I'm not sure I'm willing to 
 hold the responsibility of those commits right now. If someone else is, 
 that's great ... but I don't find them minimal enough for my taste I suppose 
 ;) Depends on what areas you feel comfortable with I guess.


Right, this is why some features with functional patches are sitting
targeted at 3.2 instead of 3.1. Is it possible that we could put
distributed date faceting (SOLR-1709), better CJK handling out of the box
(LUCENE-2906), and a better default merge policy (LUCENE-854) all in
3.1 right now? Sure it is.

But is this the best decision... I don't think it is. I think as far
as 3.1 goes we already have a great set of features that have baked
for some time, including some rather serious performance improvements
(Mike and I have done some benchmarking against 3.0)... and it's
already going to be a more challenging release since it's the first one
since we merged Lucene and Solr.

For these newer features, it's not that we are lazy... it's that
sometimes you want more tests, want things to bake for a while with
Hudson's random testing, perhaps want some reviews/second pairs of
eyes on the code, or maybe even just some more time to think about the
change before committing to it.

When we commit it and release it, we are signing up for some degree of
support in the future. Also, personally I think it's better to put out
a good release with solid code and fewer features, than a more
buggy release that has a couple of extra features.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-1711) Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java

2011-02-15 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1711:
---

Attachment: SOLR-1711.patch

Here's a patch that uses {{offer}} instead of {{put}} in a retry loop.
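The offer-in-a-retry-loop shape might look roughly like this; a hedged sketch
with hypothetical names, not the actual SOLR-1711 patch. Each timed-out offer()
returns control to the caller, which can make sure a consumer is alive before
retrying, instead of blocking forever in put() with no recovery point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Sketch of enqueue-with-retry; hypothetical names, not Solr's code. */
class OfferRetrySketch {

    /** Retry a bounded offer; between attempts, ensure someone can drain the queue. */
    static <T> void enqueueWithRetry(BlockingQueue<T> queue, T item,
                                     Runnable ensureWorkerAlive) throws InterruptedException {
        while (!queue.offer(item, 100, TimeUnit.MILLISECONDS)) {
            ensureWorkerAlive.run(); // e.g. restart a dead runner thread
        }
    }

    /** Demo: a full capacity-1 queue is drained by the retry hook, then accepts the item. */
    static List<Integer> demo() {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(1);
        q.offer(1); // fill the queue
        try {
            enqueueWithRetry(q, 2, q::poll); // hook drains one element per failed attempt
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(q);
    }
}
```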

 Race condition in 
 org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
 --

 Key: SOLR-1711
 URL: https://issues.apache.org/jira/browse/SOLR-1711
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4, 1.5
Reporter: Attila Babo
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 1.4.1, 1.5, 3.1, 4.0

 Attachments: SOLR-1711.patch, StreamingUpdateSolrServer.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 While inserting a large pile of documents using StreamingUpdateSolrServer, 
 there is a race condition in which all Runner instances stop processing while the 
 blocking queue is full. With a high-performance client this can happen 
 quite often, and there is no way to recover from it on the client side.
 In StreamingUpdateSolrServer there is a BlockingQueue called queue that stores 
 UpdateRequests, and up to threadCount worker threads 
 (StreamingUpdateSolrServer.Runner) read that queue and push requests to a 
 Solr instance. If at some point the BlockingQueue is empty, all workers stop 
 processing it and push the collected content to Solr, which can be a time 
 consuming process; sometimes all worker threads are waiting for Solr. If at 
 this time the client fills the BlockingQueue, all worker threads will 
 quit without processing any further and the main thread will block forever.
 There is a simple, well-tested patch attached to handle this situation.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work

2011-02-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2348:
---

Attachment: SOLR-2348.patch

Patch with the needed functionality; it breaks some tests (most likely tests 
abusing multiValued field types)...

{noformat}
hossman@bester:~/lucene/dev/solr$ grep -L "Failures: 0, Errors: 0" build/test-results/TEST-org.apache.solr.*
build/test-results/TEST-org.apache.solr.schema.PolyFieldTest.txt
build/test-results/TEST-org.apache.solr.search.function.distance.DistanceFunctionTest.txt
build/test-results/TEST-org.apache.solr.search.function.SortByFunctionTest.txt
build/test-results/TEST-org.apache.solr.search.QueryParsingTest.txt
build/test-results/TEST-org.apache.solr.search.SpatialFilterTest.txt
build/test-results/TEST-org.apache.solr.search.TestIndexSearcher.txt
build/test-results/TEST-org.apache.solr.search.TestQueryTypes.txt
{noformat}

 No error reported when using a FieldCached backed ValueSource for a field 
 Solr knows won't work
 ---

 Key: SOLR-2348
 URL: https://issues.apache.org/jira/browse/SOLR-2348
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.1, 4.0

 Attachments: SOLR-2348.patch


 For the same reasons outlined in SOLR-2339, Solr FieldTypes that return 
 FieldCache-backed ValueSources should explicitly check for situations where 
 Solr knows the FieldCache is meaningless.
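 One hedged sketch of such a check (hypothetical helper, not Solr's actual
 FieldType API): fail fast with a descriptive error when the field's properties
 make a FieldCache entry meaningless, rather than letting a low-level exception
 escape later.

```java
/** Sketch of an up-front FieldCache compatibility guard; hypothetical names. */
class FieldCacheGuardSketch {

    /** Throw a clear error for fields the FieldCache cannot meaningfully serve. */
    static void checkFieldCacheCompatible(String fieldName, boolean multiValued, boolean indexed) {
        if (multiValued) {
            throw new IllegalArgumentException(
                "can not use FieldCache on multivalued field: " + fieldName);
        }
        if (!indexed) {
            throw new IllegalArgumentException(
                "can not use FieldCache on unindexed field: " + fieldName);
        }
    }
}
```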

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work

2011-02-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2348:
---

Fix Version/s: (was: 3.2)
   3.1

I'm actively working on this today ... moving back in line for 3.1

 No error reported when using a FieldCached backed ValueSource for a field 
 Solr knows won't work
 ---

 Key: SOLR-2348
 URL: https://issues.apache.org/jira/browse/SOLR-2348
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.1, 4.0

 Attachments: SOLR-2348.patch


 For the same reasons outlined in SOLR-2339, Solr FieldTypes that return 
 FieldCache-backed ValueSources should explicitly check for situations where 
 Solr knows the FieldCache is meaningless.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)

2011-02-15 Thread DM Smith

On 02/15/2011 02:07 PM, Robert Muir wrote:

On Tue, Feb 15, 2011 at 1:33 PM, Mark Millermarkrmil...@gmail.com  wrote:

It appears to me that the effort to commit the contributions is minimal, and 
that in this case the true cost is that of doing the release.

Heh. I think looks can be deceiving sometimes. I'm not sure I'm willing to hold 
the responsibility of those commits right now. If someone else is, that's great 
... but I don't find them minimal enough for my taste I suppose ;) Depends on 
what areas you feel comfortable with I guess.


Right, this is why some features with functional patches are sitting
targeted at 3.2 instead of 3.1. Is it possible that we could put
distributed date faceting (SOLR-1709), better CJK handling out of the box
(LUCENE-2906), and a better default merge policy (LUCENE-854) all in
3.1 right now? Sure it is.

But is this the best decision... I don't think it is.

Nor do I. I'm fine with the freeze.


  I think as far
as 3.1 goes we already have a great set of features that have baked
for some time, including some rather serious performance improvements
(Mike and I have done some benchmarking against 3.0)... and it's
already going to be a more challenging release since it's the first one
since we merged Lucene and Solr.

For these newer features, it's not that we are lazy...


I did not mean to suggest that anyone is lazy. Far from it, the effort 
that goes into this project is impressive.



it's that
sometimes you want more tests, want things to bake for a while with
Hudson's random testing, perhaps want some reviews/second pairs of
eyes on the code, or maybe even just some more time to think about the
change before committing to it.
I have a personal interest in LUCENE-2906. If there is anything I can do 
to help it along, I'll be glad to do that. I'll take it up on that issue.

When we commit it and release it, we are signing up for some degree of
support in the future. Also, personally I think it's better to put out
a good release with solid code and fewer features, than a more
buggy release that has a couple of extra features.


As I said, I'm happy with 3.1 being frozen. This release is much more 
timely. :) In the past, I saw releases being repeatedly pushed out to 
get one last thing in. (Maybe it just appeared that way to me.)


-- DM


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Release 3.2 (was Re: Please mark distributed date faceting for 3.1)

2011-02-15 Thread Robert Muir
On Tue, Feb 15, 2011 at 2:34 PM, DM Smith dmsmith...@gmail.com wrote:
 I have a personal interest in LUCENE-2906. If there is anything I can do to
 help it along, I'll be glad to do that. I'll take it up on that issue.

thanks DM, I know I promised to update the patch after solving the
subtask, and haven't yet done this. I'll try to do this tonight.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2272) Join

2011-02-15 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-2272:
---

Attachment: SOLR-2272.patch

bq. However, it doesn't apply on current trunk any more.

Here's a refresh.

 Join
 

 Key: SOLR-2272
 URL: https://issues.apache.org/jira/browse/SOLR-2272
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: SOLR-2272.patch, SOLR-2272.patch


 Limited join functionality for Solr, mapping one set of IDs matching a query 
 to another set of IDs, based on the indexed tokens of the fields.
 Example:
 fq={!join from=parent_ptr to=parent_id}child_doc:query
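 The described ID mapping can be modeled in memory as follows; a hypothetical
 simplification over maps, not Solr's index-backed implementation. Documents
 matching the inner query contribute their from-field values, and the result
 set is every document whose to-field holds one of those values.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory model of join-by-field-value semantics; hypothetical, not Solr's code. */
class JoinSketch {

    /** Map inner-query matches through from/to field values to an outer doc set. */
    static Set<String> join(List<Map<String, String>> docs, Set<String> innerMatchIds,
                            String fromField, String toField, String idField) {
        Set<String> fromValues = new HashSet<>();
        for (Map<String, String> d : docs) {
            if (innerMatchIds.contains(d.get(idField))) {
                fromValues.add(d.get(fromField)); // collect join keys from inner matches
            }
        }
        Set<String> result = new HashSet<>();
        for (Map<String, String> d : docs) {
            if (fromValues.contains(d.get(toField))) {
                result.add(d.get(idField)); // docs reachable via a collected join key
            }
        }
        return result;
    }
}
```

 In the fq example above, child documents matching child_doc:query contribute
 their parent_ptr values, and the filter keeps documents whose parent_id matches
 one of them.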

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2105) RequestHandler param update.processor is confusing

2011-02-15 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994983#comment-12994983
 ] 

Ryan McKinley commented on SOLR-2105:
-

+1

 RequestHandler param update.processor is confusing
 --

 Key: SOLR-2105
 URL: https://issues.apache.org/jira/browse/SOLR-2105
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4.1
Reporter: Jan Høydahl
Priority: Minor
 Attachments: SOLR-2105.patch


 Today we reference a custom updateRequestProcessorChain using the update 
 request parameter update.processor.
 See 
 http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
 This is confusing, since what we are really referencing is not an 
 UpdateProcessor, but an updateRequestProcessorChain.
 I propose that update.processor be renamed to update.chain or similar




[jira] Commented: (SOLR-2272) Join

2011-02-15 Thread Bojan Smid (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12994984#comment-12994984
 ] 

Bojan Smid commented on SOLR-2272:
--

Great, thx a lot Yonik :).

 Join
 

 Key: SOLR-2272
 URL: https://issues.apache.org/jira/browse/SOLR-2272
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: SOLR-2272.patch, SOLR-2272.patch


 Limited join functionality for Solr, mapping one set of IDs matching a query 
 to another set of IDs, based on the indexed tokens of the fields.
 Example:
 fq={!join from=parent_ptr to=parent_id}child_doc:query




RE: Any contribs available for Range field type?

2011-02-15 Thread Smiley, David W.
solr-dev is the old list; it's now just dev.  The old one forwards to the 
new list though.
~ David

From: mike anderson [mailto:saidthero...@gmail.com]
Sent: Tuesday, February 15, 2011 10:51 AM
To: solr-...@lucene.apache.org
Cc: ken.fos...@realestate.com
Subject: Fwd: Any contribs available for Range field type?


-- Forwarded message --
From: kenf_nc ken.fos...@realestate.commailto:ken.fos...@realestate.com
Date: Tue, Feb 15, 2011 at 10:49 AM
Subject: Re: Any contribs available for Range field type?
To: solr-u...@lucene.apache.orgmailto:solr-u...@lucene.apache.org



I've tried several times to get an active account on
solr-...@lucene.apache.orgmailto:solr-...@lucene.apache.org and the mailing 
list won't send me a confirmation
email, and therefore won't let me post because I'm not confirmed. Could I
get someone that is a member of Solr-Dev to post either my original request
in this thread, or a link to this thread on the Dev mailing list? I really
was hoping for more response than this to this question. This would be a
terrifically useful field type to just about any solr index.

Thanks,
Ken
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-type-tp2473601p2502203.html
Sent from the Solr - User mailing list archive at Nabble.com.



[jira] Resolved: (SOLR-1711) Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java

2011-02-15 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1711.


   Resolution: Fixed
Fix Version/s: (was: 1.4.1)
   (was: 1.5)

Committed the latest patch - hopefully that finally fixes this issue!

 Race condition in 
 org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
 --

 Key: SOLR-1711
 URL: https://issues.apache.org/jira/browse/SOLR-1711
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.4, 1.5
Reporter: Attila Babo
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 3.1, 4.0

 Attachments: SOLR-1711.patch, StreamingUpdateSolrServer.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 While inserting a large pile of documents using StreamingUpdateSolrServer 
 there is a race condition as all Runner instances stop processing while the 
 blocking queue is full. With a high performance client this could happen 
 quite often, there is no way to recover from it at the client side.
 In StreamingUpdateSolrServer there is a BlockingQueue called queue to store 
 UpdateRequests, there are up to threadCount number of workers threads from 
 StreamingUpdateSolrServer.Runner to read that queue and push requests to a 
 Solr instance. If at some point the BlockingQueue becomes empty, all workers 
 stop processing it and push the collected content to Solr, which can be a 
 time-consuming process; sometimes all worker threads are waiting for Solr. If 
 at that moment the client fills the BlockingQueue, all worker threads will 
 quit without processing anything further and the main thread will block forever.
 There is a simple, well tested patch attached to handle this situation.
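The general shape of the fix for this kind of race (not the actual StreamingUpdateSolrServer code, just an illustrative sketch) is that a worker must never exit merely because the queue is momentarily empty; it may only exit once the producer is finished AND the queue is drained:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainSketch {
    // Workers poll with a timeout and only exit once the producer is done
    // AND the queue is empty, so a refilled queue can never be orphaned.
    static int process(int items, int workers) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(16);
        CountDownLatch producerDone = new CountDownLatch(1);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Integer item = queue.poll(50, TimeUnit.MILLISECONDS);
                        if (item != null) {
                            processed.incrementAndGet();
                        } else if (producerDone.getCount() == 0 && queue.isEmpty()) {
                            return;  // safe: no more work can arrive
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (int i = 0; i < items; i++) queue.put(i);  // blocks when full
        producerDone.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(1000, 4));  // prints 1000
    }
}
```

The buggy pattern the issue describes is the inverse: workers that return as soon as poll() comes back empty, which deadlocks the producer if the queue fills up again afterwards.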




[jira] Updated: (SOLR-2155) Geospatial search using geohash prefixes

2011-02-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2155:


Attachment: SOLR.2155.p3tests.patch

Test cases for geomultidist() function.

Add this and SOLR.2155.p3.patch

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (e.g. via a gazetteer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.
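The prefix-subdivision idea above can be illustrated with a standard geohash encoder (this is the textbook base-32 encoding, not the patch's code): each additional character narrows the lat-lon box, so a shared prefix means containment in the same grid square, which is exactly what lets the filter seek by prefix.

```java
public class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Standard geohash encoding: interleave longitude/latitude bisection
    // bits (longitude first) and emit 5 bits per base-32 character. Every
    // appended character subdivides the parent cell.
    static String encode(double lat, double lon, int length) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true;  // even bit positions refine longitude
        int bit = 0, ch = 0;
        while (hash.length() < length) {
            if (evenBit) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonLo = mid; }
                else            { ch = ch << 1;      lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latLo = mid; }
                else            { ch = ch << 1;      latHi = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { hash.append(BASE32.charAt(ch)); bit = 0; ch = 0; }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        String fine = encode(42.6, -5.6, 5);    // "ezs42", a known test vector
        String coarse = encode(42.6, -5.6, 3);  // "ezs"
        // The longer hash starts with the shorter one: same containing cell.
        System.out.println(fine.startsWith(coarse));  // prints true
    }
}
```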




[jira] Updated: (SOLR-2348) No error reported when using a FieldCached backed ValueSource for a field Solr knows won't work

2011-02-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2348:
---

Attachment: SOLR-2348.patch

Updated patch that fixes the test failures.

For the most part, this is fairly straightforward: tests that were abusing 
multiValued fields as if they were single-valued.

The one situation where I made a genuine code change was in 
AbstractSubTypeFieldType and the way it deals with the subFieldType 
attribute.  When it's used, the registerPolyFieldDynamicPrototype function 
registers a new dynamic field based on the specified fieldType instance.  I 
updated the properties used to generate these dynamicFields so that it 
explicitly specifies multiValued=false (it was already specifying indexed=true 
and stored=false).

I could have just updated the test schemas so that the fieldType specified was 
already multiValued, but I think this makes more sense from a functional 
standpoint.  The existing code already enabled a use case like this...

{noformat}
<fieldType name="double" class="solr.TrieDoubleField" indexed="false" 
multiValued="false" ... />
<fieldType name="xy" class="solr.PointType" dimension="2" 
subFieldType="double"/>
{noformat}

...so it makes sense that this should work equally well automatically...

{noformat}
<fieldType name="double" class="solr.TrieDoubleField" indexed="true" 
multiValued="true" ... />
<fieldType name="xy" class="solr.PointType" dimension="2" 
subFieldType="double"/>
{noformat}




 No error reported when using a FieldCached backed ValueSource for a field 
 Solr knows won't work
 ---

 Key: SOLR-2348
 URL: https://issues.apache.org/jira/browse/SOLR-2348
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.1, 4.0

 Attachments: SOLR-2348.patch, SOLR-2348.patch


 For the same reasons outlined in SOLR-2339, Solr FieldTypes that return 
 FieldCache-backed ValueSources should explicitly check for situations where 
 it knows the FieldCache is meaningless.




[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-15 Thread hao yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hao yan updated LUCENE-2903:


Attachment: LUCENE-2903.patch

This new patch provides PForDeltaFixedIntBlockWithIntBufferCodec 
(PatchedFrameOfRef4), which improves the performance of its previous 
counterparts (PatchedFrameOfRef4, 5, 6). Note that this PatchedFrameOfRef4 is 
different from the previous PatchedFrameOfRef4. 

 Improvement of PForDelta Codec
 --

 Key: LUCENE-2903
 URL: https://issues.apache.org/jira/browse/LUCENE-2903
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: hao yan
 Attachments: LUCENE-2903.patch, LUCENE-2903.patch, LUCENE_2903.patch, 
 LUCENE_2903.patch


 There are 3 versions of PForDelta implementations in the Bulk Branch: 
 FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
 The FrameOfRef is a very basic one which is essentially a binary encoding 
 (may result in huge index size).
 The PatchedFrameOfRef is the implementation based on the original version of 
 PForDelta in the literature.
 The PatchedFrameOfRef2 is my previous implementation, which is improved this 
 time. (The Codec name is changed to NewPForDelta.)
 In particular, the changes are:
 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
 old PForDelta does not support very large exceptions (since
 the Simple16 does not support very large numbers). Now this has been fixed in 
 the new LCPForDelta.
 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other 
 two PForDelta implementation in the bulk branch (FrameOfRef and 
 PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the 
 CodecProvider and PForDeltaFixedIntBlockCodec.
 3. The performance test results are:
 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef 
 for almost all kinds of queries, slightly worse than BulkVInt.
 2) My NewPForDelta codec can result in the smallest index size among all 4 
 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself)
 3) All performance test results are achieved by running with -server 
 instead of -client
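For readers unfamiliar with the family of codecs being compared, the core idea behind frame-of-reference over deltas can be sketched as follows (this illustrates only the basic FOR scheme; real PForDelta additionally "patches" rare large gaps as exceptions so the common bit width stays small):

```java
import java.util.Arrays;

public class ForDeltaSketch {
    // Store the gaps between sorted doc IDs instead of the IDs themselves;
    // a block can then be packed at the bit width of its largest gap.
    static int[] deltas(int[] sortedDocIds) {
        int[] d = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < d.length; i++) {
            d[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
        }
        return d;
    }

    // Bits per entry needed to represent every gap in the block.
    static int bitsNeeded(int[] deltas) {
        int max = 0;
        for (int d : deltas) max = Math.max(max, d);
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
    }

    // Decoding is a running prefix sum over the gaps.
    static int[] restore(int[] deltas) {
        int[] ids = new int[deltas.length];
        int acc = 0;
        for (int i = 0; i < deltas.length; i++) {
            acc += deltas[i];
            ids[i] = acc;
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = {5, 12, 13, 40};
        int[] d = deltas(ids);  // {5, 7, 1, 27}: fits in 5 bits per entry
        System.out.println(Arrays.toString(d) + ", "
                + bitsNeeded(d) + " bits/entry");
        System.out.println(Arrays.equals(restore(d), ids));  // prints true
    }
}
```

The exception-patching the patch discusses matters when one gap is much larger than the rest: without it, a single outlier forces the whole block to the outlier's width.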




[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-15 Thread hao yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hao yan updated LUCENE-2903:


Attachment: LUCENE-2903.patch

This patch improves the performance of the previous PatchedFrameOfRef4 and 
removes PatchedFrameOfRef5 and PatchedFrameOfRef6. Now the performance of 
PatchedFrameOfRef4 is better than BulkVInt and comparable to 
PatchedFrameOfRef in my tests.

 Improvement of PForDelta Codec
 --

 Key: LUCENE-2903
 URL: https://issues.apache.org/jira/browse/LUCENE-2903
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: hao yan
 Attachments: LUCENE-2903.patch


 There are 3 versions of PForDelta implementations in the Bulk Branch: 
 FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
 The FrameOfRef is a very basic one which is essentially a binary encoding 
 (may result in huge index size).
 The PatchedFrameOfRef is the implementation based on the original version of 
 PForDelta in the literature.
 The PatchedFrameOfRef2 is my previous implementation, which is improved this 
 time. (The Codec name is changed to NewPForDelta.)
 In particular, the changes are:
 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
 old PForDelta does not support very large exceptions (since
 the Simple16 does not support very large numbers). Now this has been fixed in 
 the new LCPForDelta.
 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other 
 two PForDelta implementation in the bulk branch (FrameOfRef and 
 PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the 
 CodecProvider and PForDeltaFixedIntBlockCodec.
 3. The performance test results are:
 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef 
 for almost all kinds of queries, slightly worse than BulkVInt.
 2) My NewPForDelta codec can result in the smallest index size among all 4 
 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself)
 3) All performance test results are achieved by running with -server 
 instead of -client




[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-15 Thread hao yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hao yan updated LUCENE-2903:


Attachment: (was: LUCENE_2903.patch)

 Improvement of PForDelta Codec
 --

 Key: LUCENE-2903
 URL: https://issues.apache.org/jira/browse/LUCENE-2903
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: hao yan
 Attachments: LUCENE-2903.patch


 There are 3 versions of PForDelta implementations in the Bulk Branch: 
 FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
 The FrameOfRef is a very basic one which is essentially a binary encoding 
 (may result in huge index size).
 The PatchedFrameOfRef is the implementation based on the original version of 
 PForDelta in the literature.
 The PatchedFrameOfRef2 is my previous implementation, which is improved this 
 time. (The Codec name is changed to NewPForDelta.)
 In particular, the changes are:
 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
 old PForDelta does not support very large exceptions (since
 the Simple16 does not support very large numbers). Now this has been fixed in 
 the new LCPForDelta.
 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other 
 two PForDelta implementation in the bulk branch (FrameOfRef and 
 PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the 
 CodecProvider and PForDeltaFixedIntBlockCodec.
 3. The performance test results are:
 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef 
 for almost all kinds of queries, slightly worse than BulkVInt.
 2) My NewPForDelta codec can result in the smallest index size among all 4 
 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself)
 3) All performance test results are achieved by running with -server 
 instead of -client




[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-15 Thread hao yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hao yan updated LUCENE-2903:


Attachment: (was: LUCENE-2903.patch)

 Improvement of PForDelta Codec
 --

 Key: LUCENE-2903
 URL: https://issues.apache.org/jira/browse/LUCENE-2903
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: hao yan
 Attachments: LUCENE-2903.patch


 There are 3 versions of PForDelta implementations in the Bulk Branch: 
 FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
 The FrameOfRef is a very basic one which is essentially a binary encoding 
 (may result in huge index size).
 The PatchedFrameOfRef is the implementation based on the original version of 
 PForDelta in the literature.
 The PatchedFrameOfRef2 is my previous implementation, which is improved this 
 time. (The Codec name is changed to NewPForDelta.)
 In particular, the changes are:
 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
 old PForDelta does not support very large exceptions (since
 the Simple16 does not support very large numbers). Now this has been fixed in 
 the new LCPForDelta.
 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other 
 two PForDelta implementation in the bulk branch (FrameOfRef and 
 PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the 
 CodecProvider and PForDeltaFixedIntBlockCodec.
 3. The performance test results are:
 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef 
 for almost all kinds of queries, slightly worse than BulkVInt.
 2) My NewPForDelta codec can result in the smallest index size among all 4 
 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself)
 3) All performance test results are achieved by running with -server 
 instead of -client




[jira] Updated: (LUCENE-2903) Improvement of PForDelta Codec

2011-02-15 Thread hao yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hao yan updated LUCENE-2903:


Attachment: (was: LUCENE-2903.patch)

 Improvement of PForDelta Codec
 --

 Key: LUCENE-2903
 URL: https://issues.apache.org/jira/browse/LUCENE-2903
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: hao yan
 Attachments: LUCENE-2903.patch


 There are 3 versions of PForDelta implementations in the Bulk Branch: 
 FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
 The FrameOfRef is a very basic one which is essentially a binary encoding 
 (may result in huge index size).
 The PatchedFrameOfRef is the implementation based on the original version of 
 PForDelta in the literature.
 The PatchedFrameOfRef2 is my previous implementation, which is improved this 
 time. (The Codec name is changed to NewPForDelta.)
 In particular, the changes are:
 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the 
 old PForDelta does not support very large exceptions (since
 the Simple16 does not support very large numbers). Now this has been fixed in 
 the new LCPForDelta.
 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other 
 two PForDelta implementation in the bulk branch (FrameOfRef and 
 PatchedFrameOfRef). The codec's name is NewPForDelta, as you can see in the 
 CodecProvider and PForDeltaFixedIntBlockCodec.
 3. The performance test results are:
 1) My NewPForDelta codec is faster than FrameOfRef and PatchedFrameOfRef 
 for almost all kinds of queries, slightly worse than BulkVInt.
 2) My NewPForDelta codec can result in the smallest index size among all 4 
 methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself)
 3) All performance test results are achieved by running with -server 
 instead of -client




[jira] Created: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be

2011-02-15 Thread Hoss Man (JIRA)
<lib dir=.../> directives are logging serious errors when they should not be
--

 Key: SOLR-2364
 URL: https://issues.apache.org/jira/browse/SOLR-2364
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.1, 4.0


The {{<lib dir="foo" ... />}} syntax for solrconfig.xml was specifically designed 
so that it would *not* log errors if the directory (or jars in that directory) 
didn't exist -- this was designed to make it possible to have a {{<lib/>}} 
directive that would optionally include jars if they are there, and ignore 
them if they can't be found ({{<lib path="foo/bar.jar" .../>}} can be used when 
you have an explicit jar you want to load and you want an error if it's not 
there).

At some point in the not too distant past, something seems to have changed on 
both the 3x and trunk branches in how SolrResourceLoader.replaceClassLoader 
works, such that in the example you get errors logged like this...

{noformat}
Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader addToClassLoader
SEVERE: Can't find (or read) file to add to classloader: /total/crap/dir/ignored
{noformat}

This is in spite of the fact that the solrconfig.xml says...

{noformat}
  <!-- If a dir option (with or without a regex) is used and nothing is found
       that matches, it will be ignored
  -->
  <lib dir="../../contrib/clustering/lib/downloads/" />
  <lib dir="../../contrib/clustering/lib/" />
  <lib dir="/total/crap/dir/ignored" />
{noformat}

Note these errors are also logged when running the example, even though there 
are no {{<lib/>}} declarations that correspond to them -- they seem to be 
errors coming from the default behavior of looking for $solr_home/lib (which is 
evidently happening twice?)...

{noformat}
Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/'
Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader addToClassLoader
SEVERE: Can't find (or read) file to add to classloader: solr/./lib
Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to 'solr/./'
Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader addToClassLoader
SEVERE: Can't find (or read) file to add to classloader: solr/././lib
{noformat}




[jira] Commented: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2011-02-15 Thread Jon Druse (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995098#comment-12995098
 ] 

Jon Druse commented on LUCENE-1824:
---

Has this had any progress?  I'm dealing with the same issues.  Or is there a 
workaround?  Thanks!

 FastVectorHighlighter truncates words at beginning and end of fragments
 ---

 Key: LUCENE-1824
 URL: https://issues.apache.org/jira/browse/LUCENE-1824
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
 Environment: any
Reporter: Alex Vigdor
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-1824.patch


 FastVectorHighlighter does not take word boundaries into consideration when 
 building fragments, so that in most cases the first and last word of a 
 fragment are truncated.  This makes the highlights less legible than they 
 should be.  I will attach a patch to BaseFragmentBuilder that resolves this 
 by expanding the start and end boundaries of the fragment to the first 
 whitespace character on either side of the fragment, or the beginning or end 
 of the source text, whichever comes first.  This significantly improves 
 legibility, at the cost of returning a slightly larger number of characters 
 than specified for the fragment size.
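The boundary-expansion the patch describes is simple enough to sketch directly (illustrative only; the real patch operates inside BaseFragmentBuilder rather than on a bare string):

```java
public class FragmentBoundsSketch {
    // Expand a fragment's [start, end) bounds outward to the nearest
    // whitespace character (or the text's edges, whichever comes first)
    // so that the first and last words are never truncated.
    static String expandToWordBoundaries(String text, int start, int end) {
        while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) {
            start--;
        }
        while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
            end++;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String src = "hello wonderful world";
        // A naive fragment cuts "wonderful" in half...
        System.out.println(src.substring(8, 13));               // prints derfu
        // ...the expanded fragment keeps the whole word.
        System.out.println(expandToWordBoundaries(src, 8, 13)); // prints wonderful
    }
}
```

As the issue notes, the trade-off is that the returned fragment can be slightly longer than the requested fragment size.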




[jira] Commented: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be

2011-02-15 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995102#comment-12995102
 ] 

Koji Sekiguchi commented on SOLR-2364:
--

Ah, sorry. I've committed the change.

http://svn.apache.org/viewvc?view=revisionrevision=1069656
http://svn.apache.org/viewvc?view=revisionrevision=1069657

I didn't know the background. I'll see now if I can revert it...

 <lib dir=.../> directives are logging serious errors when they should not be
 --

 Key: SOLR-2364
 URL: https://issues.apache.org/jira/browse/SOLR-2364
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.1, 4.0


 The {{<lib dir="foo" ... />}} syntax for solrconfig.xml was specifically 
 designed so that it would *not* log errors if the directory (or jars in that 
 directory) didn't exist -- this was designed to make it possible to have a 
 {{<lib/>}} directive that would optionally include jars if they are there, 
 and ignore them if they can't be found ({{<lib path="foo/bar.jar" .../>}} 
 can be used when you have an explicit jar you want to load and you want an 
 error if it's not there)
 At some point in the not too distant past, something seems to have changed on 
 both the 3x and trunk branches in how SolrResourceLoader.replaceClassLoader 
 works, such that in the example you get errors logged like this...
 {noformat}
 Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader 
 addToClassLoader
 SEVERE: Can't find (or read) file to add to classloader: 
 /total/crap/dir/ignored
 {noformat}
 This is in spite of the fact that the solrconfig.xml says...
 {noformat}
   <!-- If a dir option (with or without a regex) is used and nothing is found
        that matches, it will be ignored
   -->
   <lib dir="../../contrib/clustering/lib/downloads/" />
   <lib dir="../../contrib/clustering/lib/" />
   <lib dir="/total/crap/dir/ignored" />
 {noformat}
 Note these errors are also logged when running the example, even though there 
 are no {{<lib/>}} declarations that correspond to them -- they seem to be 
 errors coming from the default behavior of looking for $solr_home/lib (which 
 is evidently happening twice?)...
 {noformat}
 Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to 'solr/'
 Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader 
 addToClassLoader
 SEVERE: Can't find (or read) file to add to classloader: solr/./lib
 Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to 'solr/./'
 Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader 
 addToClassLoader
 SEVERE: Can't find (or read) file to add to classloader: solr/././lib
 {noformat}




[jira] Commented: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be

2011-02-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995103#comment-12995103
 ] 

Hoss Man commented on SOLR-2364:


This seems to have been caused by the following commits on Feb 11...

http://svn.apache.org/viewvc?view=revisionrevision=1069656
http://svn.apache.org/viewvc?view=revisionrevision=1069657

...which Koji attributed to SOLR-1449, even though that issue (which added the 
{{<lib/>}} feature) was resolved back in 2009 and was included in Solr 1.4.1.

I really don't know why Koji did that ... as far as I'm concerned this is a 
break in compatibility: the whole point of how these directives were set up was 
to support the possibility of directories not existing (and the examples 
documented them as working that way).

Unless I hear a strong reason to the contrary, I plan to revert those commits.

 <lib dir=.../> directives are logging serious errors when they should not be
 --

 Key: SOLR-2364
 URL: https://issues.apache.org/jira/browse/SOLR-2364
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.1, 4.0


 The {{<lib dir="foo" ... />}} syntax for solrconfig.xml was specifically 
 designed so that it would *not* log errors if the directory (or jars in that 
 directory) didn't exist -- this was designed to make it possible to have a 
 {{<lib/>}} directive that would optionally include jars if they are there, 
 and ignore them if they can't be found ({{<lib path="foo/bar.jar" .../>}} 
 can be used when you have an explicit jar you want to load and you want an 
 error if it's not there)
 At some point in the not too distant past, something seems to have changed on 
 both the 3x and trunk branches in how SolrResourceLoader.replaceClassLoader 
 works, such that in the example you get errors logged like this...
 {noformat}
 Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader 
 addToClassLoader
 SEVERE: Can't find (or read) file to add to classloader: 
 /total/crap/dir/ignored
 {noformat}
 This is in spite of the fact that the solrconfig.xml says...
 {noformat}
   <!-- If a dir option (with or without a regex) is used and nothing is found
        that matches, it will be ignored
   -->
   <lib dir="../../contrib/clustering/lib/downloads/" />
   <lib dir="../../contrib/clustering/lib/" />
   <lib dir="/total/crap/dir/ignored" />
 {noformat}
 Note these errors are also logged when running the example, even though there 
 are no {{<lib/>}} declarations that correspond to them -- they seem to be 
 errors coming from the default behavior of looking for $solr_home/lib (which 
 is evidently happening twice?)...
 {noformat}
 Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to 'solr/'
 Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader 
 addToClassLoader
 SEVERE: Can't find (or read) file to add to classloader: solr/./lib
 Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to 'solr/./'
 Feb 15, 2011 4:52:03 PM org.apache.solr.core.SolrResourceLoader 
 addToClassLoader
 SEVERE: Can't find (or read) file to add to classloader: solr/././lib
 {noformat}
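The silent-skip contract the description asks for can be sketched as follows. This is a hypothetical standalone illustration, not the actual SolrResourceLoader code; the class and method names are made up:

```java
import java.io.File;

// Hypothetical sketch of the intended <lib dir=.../> behavior: a missing
// optional directory is skipped silently instead of logging SEVERE, while
// an existing directory would have its jars added to the classloader.
public class LibDirSketch {
    static String handleLibDir(File dir) {
        if (!dir.isDirectory()) {
            return "skipped"; // optional dir absent: no error logged
        }
        return "added";       // real dir: jars would be added here
    }

    public static void main(String[] args) {
        // The non-existent path from the example config above.
        System.out.println(handleLibDir(new File("/total/crap/dir/ignored")));
    }
}
```

Running it prints `skipped`, matching the documented "ignore if not found" semantics rather than the SEVERE log shown above.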

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be

2011-02-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reassigned SOLR-2364:
--

Assignee: Koji Sekiguchi  (was: Hoss Man)

 <lib dir=.../> directives are logging serious errors when they should not be
 --

 Key: SOLR-2364
 URL: https://issues.apache.org/jira/browse/SOLR-2364
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Koji Sekiguchi
 Fix For: 3.1, 4.0






[jira] Commented: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be

2011-02-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995105#comment-12995105
 ] 

Hoss Man commented on SOLR-2364:


Koji: thanks. 

FWIW: attributing a commit to an issue that was resolved two years ago doesn't 
seem like a good idea in any situation -- filing a new bug to track the change 
(whether you considered it a bug or an improvement) would have made this 
more noticeable.

If you think we should have an option to control whether it complains or not 
when trying to load libs out of a dir i'm open to suggestions -- but let's 
track that as a new issue.

 <lib dir=.../> directives are logging serious errors when they should not be
 --

 Key: SOLR-2364
 URL: https://issues.apache.org/jira/browse/SOLR-2364
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.1, 4.0






Re: wind down for 3.1?

2011-02-15 Thread Chris Hostetter

: 1. javadocs warnings/errors: this is a constant battle, its worth
: considering if the build should actually fail if you get one of these,
: in my opinion if we can do this we really should. its frustrating to

for a brief period we did, and then we rolled it back...

https://issues.apache.org/jira/browse/LUCENE-875

: 2. introducing new compiler warnings: another problem just being left
: for someone else to clean up later, another constant losing battle.
: 99% of the time (for non-autogenerated code) the warnings are
: useful... in my opinion we should not commit patches that create new
: warnings.

it's hard to spot new compiler warnings when there are already so many 
... if we can get down to 0 then we can add hacks to make the build fail 
if someone adds 1 but until then we have an uphill battle.


-Hoss




[jira] Resolved: (SOLR-2364) <lib dir=.../> directives are logging serious errors when they should not be

2011-02-15 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2364.
--

Resolution: Fixed

The reverts were committed. trunk:1071121, 3x:1071122.
Thanks Hoss for taking the time on this issue!

 <lib dir=.../> directives are logging serious errors when they should not be
 --

 Key: SOLR-2364
 URL: https://issues.apache.org/jira/browse/SOLR-2364
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Koji Sekiguchi
 Fix For: 3.1, 4.0






[jira] Assigned: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2011-02-15 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned LUCENE-1824:
--

Assignee: Koji Sekiguchi

 FastVectorHighlighter truncates words at beginning and end of fragments
 ---

 Key: LUCENE-1824
 URL: https://issues.apache.org/jira/browse/LUCENE-1824
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
 Environment: any
Reporter: Alex Vigdor
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-1824.patch


 FastVectorHighlighter does not take word boundaries into consideration when 
 building fragments, so that in most cases the first and last word of a 
 fragment are truncated.  This makes the highlights less legible than they 
 should be.  I will attach a patch to BaseFragmentBuilder that resolves this 
 by expanding the start and end boundaries of the fragment to the first 
 whitespace character on either side of the fragment, or the beginning or end 
 of the source text, whichever comes first.  This significantly improves 
 legibility, at the cost of returning a slightly larger number of characters 
 than specified for the fragment size.
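The boundary expansion Alex describes can be sketched like this. It is a hypothetical standalone illustration of the idea, not the actual BaseFragmentBuilder patch; the class and method names are made up:

```java
// Hypothetical sketch of the fix described above: widen a fragment's
// [start, end) range to the nearest whitespace on each side (or to the
// beginning/end of the source text) so words are not truncated.
public class FragmentBounds {
    static int[] expand(String text, int start, int end) {
        // Walk left until whitespace or start of text.
        while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) start--;
        // Walk right until whitespace or end of text.
        while (end < text.length() && !Character.isWhitespace(text.charAt(end))) end++;
        return new int[]{start, end};
    }

    public static void main(String[] args) {
        String text = "fast vector highlighting example";
        int[] b = expand(text, 7, 18); // boundaries fall mid-word
        System.out.println(text.substring(b[0], b[1])); // vector highlighting
    }
}
```

The widened fragment keeps whole words at the cost of returning slightly more characters than the requested fragment size, exactly the trade-off noted in the description.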




[jira] Commented: (SOLR-1553) extended dismax query parser

2011-02-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995125#comment-12995125
 ] 

Hoss Man commented on SOLR-1553:


bq. I'll keep the issue open in 3.1 for a few more days as discussed, then i'm 
moving it out.

it would be less confusing to just resolve it as fixed, and open new issues to 
track the outstanding problems/bugs/questions.

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 1.5, 3.1, 4.0

 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch, 
 edismax.unescapedcolon.bug.test.patch, edismax.unescapedcolon.bug.test.patch, 
 edismax.userFields.patch


 An improved user-facing query parser based on dismax




Re: Any contribs available for Range field type?

2011-02-15 Thread Bill Bell
I did a similar thing at Kaango.com (classified system). The idea that I
used was to use dynamic fields based on type, and load them into SOLR.

For example,

Autos:
s_auto_make - String
s_auto_model - String
l_auto_year - Long

Real Estate:
l_real_estate_bedrooms - Long
l_real_estate_baths - Long

You get the idea. I created these by using the DIH handler and adding a
script at the top of the file that would take the field from the database,
and rename it based on what it was. Then I would load it into SOLR as a
dynamic field.

Then for the facets, I would configure the name of the dynamic fields that
need to be pulled (with facet.field or query).

For ranges: facet.query=l_years_of_service:[1999 TO 2004] or [1999 TO *]

That is how I solved a similar problem (if I understand the issue).
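The renaming step Bill describes can be sketched roughly as follows. This is a hypothetical illustration (the prefixes and method are invented for this sketch, not taken from his actual DIH script):

```java
// Hypothetical sketch of mapping a typed database column to a Solr
// dynamic-field name, so it matches a dynamicField pattern such as
// "s_*" (string) or "l_*" (long) in the schema.
public class DynamicFieldNames {
    static String solrName(String column, Class<?> type) {
        String prefix = (type == Long.class) ? "l_" : "s_";
        return prefix + column;
    }

    public static void main(String[] args) {
        System.out.println(solrName("auto_make", String.class)); // s_auto_make
        System.out.println(solrName("auto_year", Long.class));   // l_auto_year
    }
}
```

With fields named this way, a range facet is just a query on the typed field, e.g. `facet.query=l_years_of_service:[1999 TO 2004]`.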

Bill


From:  mike anderson saidthero...@gmail.com
Reply-To:  dev@lucene.apache.org
Date:  Tue, 15 Feb 2011 10:51:43 -0500
To:  solr-...@lucene.apache.org
Cc:  ken.fos...@realestate.com
Subject:  Fwd: Any contribs available for Range field type?




-- Forwarded message --
From: kenf_nc ken.fos...@realestate.com
Date: Fri, Feb 11, 2011 at 8:49 AM
Subject: Any contribs available for Range field type?
To: solr-u...@lucene.apache.org



I have a huge need for a new field type. It would be a Poly field, similar
to
Point or Payload. It would take 2 data elements and a search would return a
hit if the search term fell within the range of the elements. For example
let's say I have a document representing an Employment record. I may want
to
create a field for years_of_service where it would take values 1999,2004.
Then in a query q=years_of_service:2001 would be a hit,
q=years_of_service:2010 would not. The field would need to take a data type
attribute as a parameter. I may need to do integer ranges, float/double
ranges, date ranges. I don't see the need now, but heck maybe even a string
range. This would be useful for things like Event dates. An event often
occurs between several days (or hours) but the query is something like
"what events are happening today". If I did q=event_date:NOW (or similar) it
should hit all documents where event_date has a range that is inclusive of
today. Another example would be a product category document. A specific
automobile may have a fixed price, but a category of auto (2010 BMW
3-series
for example) would have a price range.
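The matching rule Ken is asking for reduces to an inclusive interval check per field. A minimal hypothetical sketch (plain Java, not an existing Solr field type; names invented here):

```java
// Hypothetical sketch of the requested "range field" semantics: the field
// stores two data elements [min, max], and a query value matches when it
// falls inside that inclusive range.
public class RangeField {
    final long min, max;

    RangeField(long min, long max) { this.min = min; this.max = max; }

    boolean matches(long v) { return v >= min && v <= max; }

    public static void main(String[] args) {
        RangeField yearsOfService = new RangeField(1999, 2004);
        System.out.println(yearsOfService.matches(2001)); // true
        System.out.println(yearsOfService.matches(2010)); // false
    }
}
```

A real implementation would also need typed variants (integer, float/double, date) and index-side support, but the hit/miss behavior in Ken's years_of_service example is exactly this check.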

I hope you get the point. My question (finally) is, does anyone know of an
existing contribution to the public domain that already does this? I'm more
of a .Net/C# developer than a Java developer. I know my way around Java,
but
don't really have the right tools to build/test/etc. So was hoping to
borrow
rather than build if I could.

Thanks,
Ken
--
View this message in context:
http://lucene.472066.n3.nabble.com/Any-contribs-available-for-Range-field-t
ype-tp2473601p2473601.html

Sent from the Solr - User mailing list archive at Nabble.com.








Re: MultiValued FC question

2011-02-15 Thread Bill Bell
I'll ask another way…

If I use termsIndex. There does not appear to be a way to get a list of
terms for a document and field easily using ValueSource. Is that right?

If not, how would I go about getting a list of terms for a field in a
document? When I do fq=ids:56 it appears to work on multivalued fields. How
does it work?

Thanks.

From:  Bill Bell billnb...@gmail.com
Reply-To:  dev@lucene.apache.org
Date:  Sun, 13 Feb 2011 02:31:57 -0700
To:  dev@lucene.apache.org dev@lucene.apache.org
Subject:  MultiValued FC question

(I posted on solr-user by mistake)

I am working on https://issues.apache.org/jira/browse/SOLR-2155

Trying to get a list of multiValued fields from the cache…

ValueSource vs = sf.getType().getValueSource(sf, fp);
DocValues llVals = vs.getValues(context, reader);
org.apache.lucene.spatial.geohash.GeoHashUtils.decode(llVals.strVal(doc));

 public String strVal(int doc) {
int ord=termsIndex.getOrd(doc);
if (ord == 0) {
  return null;
} else {
  return termsIndex.lookup(ord, new BytesRef()).utf8ToString();
}
  }

I figure the problem is that lookup only returns one. I need more than 1… I
thought ./lucene/src/java/org/apache/lucene/document/Document.java would
help me, but it didn't much. Would I want to call getFieldables(name) ? Or
would that slow down the caching? Thoughts?

1. What is termIndex ? Why does ord() matter?
2. Is there a helper for getting a multiValue field from the Field cache?

The strVal(doc) only returns one of the multiValues. Thought one of you
gurus might know the answer.
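The limitation can be illustrated with plain data structures (this is a hypothetical illustration of the ord-based lookup, not the Lucene FieldCache API):

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of why a single-ord terms index loses data for
// multiValued fields: a document may map to several term ordinals, but a
// doc -> single-ord lookup (like strVal above) can only return one of them.
public class MultiValueOrds {
    public static void main(String[] args) {
        List<String> terms = List.of("", "56", "57", "99"); // ord 0 = no value
        Map<Integer, int[]> docToOrds = Map.of(0, new int[]{1, 2}); // doc 0 has two values

        StringBuilder sb = new StringBuilder();
        for (int ord : docToOrds.get(0)) {
            sb.append(terms.get(ord)).append(" ");
        }
        System.out.println(sb.toString().trim()); // 56 57
    }
}
```

Retrieving all values therefore needs a per-document list of ords rather than a single ord, which is what a multiValued-aware cache structure would have to provide.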

Thanks.

Bill






RE: Please mark distributed date faceting for 3.1

2011-02-15 Thread Smiley, David W.
I may have added a test just now, but I and others have been using this 
[simple] code for some time now.  It has baked, it doesn't need more baking 
IMO.  If this patch wasn't the biggest reason to not use distributed search (a 
key feature) then I wouldn't be here arguing my point.  But I've apparently 
lost this argument already so I give up... assign it for 3.2 if that's the 
best you can do, Rob. It's better than being unassigned, which is what it is now. 
 

~ David


From: Robert Muir [rcm...@gmail.com]
Sent: Tuesday, February 15, 2011 10:24 AM
To: dev@lucene.apache.org
Subject: Re: Please mark distributed date faceting for 3.1

On Tue, Feb 15, 2011 at 10:10 AM, Smiley, David W. dsmi...@mitre.org wrote:
 Distributed date faceting now has a patch and is tested:

 https://issues.apache.org/jira/browse/SOLR-1709

 I’m posting to the dev list because I want a committer to mark this for
 3.1.  I don’t want to assume any of you guys see the comment activity.

Thanks very much for adding a test!

But, can't we just do this for 3.2 instead? I don't like the idea of
rushing features into 3.1 at the last minute because we are nearing a
release (0 open lucene issues, 2 open solr ones).

Right now the 3.x branch is feature-frozen for 3.1




Why does DIH jar end up in Solr war?

2011-02-15 Thread Smiley, David W.
I noticed that the DIH .jar file ends up in the .war file.  It ends up this way 
because the DIH's build.xml copies it into a place so that it ultimately winds 
up there.  This seems like an odd thing because no other contrib module gets 
this special treatment.  I noticed that the dataimport.jsp has a trivial 
dependency on the DataImportHandler class for an instanceof check that could be 
replaced with a string comparison of the class name. With that in place, this 
JSP won't error out if the DIH is not included.  So does someone have a reason? 
 In the absence of a good one, I suggest this needless exception be removed on 
the basis of consistency.
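The swap David describes can be sketched like this (a hypothetical illustration; the method name is invented, and only the comparison technique is from his description):

```java
// Hypothetical sketch: replace a compile-time "instanceof DataImportHandler"
// check with a class-name string comparison, so the JSP compiles and runs
// even when the DIH jar is absent from the war.
public class HandlerCheck {
    static boolean isDih(Object handler) {
        // was: handler instanceof DataImportHandler  (compile-time dependency)
        return handler.getClass().getName()
                .equals("org.apache.solr.handler.dataimport.DataImportHandler");
    }

    public static void main(String[] args) {
        System.out.println(isDih(new Object())); // false
    }
}
```

Since the class name is only a string, no DIH type needs to be on the classpath at compile time.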
~ David Smiley



Re: Why does DIH jar end up in Solr war?

2011-02-15 Thread Erik Hatcher
Well, it doesn't really make any sense for dataimport.jsp to be in the WAR file 
if DIH isn't there (will it really work being loaded, and friends, by 
SolrResourceLoader)?

Erik

On Feb 16, 2011, at 00:57 , Smiley, David W. wrote:

 I noticed that the DIH .jar file ends up in the .war file.  It ends up this 
 way because the DIH's build.xml copies it into a place so that it ultimately 
 winds up there.  This seems like an odd thing because no other contrib module 
 gets this special treatment.  I noticed that the dataimport.jsp has a trivial 
 dependency on the DataImportHandler class for an instanceof check that could 
 be replaced with a string comparison of the class name. With that in place, 
 this JSP won't error out if the DIH is not included.  So does someone have a 
 reason?  In the absence of a good one, I suggest this needless exception be 
 removed on the basis of consistency.
 ~ David Smiley
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 





RE: Why does DIH jar end up in Solr war?

2011-02-15 Thread Smiley, David W.
Yes it would be a slight anomaly for this .jsp file (a small text file) to be 
there but not the jar file.  But that feels like a better trade than this 
contrib module being the only contrib module that has its jar file within 
Solr's war.
I don't see what issue there would be regarding SolrResourceLoader.  I tried 
out what I'm talking about and used example-DIH with the DIH .jar file in the 
multicore lib directory of that example and I used the db core fine.
I can submit a patch in JIRA if you're agreeable.  
~ David

From: Erik Hatcher [erik.hatc...@gmail.com]
Sent: Wednesday, February 16, 2011 1:06 AM
To: dev@lucene.apache.org
Subject: Re: Why does DIH jar end up in Solr war?

Well, it doesn't really make any sense for dataimport.jsp to be in the WAR file 
if DIH isn't there (will it really work being loaded, and friends, by 
SolrResourceLoader)?

Erik

On Feb 16, 2011, at 00:57 , Smiley, David W. wrote:

 I noticed that the DIH .jar file ends up in the .war file.  It ends up this 
 way because the DIH's build.xml copies it into a place so that it ultimately 
 winds up there.  This seems like an odd thing because no other contrib module 
 gets this special treatment.  I noticed that the dataimport.jsp has a trivial 
 dependency on the DataImportHandler class for an instanceof check that could 
 be replaced with a string comparison of the class name. With that in place, 
 this JSP won't error out if the DIH is not included.  So does someone have a 
 reason?  In the absence of a good one, I suggest this needless exception be 
 removed on the basis of consistency.
 ~ David Smiley
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org






[jira] Created: (SOLR-2365) DIH should not be in the Solr war

2011-02-15 Thread David Smiley (JIRA)
DIH should not be in the Solr war
-

 Key: SOLR-2365
 URL: https://issues.apache.org/jira/browse/SOLR-2365
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: David Smiley
Priority: Minor


The DIH has a build.xml that puts itself into the Solr war file.  This is the 
only contrib module that does this, and I don't think it should be this way. 
Granted there is a small dataimport.jsp file that would be most convenient to 
remain included, but the jar should not be.




[jira] Updated: (SOLR-2365) DIH should not be in the Solr war

2011-02-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2365:
---

Attachment: SOLR-2365_DIH_should_not_be_in_war.patch

This patch removes the line in the DIH build.xml that includes its jar file 
into the war. It makes a simple fix to dataimport.jsp so that it does not have 
a compile-time dependency on the DIH. And in example-DIH, it adds some DIH jar 
file references via lib directives.

 DIH should not be in the Solr war
 -

 Key: SOLR-2365
 URL: https://issues.apache.org/jira/browse/SOLR-2365
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: David Smiley
Priority: Minor
 Attachments: SOLR-2365_DIH_should_not_be_in_war.patch






[jira] Commented: (SOLR-2365) DIH should not be in the Solr war

2011-02-15 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12995198#comment-12995198
 ] 

Erik Hatcher commented on SOLR-2365:


Since DIH worked out of the box with Solr 1.4.x, we probably want to keep it 
that way moving forward (for now).  We should also put the lib directive into 
Solr's main example solrconfig.xml (just as we do with clustering, Solr Cell, 
etc.).

Other than that, no objections to this.  

[tangent, but ideally we can eventually get all Solr UI to be Velocity 
generated, and plugins can then ship with their own .vm files in the JAR file 
to add in something like a dataimport.jsp]

 DIH should not be in the Solr war
 -

 Key: SOLR-2365
 URL: https://issues.apache.org/jira/browse/SOLR-2365
 Project: Solr
  Issue Type: Improvement
  Components: Build
Reporter: David Smiley
Priority: Minor
 Attachments: SOLR-2365_DIH_should_not_be_in_war.patch






[jira] Updated: (LUCENE-2881) Track FieldInfo per segment instead of per-IW-session

2011-02-15 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch updated LUCENE-2881:
--

Attachment: lucene-2881.patch

I fixed a bug in FieldInfos that could lead to wrong field numbers, that might 
have been related to the wrong behavior you're seeing, Simon.

About codecIds:  I made the fix to FieldInfo.clone() to set the codecId on the 
clone.  I also made FieldInfo.codecId private and added getter and setter.  The 
setter checks whether the new value for codecId is different from the previous 
one, and throws an exception in that case (unless it was set to the default 0 
before, which I think means Preflex codec).

All tests pass.  Please let me know if that fixes your problem.  If not then 
you should at least see the new exception that I added, which might make 
debugging easier.
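The guard Michael describes can be sketched like this. It is a hypothetical standalone illustration, not the actual FieldInfo patch; the class name and exact exception type are invented:

```java
// Hypothetical sketch of the setter guard described above: codecId is
// private, and changing an already-assigned (non-default) id throws.
public class FieldInfoSketch {
    private int codecId = 0; // 0 assumed to mean "unset" (Preflex default)

    void setCodecId(int id) {
        if (codecId != 0 && codecId != id) {
            throw new IllegalStateException("codecId already set: " + codecId);
        }
        codecId = id;
    }

    int getCodecId() { return codecId; }

    public static void main(String[] args) {
        FieldInfoSketch fi = new FieldInfoSketch();
        fi.setCodecId(3);                 // first assignment is allowed
        System.out.println(fi.getCodecId()); // 3
        try {
            fi.setCodecId(5);             // conflicting reassignment
        } catch (IllegalStateException e) {
            System.out.println("rejected");
        }
    }
}
```

The point of the check is that a conflicting reassignment now fails loudly at the setter, which makes bugs like the one Simon hit easier to localize.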

 Track FieldInfo per segment instead of per-IW-session
 -

 Key: LUCENE-2881
 URL: https://issues.apache.org/jira/browse/LUCENE-2881
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: Realtime Branch, CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Michael Busch
 Fix For: Realtime Branch, CSF branch, 4.0

 Attachments: lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, 
 lucene-2881.patch


 Currently FieldInfo is tracked per IW session to guarantee consistent global 
 field-naming / ordering. IW carries FI instances over from previous segments 
 which also carries over field properties like isIndexed etc. While having 
 consistent field ordering per IW session appears to be important due to bulk 
 merging stored fields etc. carrying over other properties might become 
 problematic with Lucene's Codec support.  Codecs that rely on consistent 
 properties in FI will fail if FI properties are carried over.
 The DocValuesCodec (DocValuesBranch), for instance, writes files per segment 
 and field (using the field id within the file name). Yet, if a particular 
 segment has no DocValues indexed but a previous segment in the same IW 
 session had DocValues, FieldInfo#docValues will be true since those 
 values are reused from previous segments. 
 We already work around this limitation in SegmentInfo with properties like 
 hasVectors or hasProx, which is really something we should manage per Codec & 
 Segment. Ideally FieldInfo would be managed per Segment and Codec such that 
 its properties are valid per segment. It also seems to be necessary to bind 
 FieldInfoS to SegmentInfo logically since its really just per segment 
 metadata.  



