Solr nightly build failure

2009-06-27 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
[javac] Compiling 83 source files to /tmp/apache-solr-nightly/build/solrj
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
[javac] Compiling 375 source files to /tmp/apache-solr-nightly/build/solr
[javac] 
/tmp/apache-solr-nightly/src/java/org/apache/solr/search/SolrIndexSearcher.java:627:
 cannot find symbol
[javac] symbol  : method simplifyQuery(org.apache.lucene.search.Query)
[javac] location: class org.apache.solr.search.QueryUtils
[javac] query = QueryUtils.simplifyQuery(query);
[javac]   ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 1 error

BUILD FAILED
/tmp/apache-solr-nightly/build.xml:137: The following error occurred while 
executing this line:
/tmp/apache-solr-nightly/common-build.xml:155: Compile failed; see the compiler 
error output for details.

Total time: 10 seconds




Build failed in Hudson: Solr-trunk #845

2009-06-27 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/845/changes

Changes:

[yonik] SOLR-1248: fix IndexReader ref counting

--
[...truncated 1898 lines...]
AU client/ruby/flare/public/images/pie_92.png
AU client/ruby/flare/public/images/pie_56.png
AU client/ruby/flare/public/images/pie_93.png
AU client/ruby/flare/public/images/pie_57.png
AU client/ruby/flare/public/images/pie_94.png
AU client/ruby/flare/public/images/pie_58.png
AU client/ruby/flare/public/images/pie_95.png
AU client/ruby/flare/public/images/pie_59.png
AU client/ruby/flare/public/images/pie_96.png
AU client/ruby/flare/public/images/pie_97.png
AU client/ruby/flare/public/images/pie_0.png
AU client/ruby/flare/public/images/pie_98.png
AU client/ruby/flare/public/images/pie_1.png
AU client/ruby/flare/public/images/pie_99.png
AU client/ruby/flare/public/images/pie_2.png
AU client/ruby/flare/public/images/pie_3.png
AU client/ruby/flare/public/images/pie_4.png
AU client/ruby/flare/public/images/pie_5.png
AU client/ruby/flare/public/images/pie_6.png
AU client/ruby/flare/public/images/pie_7.png
AU client/ruby/flare/public/images/pie_8.png
AU client/ruby/flare/public/images/pie_9.png
AU client/ruby/flare/public/images/pie_20.png
AU client/ruby/flare/public/images/pie_21.png
AU client/ruby/flare/public/images/pie_22.png
AU client/ruby/flare/public/images/pie_23.png
AU client/ruby/flare/public/images/pie_60.png
AU client/ruby/flare/public/images/pie_24.png
AU client/ruby/flare/public/images/pie_61.png
AU client/ruby/flare/public/images/pie_25.png
AU client/ruby/flare/public/images/pie_62.png
AU client/ruby/flare/public/images/pie_26.png
AU client/ruby/flare/public/images/pie_63.png
AU client/ruby/flare/public/images/pie_27.png
AU client/ruby/flare/public/images/pie_64.png
AU client/ruby/flare/public/images/pie_28.png
AU client/ruby/flare/public/images/pie_29.png
AU client/ruby/flare/public/images/pie_65.png
AU client/ruby/flare/public/images/pie_66.png
AU client/ruby/flare/public/images/pie_67.png
AU client/ruby/flare/public/images/pie_68.png
AU client/ruby/flare/public/images/pie_69.png
AU client/ruby/flare/public/images/pie_30.png
AU client/ruby/flare/public/images/pie_31.png
AU client/ruby/flare/public/images/pie_32.png
AU client/ruby/flare/public/images/pie_33.png
AU client/ruby/flare/public/images/pie_34.png
AU client/ruby/flare/public/images/pie_70.png
AU client/ruby/flare/public/images/pie_35.png
AU client/ruby/flare/public/images/pie_71.png
AU client/ruby/flare/public/images/pie_36.png
AU client/ruby/flare/public/images/pie_72.png
AU client/ruby/flare/public/images/pie_37.png
AU client/ruby/flare/public/images/pie_73.png
AU client/ruby/flare/public/images/pie_38.png
AU client/ruby/flare/public/images/pie_74.png
AU client/ruby/flare/public/images/pie_39.png
AU client/ruby/flare/public/images/pie_75.png
AU client/ruby/flare/public/images/pie_76.png
AU client/ruby/flare/public/images/pie_77.png
AU client/ruby/flare/public/images/pie_78.png
AU client/ruby/flare/public/images/x-close.gif
AU client/ruby/flare/public/dispatch.fcgi
A client/ruby/flare/public/robots.txt
A client/ruby/flare/public/500.html
A client/ruby/flare/public/javascripts
A client/ruby/flare/public/javascripts/prototype.js
A client/ruby/flare/public/javascripts/effects.js
A client/ruby/flare/public/javascripts/dragdrop.js
A client/ruby/flare/public/javascripts/application.js
A client/ruby/flare/public/javascripts/controls.js
A client/ruby/flare/public/404.html
A client/ruby/flare/public/.htaccess
A client/ruby/flare/public/stylesheets
A client/ruby/flare/public/stylesheets/flare.css
A client/ruby/flare/public/favicon.ico
A client/ruby/solr-ruby
A client/ruby/solr-ruby/solr
A client/ruby/solr-ruby/solr/conf
AU client/ruby/solr-ruby/solr/conf/schema.xml
A client/ruby/solr-ruby/solr/conf/protwords.txt
A client/ruby/solr-ruby/solr/conf/stopwords.txt
AU client/ruby/solr-ruby/solr/conf/solrconfig.xml
A client/ruby/solr-ruby/solr/conf/xslt
A client/ruby/solr-ruby/solr/conf/xslt/example.xsl
A client/ruby/solr-ruby/solr/conf/scripts.conf
A client/ruby/solr-ruby/solr/conf/admin-extra.html
A client/ruby/solr-ruby/solr/conf/synonyms.txt
A client/ruby/solr-ruby/solr/lib
A client/ruby/solr-ruby/test
A client/ruby/solr-ruby/test/unit
A client/ruby/solr-ruby/test/unit/standard_response_test.rb

Re: Solr nightly build failure

2009-06-27 Thread Yonik Seeley
My fault - committed part of a future patch with my last commit.  Fixing now...
-Yonik

On Sat, Jun 27, 2009 at 4:05 AM, solr-dev@lucene.apache.org wrote:

 init-forrest-entities:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/web

 compile-solrj:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
    [javac] Compiling 83 source files to /tmp/apache-solr-nightly/build/solrj
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

 compile:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
    [javac] Compiling 375 source files to /tmp/apache-solr-nightly/build/solr
    [javac] 
 /tmp/apache-solr-nightly/src/java/org/apache/solr/search/SolrIndexSearcher.java:627:
  cannot find symbol
    [javac] symbol  : method simplifyQuery(org.apache.lucene.search.Query)
    [javac] location: class org.apache.solr.search.QueryUtils
    [javac]     query = QueryUtils.simplifyQuery(query);
    [javac]                       ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 1 error

 BUILD FAILED
 /tmp/apache-solr-nightly/build.xml:137: The following error occurred while 
 executing this line:
 /tmp/apache-solr-nightly/common-build.xml:155: Compile failed; see the 
 compiler error output for details.

 Total time: 10 seconds





[jira] Commented: (SOLR-769) Support Document and Search Result clustering

2009-06-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724847#action_12724847
 ] 

Yonik Seeley commented on SOLR-769:
---

The response structure is a bit funny (it's like normal XML, which we don't 
really use in Solr-land), and certainly not optimal for JSON responses:

{code}
 clusters:[
  "cluster",[
    "labels",[
      "label","DDR"],
    "docs",[
      "doc","TWINX2048-3200PRO",
      "doc","VS1GB400C3",
      "doc","VDBDB1A16"]],
  "cluster",[
    "labels",[
      "label","Car Power Adapter"],
    "docs",[
      "doc","F8V7067-APL-KIT",
      "doc","IW-02"]],
[...]
{code}

Is labels needed because there could be multiple labels per cluster in 
the future? (I assume yes)
Do we need more per-doc information than just the id? (I assume no)
Could we want other per-cluster information in the future? (I assume yes)
What other possible information could be added in the future?

Given the assumptions above, clusters, docs, and labels should all be 
arrays instead of NamedLists (the names are just repeated, redundant info).
Each of the remaining NamedLists (just each cluster) should be a 
SimpleOrderedMap, since access by key is more important than order... that 
will give us something along the lines of:

{code}
clusters : [
  { labels : ["DDR"],
    docs : ["TWINX2048-3200PRO","VS1GB400C3","VDBDB1A16"]
  },
  { labels : ["Car Power Adapter"],
    docs : ["F8V7067-APL-KIT","IW-02"]
  }
]
{code}

Make sense?
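The restructuring described above can be sketched outside of Solr. Below is an illustrative Python version (the real implementation would use Solr's Java NamedList/SimpleOrderedMap classes, not this code), assuming the flat [name, value, name, value] shape shown in the first snippet:

```python
# Illustrative sketch: turn the NamedList-style flat pair lists into the
# proposed array/map shape, dropping the redundant repeated names.

def restructure(named_clusters):
    """named_clusters is a flat ["cluster", [...], "cluster", [...]] list,
    where each cluster value is itself a flat list of ("labels", [...])
    and ("docs", [...]) pairs."""
    def pairs(flat):
        # split [name, value, name, value, ...] into (name, value) tuples
        return list(zip(flat[0::2], flat[1::2]))

    clusters = []
    for _name, cluster in pairs(named_clusters):
        entry = {}
        for key, values in pairs(cluster):
            # drop the redundant inner names ("label", "doc") and keep
            # only the values, turning each inner NamedList into an array
            entry[key] = values[1::2]
        clusters.append(entry)
    return clusters

named = ["cluster", ["labels", ["label", "DDR"],
                     "docs", ["doc", "TWINX2048-3200PRO",
                              "doc", "VS1GB400C3",
                              "doc", "VDBDB1A16"]],
         "cluster", ["labels", ["label", "Car Power Adapter"],
                     "docs", ["doc", "F8V7067-APL-KIT",
                              "doc", "IW-02"]]]

print(restructure(named))
```

Each cluster comes out as a map keyed by "labels" and "docs", matching the second {code} block.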

 Support Document and Search Result clustering
 -

 Key: SOLR-769
 URL: https://issues.apache.org/jira/browse/SOLR-769
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: clustering-componet-shard.patch, clustering-libs.tar, 
 clustering-libs.tar, SOLR-769-analyzerClass.patch, SOLR-769-lib.zip, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.tar, SOLR-769.zip


 Clustering is a useful tool for working with documents and search results, 
 similar to the notion of dynamic faceting.  Carrot2 
 (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing 
 search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
 suited for whole-corpus clustering.  
 The patch lays out a contrib module that starts off w/ an integration of a 
 SearchComponent for doing clustering and an implementation using Carrot.  In 
 search results mode, it will use the DocList as the input for the cluster.   
 While Carrot2 comes w/ a Solr input component, it is not the same as the 
 SearchComponent that I have in that the Carrot example actually submits a 
 query to Solr, whereas my SearchComponent is just chained into the Component 
 list and uses the ResponseBuilder to add in the cluster results.
 While not fully fleshed out yet, the collection based mode will take in a 
 list of ids or just use the whole collection and will produce clusters.  
 Since this is a longer, typically offline task, there will need to be some 
 type of storage mechanism (and replication??) for the clusters.  I _may_ 
 push this off to a separate JIRA issue, but I at least want to present the 
 use case as part of the design of this component/contrib.  It may even make 
 sense that we split this out, such that the building piece is something like 
 an UpdateProcessor and then the SearchComponent just acts as a lookup 
 mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-769) Support Document and Search Result clustering

2009-06-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724854#action_12724854
 ] 

Yonik Seeley commented on SOLR-769:
---

I hit an error trying to cluster some documents I added with Solr Cell: 400 
unknown field Author.
Seems like it would be nice if we could handle unknown field types gracefully?
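One graceful-handling option (suggested again later in this thread for SOLR-284, where mapping attributes to an ignored_ field comes up) is a catch-all "ignored" field in the example schema.xml. This is a sketch along the lines of the stock example schema, not a committed change:

```xml
<!-- Sketch (not a committed change): a no-op field type plus a catch-all
     dynamic field, so metadata mapped onto the ignored_ prefix is
     silently dropped instead of triggering a 400 unknown-field error. -->
<fieldtype name="ignored" class="solr.StrField" indexed="false" stored="false"/>
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
```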




[jira] Commented: (SOLR-284) Parsing Rich Document Types

2009-06-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724855#action_12724855
 ] 

Yonik Seeley commented on SOLR-284:
---

Not sure if I should open a new issue or keep improvements here.
I think we need to improve the OOTB experience with this...
http://search.lucidimagination.com/search/document/302440b8a2451908/solr_cell

Ideas for improvement:
- auto-mapping names of the form Last-Modified to a more solrish field name like last_modified
- drop ext. from parameter names, and revisit naming to try and unify with other update handlers like CSV
  note: in the future, one could see generic functionality like boosting fields, setting field value defaults, etc, being handled by a generic component or update processor... all the better reason to drop the ext prefix.
- I imagine that metadata is normally useful, so we should:
  1. predefine commonly used metadata fields in the example schema... there's really no cost to this
  2. use mappings to normalize any metadata names (if such normalization isn't already done in Tika)
  3. ignore or drop fields that have little use
  4. provide a way to handle new attributes w/o dropping them or throwing an error
- enable the handler by default - lazy, to avoid a dependency on having all the tika libs available
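The auto-mapping idea in the first bullet could look something like this. This is an illustrative Python sketch, not the actual Solr Cell code (which is Java):

```python
import re

def solrish_name(metadata_name):
    """Map a metadata name like 'Last-Modified' to a Solr-ish field
    name like 'last_modified': lowercase, with runs of non-alphanumeric
    characters collapsed to single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", metadata_name.lower()).strip("_")

print(solrish_name("Last-Modified"))  # last_modified
print(solrish_name("Content-Type"))   # content_type
```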


 Parsing Rich Document Types
 ---

 Key: SOLR-284
 URL: https://issues.apache.org/jira/browse/SOLR-284
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Eric Pugh
Assignee: Grant Ingersoll
 Fix For: 1.4

 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
 rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284-no-key-gen.patch, 
 SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
 SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, 
 test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff


 I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
 that supports streaming a PDF, Word, PowerPoint, or Excel document into 
 Solr.
 There is a wiki page with information here: 
 http://wiki.apache.org/solr/UpdateRichDocuments
  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr Cell

2009-06-27 Thread Yonik Seeley
On Fri, Jun 26, 2009 at 6:49 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
 Finally getting around to reviewing Solr Cell

 ext.ignore.und.fl looks like it defaults to true instead of false

Ah... it was actually defined as a default in the request handler

Anyway, I returned some of this discussion to
https://issues.apache.org/jira/browse/SOLR-284
(284... wow has this issue been on the burner for a while ;-)

-Yonik
http://www.lucidimagination.com


[jira] Commented: (SOLR-284) Parsing Rich Document Types

2009-06-27 Thread Eric Pugh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724856#action_12724856
 ] 

Eric Pugh commented on SOLR-284:
---

I am out of the office 6/29 - 6/30.  For urgent issues, please contact
Jason Hull at jh...@opensourceconnections.com or phone at (434)
409-8451.





Re: Solr Cell

2009-06-27 Thread Yonik Seeley
On Fri, Jun 26, 2009 at 8:17 PM, Erik Hatcher <e...@ehatchersolutions.com> wrote:
 Seems like just a dynamic * mapping would suffice in this case, but
 dynamic field pattern would be fine with me too.

Using * removes the ability to detect field-naming errors.
We should be able to allow both.

 naming:
 -  fl originally stood for field list in Solr, yet I see it being
 used for single fields?
 -  do we really need to proceed all param names with ext.?

 Yeah, I've commented on the ext.ens.ive parameter names before too.  It's
 not pretty to have to flatten a set of parameters into a single namespace
 though.  hl.*, facet.*, v.* (for VelocityResponseWriter) etc.  But...

Response writers and search components need to coexist with each other,
and it can be nice to both avoid collisions and tell at a glance what
component a parameter is targeted at... but it doesn't seem
particularly necessary for update handlers.

-Yonik
http://www.lucidimagination.com


[jira] Commented: (SOLR-284) Parsing Rich Document Types

2009-06-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724859#action_12724859
 ] 

Yonik Seeley commented on SOLR-284:
---

ext.capture seems problematic in that one needs a separate ext.map statement to 
move what you capture... and it doesn't seem to work well if you already have 
field names that might match something you are trying to capture.

Perhaps something of the form
capture.targetfield=expression
would work better?
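For illustration, the proposed capture.targetfield=expression form could be collected from the request parameters roughly like so. This is a hypothetical sketch: neither the parameter name nor this code exists in Solr, and the example expressions are made up:

```python
def capture_mappings(params):
    """Collect capture.<targetField>=<expression> request parameters
    into a {targetField: expression} dict, ignoring other params."""
    prefix = "capture."
    return {name[len(prefix):]: value
            for name, value in params.items()
            if name.startswith(prefix)}

# Hypothetical request parameters; "commit" passes through untouched.
print(capture_mappings({
    "capture.div_text": "//div",
    "capture.anchors": "//a",
    "commit": "true",
}))
```

This avoids the two-step capture-then-map dance, since the target field name is stated directly in the parameter name.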




[jira] Commented: (SOLR-284) Parsing Rich Document Types

2009-06-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724862#action_12724862
 ] 

Yonik Seeley commented on SOLR-284:
---

I just tried setting ext.idx.attr=false, and I didn't see any change after 
indexing a PDF.
Perhaps we don't even need this option if we map attributes to an ignored_ 
field that is ignored?
In any case, the default seems like it should generate / index attributes.
