Re: ant example, tika

2008-12-12 Thread Chris Hostetter

: The only issue I see now is that DIH has been released as part of the core, so
: I would vote that it stays in there.  It is also quite popular, I think, so
: I'd hate to break people.

...which is why having a kitchen-sink war with all the contribs might make 
sense.  But frankly i don't see it as very problematic to document how 
to use a DIH jar for people who upgrade ... we have to document how to use 
contribs in general.



-Hoss



Re: [VOTE] LOGO

2008-12-12 Thread Mike Klaas

https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg
https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12394218/solr-solid.png


Re: DIH / ExtractingRequestHandler over solrj?

2008-12-12 Thread Shalin Shekhar Mangar
On Fri, Dec 12, 2008 at 10:19 PM, Ryan McKinley  wrote:

> I have not looked into either DIH or Solr Cell, so forgive me if this is
> way off base.   I understand the desire to have these baked into the server
> -- it simplifies things dramatically for anyone wanting to use them.
>  However is it possible to write them so they could be easily extracted to
> the client?
>
> (IMHO) ideally these could be written against the solrj API and then work
> either with the EmbeddedSolrServer or a remote one.  For example the core of
> Solr Cell and perhaps DIH would be great to plug directly into droids.
>

Look at SOLR-853 which aims to make DIH available as an API to be used from
SolrJ or plain old Lucene itself. There can be other uses too.


>
> If that seems reasonable, are there things we should consider about the
> existing API before it gets baked into 1.4?


I feel that we should try to get 1.4 out of the door and attempt this in the
next iteration. We have enough features for 1.4 which are needed by most
users (think replication, faceting improvement, cell). Let us not delay 1.4
for things that a smaller section of users would need. Thoughts?

-- 
Regards,
Shalin Shekhar Mangar.


[jira] Updated: (SOLR-906) Buffered / Streaming SolrServer implementaion

2008-12-12 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-906:
---

Attachment: SOLR-906-StreamingHttpSolrServer.patch

removes @Override from interfaces

I guess:
  

does not check everything!

> Buffered / Streaming SolrServer implementaion
> -
>
> Key: SOLR-906
> URL: https://issues.apache.org/jira/browse/SOLR-906
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Reporter: Ryan McKinley
> Fix For: 1.4
>
> Attachments: SOLR-906-StreamingHttpSolrServer.patch, 
> SOLR-906-StreamingHttpSolrServer.patch, StreamingHttpSolrServer.java
>
>
> While indexing lots of documents, the CommonsHttpSolrServer add( 
> SolrInputDocument ) is less than optimal.  This makes a new request for each 
> document.
> With a "StreamingHttpSolrServer", documents are buffered and then written to 
> a single open Http connection.
> For related discussion see:
> http://www.nabble.com/solr-performance-tt9055437.html#a20833680
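
The buffering idea described above can be sketched with plain JDK classes. This is only an illustration under stated assumptions, not the code in the attached patch: the class name BufferingDocWriter, the <add> wrapper element, and the use of a generic Writer in place of an open HTTP connection are all hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: buffers documents and streams them to one open writer. */
class BufferingDocWriter {
    // Sentinel marking end of input; compared by identity, never by content.
    private static final String END = new String("END");

    private final BlockingQueue<String> queue;
    private final Thread worker;
    private volatile IOException failure;

    BufferingDocWriter(final Writer out, int capacity) {
        final BlockingQueue<String> q = new LinkedBlockingQueue<String>(capacity);
        this.queue = q;
        // A single background thread drains the buffer into the one open stream.
        this.worker = new Thread(new Runnable() {
            public void run() {
                try {
                    out.write("<add>");
                    for (String doc = q.take(); doc != END; doc = q.take()) {
                        out.write(doc);
                    }
                    out.write("</add>");
                    out.flush();
                } catch (IOException e) {
                    failure = e; // reported to the caller by add() or close()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        this.worker.start();
    }

    /** Queues one serialized document; waits while the buffer is full. */
    void add(String docXml) throws IOException, InterruptedException {
        enqueue(docXml);
    }

    /** Signals end of input and waits for the writer thread to finish. */
    void close() throws IOException, InterruptedException {
        enqueue(END);
        worker.join();
        if (failure != null) throw failure;
    }

    private void enqueue(String item) throws IOException, InterruptedException {
        // Poll with a timeout so a dead writer thread cannot block the caller forever.
        while (!queue.offer(item, 100, TimeUnit.MILLISECONDS)) {
            if (failure != null) throw failure;
        }
    }
}
```

The bounded queue is the design choice worth noting: a caller that produces documents faster than the connection can accept them is slowed down instead of exhausting memory.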

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



DIH / ExtractingRequestHandler over solrj?

2008-12-12 Thread Ryan McKinley
I have not looked into either DIH or Solr Cell, so forgive me if this  
is way off base.   I understand the desire to have these baked into  
the server -- it simplifies things dramatically for anyone wanting to  
use them.  However is it possible to write them so they could be  
easily extracted to the client?


(IMHO) ideally these could be written against the solrj API and then  
work either with the EmbeddedSolrServer or a remote one.  For example  
the core of Solr Cell and perhaps DIH would be great to plug directly  
into droids.


If that seems reasonable, are there things we should consider about  
the existing API before it gets baked into 1.4?


ryan


Re: ant build-site

2008-12-12 Thread Koji Sekiguchi

It works on my PC. Did you check:
http://www.nabble.com/-jira--Created%3A-(FOR-984)-Forrest-doesn%27t-work-properly-with-Sun%27s-JDK-1.6-td9963791.html#a14247180

koji

Grant Ingersoll wrote:

Anyone else seeing this when running ant build-site:

 [exec]
 [exec] -prepare-classpath:
 [exec]
 [exec] check-contentdir:
 [exec]
 [exec] examine-proj:
 [exec]
 [exec] validation-props:
 [exec]
 [exec] validate-xdocs:
 [exec] 8 file(s) have been successfully validated.
 [exec] ...validated xdocs
 [exec]
 [exec] validate-skinconf:
 [exec] 1 file(s) have been successfully validated.
 [exec] ...validated skinconf
 [exec]
 [exec] validate-sitemap:
 [exec]
 [exec] BUILD FAILED
 [exec] /usr/local/forrest/main/targets/validate.xml:158: java.lang.NullPointerException

 [exec]
 [exec] Total time: 3 seconds
 [exec] Result: 1

BUILD FAILED
/Volumes/User/grantingersoll/projects/lucene/solr/solr-clean/build.xml:637: /Volumes/User/grantingersoll/projects/lucene/solr/solr-clean/src/site/build/site not found.









[jira] Commented: (SOLR-284) Parsing Rich Document Types

2008-12-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656032#action_12656032
 ] 

Grant Ingersoll commented on SOLR-284:
--

OK, I just committed:

1. Upgraded to Tika 0.2 official release
2. Put in POM support
3. Hooked in various other build things.

> Parsing Rich Document Types
> ---
>
> Key: SOLR-284
> URL: https://issues.apache.org/jira/browse/SOLR-284
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Eric Pugh
>Assignee: Grant Ingersoll
> Fix For: 1.4
>
> Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
> rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284.patch, 
> SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
> SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, test-files.zip, 
> test-files.zip, test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
> that supports streaming a PDF, Word, PowerPoint, or Excel document into 
> Solr.
> There is a wiki page with information here: 
> http://wiki.apache.org/solr/UpdateRichDocuments
>  




ant build-site

2008-12-12 Thread Grant Ingersoll

Anyone else seeing this when running ant build-site:

 [exec]
 [exec] -prepare-classpath:
 [exec]
 [exec] check-contentdir:
 [exec]
 [exec] examine-proj:
 [exec]
 [exec] validation-props:
 [exec]
 [exec] validate-xdocs:
 [exec] 8 file(s) have been successfully validated.
 [exec] ...validated xdocs
 [exec]
 [exec] validate-skinconf:
 [exec] 1 file(s) have been successfully validated.
 [exec] ...validated skinconf
 [exec]
 [exec] validate-sitemap:
 [exec]
 [exec] BUILD FAILED
 [exec] /usr/local/forrest/main/targets/validate.xml:158: java.lang.NullPointerException

 [exec]
 [exec] Total time: 3 seconds
 [exec] Result: 1

BUILD FAILED
/Volumes/User/grantingersoll/projects/lucene/solr/solr-clean/build.xml:637: /Volumes/User/grantingersoll/projects/lucene/solr/solr-clean/src/site/build/site not found.






[jira] Commented: (SOLR-284) Parsing Rich Document Types

2008-12-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656023#action_12656023
 ] 

Rogério Pereira Araújo commented on SOLR-284:
-

Grant, lemme know how I can help.




[jira] Commented: (SOLR-284) Parsing Rich Document Types

2008-12-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656018#action_12656018
 ] 

Grant Ingersoll commented on SOLR-284:
--

Forgot a couple of things on this:

1. To hook into the release/javadoc mechanism.
2. In order to facilitate separation of the javadocs and other things, I'm 
going to move the code to o.a.s.handler.extraction package.
3. Need to publish the Maven artifacts.




Re: ant example, tika

2008-12-12 Thread Grant Ingersoll
It occurred to me that we could also add a "core-example" target that  
only builds the core example for those impatient types w/ slow  
machines ;-)



On Dec 12, 2008, at 8:03 AM, Grant Ingersoll wrote:



On Dec 11, 2008, at 10:50 PM, Chris Hostetter wrote:



: Ignoring the JSP dilemma... DIH's JAR doesn't need to be in the WAR, but can
: ship in a lib/ directory outside the WAR and come in as a plugin.  And Solr
: can ship with all of the contribs wired in to a kitchen-sink example
: configuration.
:
: There is merit to keeping Solr's WAR and core to the most minimal size
: possible and leveraging the plugin capability to let users reduce the
: footprint and un-used parts.

+1 ... there really shouldn't be any contribs in the war.  If we're
worried that asking people to put the DIH jar in the plugin directory is
too complicated for new users to understand (and i really can't believe
that: if someone can understand how to write a data-config.xml then copying
a jar file should be trivial) we can make a "solr-kitchen-sink.war" that
contains *every* contrib and *every* dependency in addition to the regular
one.

But even that seems less useful in general than having a more robust set
of examples -- where each one gets a lib directory populated with just the
plugins it's demonstrating (and possibly a "kitchen-sink" example showing
off all of them)

Honestly: I didn't even realize DIH was adding itself to the war until
recently, but then again i've been a little out of touch.




The only issue I see now is that DIH has been released as part of  
the core, so I would vote that it stays in there.  It is also quite  
popular, I think, so I'd hate to break people.





Re: ant example, tika

2008-12-12 Thread Grant Ingersoll


On Dec 11, 2008, at 10:50 PM, Chris Hostetter wrote:



: Ignoring the JSP dilemma... DIH's JAR doesn't need to be in the WAR, but can
: ship in a lib/ directory outside the WAR and come in as a plugin.  And Solr
: can ship with all of the contribs wired in to a kitchen-sink example
: configuration.
:
: There is merit to keeping Solr's WAR and core to the most minimal size
: possible and leveraging the plugin capability to let users reduce the
: footprint and un-used parts.

+1 ... there really shouldn't be any contribs in the war.  If we're
worried that asking people to put the DIH jar in the plugin directory is
too complicated for new users to understand (and i really can't believe
that: if someone can understand how to write a data-config.xml then copying
a jar file should be trivial) we can make a "solr-kitchen-sink.war" that
contains *every* contrib and *every* dependency in addition to the regular
one.

But even that seems less useful in general than having a more robust set
of examples -- where each one gets a lib directory populated with just the
plugins it's demonstrating (and possibly a "kitchen-sink" example showing
off all of them)

Honestly: I didn't even realize DIH was adding itself to the war until
recently, but then again i've been a little out of touch.




The only issue I see now is that DIH has been released as part of the  
core, so I would vote that it stays in there.  It is also quite  
popular, I think, so I'd hate to break people.


[jira] Commented: (SOLR-236) Field collapsing

2008-12-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655952#action_12655952
 ] 

Iván de Prado commented on SOLR-236:


You can try collapse.facet=before, but then you'll notice that the returned 
list contains all documents, not only the collapsed ones. 

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
> Fix For: 1.4
>
> Attachments: collapsing-patch-to-1.3.0-ivan.patch, 
> collapsing-patch-to-1.3.0-ivan_2.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
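
To make the difference between the two collapse.type values concrete, here is a small standalone sketch. It is not taken from any of the attached patches; it assumes that "normal" counts results per field value across the whole ranked list, that "adjacent" counts only consecutive results sharing a value, and that collapse.max is the number of results kept per group. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the collapsing rule; not the SOLR-236 implementation. */
class CollapseSketch {
    /**
     * keys holds the (non-null) collapse field value of each result in ranked order.
     * Returns the positions of the results that survive collapsing.
     */
    static List<Integer> collapse(List<String> keys, int max, boolean adjacent) {
        List<Integer> kept = new ArrayList<Integer>();
        Map<String, Integer> seen = new HashMap<String, Integer>();
        String previous = null;
        int run = 0;
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            int count;
            if (adjacent) {
                // Assumed "adjacent" behaviour: only consecutive repeats count.
                run = key.equals(previous) ? run + 1 : 1;
                previous = key;
                count = run;
            } else {
                // Assumed "normal" behaviour: every earlier occurrence counts.
                Integer c = seen.get(key);
                count = (c == null) ? 1 : c + 1;
                seen.put(key, count);
            }
            if (count <= max) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

For the values a, a, b, a, a, a with max 1, this sketch keeps positions 0 and 2 in normal mode, and positions 0, 2 and 3 in adjacent mode, because the run of "a" restarts after "b".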




[jira] Updated: (SOLR-821) replication must allow copying conf file in a different name to slave

2008-12-12 Thread Akshay K. Ukey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay K. Ukey updated SOLR-821:


Attachment: SOLR-821.patch

Patch with minor bug fix.

> replication must allow copying conf file in a different name to slave
> -
>
> Key: SOLR-821
> URL: https://issues.apache.org/jira/browse/SOLR-821
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-821.patch, SOLR-821.patch, SOLR-821.patch
>
>
> It is likely that a file differs between master and slave. For instance, 
> replicating solrconfig.xml is not possible with the current config if master 
> and slave have different solrconfig.xml files (which is always true).
> We can add an alias feature in the confFiles as
> {code}
> <str name="confFiles">slave_solrconfig.xml:solrconfig.xml,slave_schema.xml:schema.xml</str>
> {code}
> This means that the file slave_solrconfig.xml should be copied to the slave 
> as solrconfig.xml and slave_schema.xml must be saved to slave as schema.xml
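
The alias syntax proposed above could be parsed along these lines. This is a minimal sketch, not the code in SOLR-821.patch; the class name ConfFileAliases is hypothetical, and treating an entry without a colon as "same name on both sides" is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of parsing a confFiles value with optional aliases. */
class ConfFileAliases {
    /** Maps each file name on the master to the name it is saved under on the slave. */
    static Map<String, String> parse(String confFiles) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (String entry : confFiles.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                // Assumption: no alias means the file keeps its name on the slave.
                result.put(trimmed, trimmed);
            } else {
                String masterName = trimmed.substring(0, colon).trim();
                String slaveName = trimmed.substring(colon + 1).trim();
                result.put(masterName, slaveName);
            }
        }
        return result;
    }
}
```

With the value from the issue description, the resulting map sends slave_solrconfig.xml to solrconfig.xml and slave_schema.xml to schema.xml.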




[jira] Commented: (SOLR-906) Buffered / Streaming SolrServer implementaion

2008-12-12 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655917#action_12655917
 ] 

Shalin Shekhar Mangar commented on SOLR-906:


Ryan, I'm seeing compile errors related to @Override with interface methods 
(that's a Java 6 feature). Also, new IOException( e ) is not defined (also Java 
6, I guess).
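
For reference, the IOException( Throwable ) constructor was indeed added in Java 6. A Java 5 compatible way to get the same effect is to build the exception from a message and attach the cause with initCause. The helper below is only a sketch; the class name IoExceptions is hypothetical and is not part of the patch.

```java
import java.io.IOException;

/** Hypothetical helper: wraps a cause without the Java 6 only constructor. */
class IoExceptions {
    static IOException wrap(Throwable cause) {
        // IOException(String) and Throwable.initCause both exist on Java 5.
        IOException wrapped = new IOException(String.valueOf(cause));
        wrapped.initCause(cause);
        return wrapped;
    }
}
```

A call site would then read throw IoExceptions.wrap( e ); instead of throw new IOException( e );. The @Override errors have no equivalent workaround on Java 5; the annotation has to be removed from methods that implement interface methods.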

> Buffered / Streaming SolrServer implementaion
> -
>
> Key: SOLR-906
> URL: https://issues.apache.org/jira/browse/SOLR-906
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Reporter: Ryan McKinley
> Fix For: 1.4
>
> Attachments: SOLR-906-StreamingHttpSolrServer.patch, 
> StreamingHttpSolrServer.java
>
>
> While indexing lots of documents, the CommonsHttpSolrServer add( 
> SolrInputDocument ) is less than optimal.  This makes a new request for each 
> document.
> With a "StreamingHttpSolrServer", documents are buffered and then written to 
> a single open Http connection.
> For related discussion see:
> http://www.nabble.com/solr-performance-tt9055437.html#a20833680
