Re: Solr Perl Interface

2008-02-17 Thread Otis Gospodnetic
Very nice, Tim.  The FAST->Solr document collection transfer would also be 
cool to see, and useful for those lucky FAST customers who want to quickly 
give Solr a try with their existing data.

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Timothy Garafola <[EMAIL PROTECTED]>
> To: solr-dev@lucene.apache.org
> Sent: Friday, February 15, 2008 4:47:19 PM
> Subject: Solr Perl Interface
> 
> I recently released a simple Perl wrapper module to CPAN which supplies
> methods in Perl for posting adds, deletes, commits, and optimizes to a
> Solr server.  Originally I had written some simple processes to handle
> this when porting a large collection of documents from FAST into Solr.
> Mark Backman and Yousef Ourabi suggested that I take this a step further
> and submit something to the open source community.  So now it's
> available for download via
> http://search.cpan.org/author/GARAFOLA/Solr-0.03/lib/Solr.pm.
>
> As time permits, I'm adding to this.  I also welcome contributions and
> hope to release the first updated version in a week or so.  Yousef has
> stated interest in extending it to supply querying functionality.  
>
> One of the things I need is the ability to update by query,
> something similar in functionality to Solr's delete_by_query.  Is there
> anything like this already in Solr?  I'm playing with doing this in
> Perl by first issuing a query, parsing the returned XML into a Perl
> data structure, updating element values and/or extending it with dynamic
> fields in the data structure, then reposting the docs returned in the
> initial query.  Can anyone tell me if I'm reinventing a wheel here or
> suggest an alternative approach?
>
> Thanks,
> 
> Tim Garafola




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-02-17 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
------------------------------

Attachment: distributed.patch

fixed test cases that relied on parsing previous explain format

> Distributed Search over HTTP
> ----------------------------
>
>                 Key: SOLR-303
>                 URL: https://issues.apache.org/jira/browse/SOLR-303
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Sharad Agarwal
>            Assignee: Yonik Seeley
>         Attachments: distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed_pjaol.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.stu.patch, fedsearch.stu.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-02-17 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
------------------------------

Attachment: distributed.patch

New patch attached... last one had an unfinished change that prevented 
compilation (using the generic SolrResponse instead of SolrQueryResponse).

> Distributed Search over HTTP
> ----------------------------
>
>                 Key: SOLR-303
>                 URL: https://issues.apache.org/jira/browse/SOLR-303
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Sharad Agarwal
>            Assignee: Yonik Seeley
>         Attachments: distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, 
> distributed_pjaol.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.stu.patch, fedsearch.stu.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Commented: (SOLR-303) Distributed Search over HTTP

2008-02-17 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569712#action_12569712
 ] 

Yonik Seeley commented on SOLR-303:
-----------------------------------

> I really need the ShardDoc's classes to be split up into public classes

ShardDoc is public already... can you elaborate?

> It would also be fantastic to open up QueryComponent, my component only needs 
> to override a few functions

What is yours trying to accomplish?

> A solution would be to maintain map of unique fields as adding the ShardDocs 
> to the priority queue, and continue on duplicates.

Agree.  It should fall into the category of robustness though, rather than a 
duplicates detection feature (since it will mean that facets will be off, and 
it will be possible to get fewer docs than requested if duplicates do exist).
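
A rough sketch of the "map of unique fields" idea, in Python rather than the patch's Java.  The tuple layout, the function name, and the tie-breaking rule are assumptions made for illustration; none of this is taken from distributed.patch.  The first occurrence of a key wins and later duplicates are skipped.

```python
import heapq
from typing import Iterable, List, Set, Tuple

# Hypothetical stand-in for a shard document: (unique_key, score, shard_name).
ShardDoc = Tuple[str, float, str]


def merge_shard_docs(shard_results: Iterable[List[ShardDoc]], rows: int) -> List[ShardDoc]:
    """Merge per-shard results into the top `rows` docs, skipping duplicate keys."""
    if rows <= 0:
        return []
    seen: Set[str] = set()                        # unique keys already accepted
    heap: List[Tuple[float, int, ShardDoc]] = []  # min-heap ordered by score
    order = 0
    for docs in shard_results:
        for doc in docs:
            key, score, _shard = doc
            if key in seen:
                continue                          # duplicate: skip it and move on
            seen.add(key)
            order += 1
            # -order makes the earlier doc win when scores are equal.
            entry = (score, -order, doc)
            if len(heap) < rows:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)    # evict the current lowest entry
    return [doc for _score, _order, doc in sorted(heap, reverse=True)]
```

With two shards that both return key "2", asking for 10 rows yields only 3 docs, which is the "fewer docs than requested" effect mentioned above.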

We also need to be robust in the face of a commit on a shard happening between 
phases of a request (a doc that we request info for may no longer exist, etc).  
That would probably cause us to blow up currently.

Hopefully this can be committed after some basic tests are added, and that will 
make it much easier for others to contribute patches.  In the future maybe we 
should try a branch for changes this large.


> Distributed Search over HTTP
> ----------------------------
>
>                 Key: SOLR-303
>                 URL: https://issues.apache.org/jira/browse/SOLR-303
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Sharad Agarwal
>            Assignee: Yonik Seeley
>         Attachments: distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed_pjaol.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
> fedsearch.stu.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-02-17 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
------------------------------

Attachment: distributed.patch

updated patch:
- refactored some distributed search code to make things easier (added 
  modifyRequest, etc)
- added merging of debugging info (including timing info, via generic 
  recursive merging)
- merge explain info, drops internal id from explain key for easier merging
- many small changes: don't return scores if they aren't requested (even if 
  needed for shard requests to merge), return maxScore if scores are 
  requested, enable escaping for shards parameter.
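
For readers unfamiliar with the "generic recursive merging" mentioned above, here is a minimal sketch of the general technique in Python.  It is not the patch's code: the function name, the dict-based layout, and the choice to sum numeric values (as one might for timings) are all assumptions.

```python
import copy
from typing import Any, Dict


def merge_debug(target: Dict[str, Any], source: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `source` into `target` in place and return `target`.

    Assumed rules, for illustration only:
    - nested dicts are merged key by key,
    - numbers under the same key are summed,
    - for anything else, the value already in `target` is kept.
    """
    for key, value in source.items():
        if key not in target:
            target[key] = copy.deepcopy(value)       # new key: copy it over
        elif isinstance(target[key], dict) and isinstance(value, dict):
            merge_debug(target[key], value)          # recurse into sub-sections
        elif isinstance(target[key], (int, float)) and isinstance(value, (int, float)):
            target[key] += value                     # accumulate numeric values
        # otherwise keep the existing value
    return target
```

Calling it once per shard response, with the same `target` each time, accumulates all shards into one structure.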

> Distributed Search over HTTP
> ----------------------------
>
>                 Key: SOLR-303
>                 URL: https://issues.apache.org/jira/browse/SOLR-303
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Sharad Agarwal
>            Assignee: Yonik Seeley
>         Attachments: distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
> distributed.patch, distributed.patch, distributed_pjaol.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
> fedsearch.stu.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch




Re: Solr Perl Interface

2008-02-17 Thread Erik Hatcher


On Feb 16, 2008, at 5:47 AM, Timothy Garafola wrote:
> I recently released a simple Perl wrapper module to CPAN which supplies
> methods in Perl for posting adds, deletes, commits, and optimizes to a
> Solr server.


Nice!


> One of the things I have a need for is the ability to update by query;
> something similar in functionality to Solr's delete_by_query.  Is there
> anything like this already in Solr?


No, not really, except the modifiable document patch (which is 
probably out of sync with trunk at this point): SOLR-139 (? not 
online at the moment to look it up).



> I'm playing with doing this in
> Perl by first issuing a query, parsing the returned XML into a Perl
> data structure, updating element values and/or extending it with dynamic
> fields in the data structure, then reposting the docs returned in the
> initial query.  Can anyone tell me if I'm reinventing a wheel here or
> suggest an alternative approach?


Modifying documents in Solr becomes tricky or impossible if you allow 
unstored fields.  For your approach to work, all fields need to be 
stored.  In large index scenarios it could be prohibitive to store all 
field data, so the client might need to be involved in re-adding field 
data that is irretrievable from Solr.  Just food for thought.  In many 
cases, though, storing all field data is fine and your approach, or the 
modifiable document patch, is reasonable.
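
To make the query, modify, repost cycle concrete, here is a minimal sketch.  It is in Python rather than Perl and is not how the CPAN module works; the function names and the `status_s` dynamic field are made up for the example.  It assumes the standard XML response format and leaves the HTTP query and POST to the caller.  Only stored fields come back in the response, so anything unstored would be lost on repost.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A document is modelled as field name -> list of string values.
Doc = Dict[str, List[str]]


def parse_docs(response_xml: str) -> List[Doc]:
    """Turn the <doc> elements of a Solr XML query response into dicts."""
    docs: List[Doc] = []
    for doc_el in ET.fromstring(response_xml).iter("doc"):
        doc: Doc = {}
        for field_el in doc_el:
            if field_el.tag == "arr":   # multi-valued field
                doc[field_el.get("name")] = [child.text or "" for child in field_el]
            else:                       # single-valued field (<str>, <int>, ...)
                doc[field_el.get("name")] = [field_el.text or ""]
        docs.append(doc)
    return docs


def build_add_xml(docs: List[Doc]) -> str:
    """Build an <add> update message; multi-valued fields repeat the <field> tag."""
    add_el = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add_el, "doc")
        for name, values in doc.items():
            for value in values:
                ET.SubElement(doc_el, "field", {"name": name}).text = value
    return ET.tostring(add_el, encoding="unicode")


def update_by_query(response_xml: str, modify: Callable[[Doc], None]) -> str:
    """Apply `modify` to every doc in a query response and return the repost body."""
    docs = parse_docs(response_xml)
    for doc in docs:
        modify(doc)
    return build_add_xml(docs)
```

The returned string would be POSTed to the update handler and followed by a commit.  Because each document is re-added under its unique key, the repost replaces the existing document rather than patching it.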


Erik