Re: Solr Perl Interface
Very nice, Tim. But that FAST->Solr document collection transfer would also be cool to see, and to have and use, for those lucky FAST customers who want to quickly give Solr a try with their existing data.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Timothy Garafola <[EMAIL PROTECTED]>
> To: solr-dev@lucene.apache.org
> Sent: Friday, February 15, 2008 4:47:19 PM
> Subject: Solr Perl Interface
>
> I recently released a simple Perl wrapper module to CPAN which supplies
> methods in Perl for posting adds, deletes, commits, and optimizes to a
> Solr server. Originally I had written some simple processes to handle
> this when porting a large collection of documents from FAST into Solr.
> Mark Backman and Yousef Ourabi suggested that I take this a step further
> and submit something to the open source community. So now it's
> available for download via
> http://search.cpan.org/author/GARAFOLA/Solr-0.03/lib/Solr.pm.
>
> As time permits, I'm adding to this. I also welcome contributions and
> hope to release the first updated version in a week or so. Yousef has
> stated interest in extending it to supply querying functionality.
>
> One of the things I have a need for is the ability to update by query;
> something similar in functionality to Solr's delete_by_query. Is there
> anything like this already in Solr? I'm playing with doing this in
> Perl by first issuing a query, parsing the returned XML into a Perl
> data structure, updating element values and/or extending it with dynamic
> fields in the data structure, then reposting the docs returned in the
> initial query. Can anyone tell me if I'm reinventing a wheel here or
> suggest an alternative approach?
>
> Thanks,
>
> Tim Garafola
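For readers unfamiliar with what such a wrapper posts, here is a minimal sketch of the XML update messages Solr accepts for adds, deletes, commits, and optimizes. It is written in Python rather than Perl, it is not the Solr.pm API, and the function names are hypothetical. It only builds the message bodies; sending them to the server's update handler is left out.

```python
from xml.etree import ElementTree as ET


def add_message(docs):
    """Build an <add> message from a list of dicts.

    Each dict maps a field name to a single value or a list of values
    (a list becomes one <field> element per value).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


def delete_by_id_message(doc_id):
    """Build a <delete> message that removes one document by unique key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def delete_by_query_message(query):
    """Build a <delete> message that removes every document matching a query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


# Commit and optimize carry no payload in their simplest form.
COMMIT_MESSAGE = "<commit/>"
OPTIMIZE_MESSAGE = "<optimize/>"
```

Building the messages with an XML library rather than string concatenation means field values containing `&` or `<` are escaped automatically.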
[jira] Updated: (SOLR-303) Distributed Search over HTTP
[ https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-303:
------------------------------

    Attachment: distributed.patch

fixed test cases that relied on parsing previous explain format

> Distributed Search over HTTP
> ----------------------------
>
>                 Key: SOLR-303
>                 URL: https://issues.apache.org/jira/browse/SOLR-303
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Sharad Agarwal
>            Assignee: Yonik Seeley
>         Attachments: distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, distributed.patch, distributed.patch, distributed.patch,
> distributed_pjaol.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch,
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch,
> fedsearch.stu.patch, fedsearch.stu.patch
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-303) Distributed Search over HTTP
[ https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-303:
------------------------------

    Attachment: distributed.patch

New patch attached... last one had an unfinished change that prevented compilation (using the generic SolrResponse instead of SolrQueryResponse).
[jira] Commented: (SOLR-303) Distributed Search over HTTP
[ https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569712#action_12569712 ]

Yonik Seeley commented on SOLR-303:
-----------------------------------

> I really need the ShardDoc's classes to be split up into public classes

ShardDoc is public already... can you elaborate?

> It would also be fantastic to open up QueryComponent, my component only needs
> to override a few functions

What is yours trying to accomplish?

> A solution would be to maintain map of unique fields as adding the ShardDocs
> to the priority queue, and continue on duplicates.

Agree. It should fall into the category of robustness though, rather than a duplicates detection feature (since it will mean that facets will be off, and it will be possible to get fewer docs than requested if duplicates do exist).

We also need to be robust in the face of a commit on a shard happening between phases of a request (a doc that we request info for may no longer exist, etc). That would probably cause us to blow up currently.

Hopefully this can be committed after some basic tests are added, and that will make it much easier for others to contribute patches. In the future maybe we should try a branch for changes this large.
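The duplicate handling discussed in this comment can be sketched roughly as follows. This is an illustration in Python, not the code in the patch: the function name, the tuple layout `(score, unique_id, shard)`, and the "keep the first copy seen" policy are all assumptions made for the example.

```python
import heapq


def merge_shard_docs(shard_results, rows):
    """Merge per-shard hits into the top `rows` documents by score.

    shard_results: iterable of lists of (score, unique_id, shard) tuples.
    A unique_id that was already seen is skipped, so a document indexed
    on two shards is counted once (the first copy encountered wins).
    """
    seen = set()
    heap = []  # min-heap holding the best `rows` documents so far
    for docs in shard_results:
        for score, doc_id, shard in docs:
            if doc_id in seen:
                continue  # duplicate: skip it and carry on
            seen.add(doc_id)
            if len(heap) < rows:
                heapq.heappush(heap, (score, doc_id, shard))
            elif score > heap[0][0]:
                # Better than the weakest kept document: replace it.
                heapq.heapreplace(heap, (score, doc_id, shard))
    # Highest score first.
    return sorted(heap, key=lambda d: d[0], reverse=True)
```

Note how this matches the caveat in the comment: if the shards return the same document twice and nothing else, the merged list is shorter than `rows`, and any facet counts computed per shard would still count the document twice.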
[jira] Updated: (SOLR-303) Distributed Search over HTTP
[ https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-303:
------------------------------

    Attachment: distributed.patch

updated patch:
- refactored some distributed search code to make things easier (added modifyRequest, etc)
- added merging of debugging info (including timing info, via generic recursive merging)
- merge explain info, drops internal id from explain key for easier merging
- Many small changes: don't return scores if they aren't requested (even if needed for shard requests to merge), return maxScore if scores are requested, enable escaping for shards parameter.
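As an illustration of what "generic recursive merging" of timing info could look like, here is a small Python sketch. It is not taken from the patch: the function name is hypothetical, and the merge rules (nested maps merge key by key, numbers are summed, anything else is overwritten by the later value) are assumptions chosen for the example.

```python
import copy


def merge_debug(target, source):
    """Recursively merge `source` into `target` and return `target`.

    Assumed rules: nested dicts are merged key by key, numeric values
    are added together, and any other value is replaced by the one
    from `source`.
    """
    for key, value in source.items():
        if key not in target:
            # Copy so later merges do not mutate the shard's own data.
            target[key] = copy.deepcopy(value)
        elif isinstance(target[key], dict) and isinstance(value, dict):
            merge_debug(target[key], value)
        elif isinstance(target[key], (int, float)) and isinstance(value, (int, float)):
            target[key] += value
        else:
            target[key] = copy.deepcopy(value)
    return target
```

With per-shard debug sections shaped like nested maps of timings, calling `merge_debug` once per shard response accumulates a combined view without the merger needing to know the section layout in advance.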
Re: Solr Perl Interface
On Feb 16, 2008, at 5:47 AM, Timothy Garafola wrote:
> I recently released a simple perl wrapper module to CPAN which supplies
> methods in perl for posting adds, deletes, commits, and optimizes to a
> solr server.

Nice!

> One of the things I have a need for is the ability to update by query;
> something similar in functionality to SOLRs delete_by_query. Is there
> anything like this already in SOLR?

No, not really, except the modifiable document patch (which is probably out of sync with trunk at this point). SOLR-139 (? not online at the moment to look it up)

> I'm playing with doing this in perl by first issuing a query, parsing
> the returned xml into a perl data structure, updating element values
> and/or extending it with dynamic fields in the data structure, then
> reposting the docs returned in the initial query. Can anyone tell me
> if I'm reinventing a wheel here or suggest an alternative approach?

The issue with modifying documents in Solr becomes tricky/impossible if you allow unstored fields. For your approach to work, all fields need to be stored, and in large index scenarios it could be prohibitive to store all field data, and thus the client might need to be involved in re-adding field data that is irretrievable from Solr. Just food for thought.

In many cases, though, storing all field data is fine and your approach, or the modifiable document patch, is reasonable.

Erik
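The query, modify, and repost round trip described in this thread can be sketched as below. This is a Python illustration under stated assumptions, not Tim's Perl code: the function names are hypothetical, the input is assumed to be Solr's standard XML query response, and the network calls (running the query, posting the result) are left out. As Erik points out, only stored fields appear in the response, so anything unstored is lost unless the client supplies it again.

```python
from xml.etree import ElementTree as ET


def parse_docs(response_xml):
    """Turn each <doc> in a Solr XML query response into a dict.

    Multi-valued fields (<arr>) become lists of strings; every other
    field becomes a single string.
    """
    docs = []
    for doc_el in ET.fromstring(response_xml).iter("doc"):
        doc = {}
        for child in doc_el:
            name = child.get("name")
            if child.tag == "arr":
                doc[name] = [item.text or "" for item in child]
            else:
                doc[name] = child.text or ""
        docs.append(doc)
    return docs


def to_add_xml(docs):
    """Serialize dicts back into an <add> message for reposting."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


def update_docs(response_xml, changes):
    """Apply the same field changes to every returned doc and build the repost.

    `changes` maps field names to new values; existing fields are
    overwritten and new (for example dynamic) fields are appended.
    """
    docs = parse_docs(response_xml)
    for doc in docs:
        doc.update(changes)
    return to_add_xml(docs)
```

Because re-adding a document with the same unique key replaces the old one, posting the returned `<add>` message followed by a commit would complete the update for the matched documents, subject to the stored-fields limitation above.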