GData

2006-04-20 Thread Doug Cutting
How hard would it be to build a GData server using Solr?  An 
open-source, Lucene-based GData server would be a good thing to have. 
Does this fit in Solr, or should it be a separate project?


http://code.google.com/apis/gdata/overview.html

Another summer of code project?

Doug


Re: GData

2006-04-20 Thread Erik Hatcher
It would require some work to add this to Solr, but not a huge  
effort.  One of the most crucial missing pieces that I'm beginning to  
feel a strong need for is being able to update a single field in a  
Lucene index.  I notice the GData protocol supports this:


http://code.google.com/apis/gdata/protocol.html#Updating-an-entry

So to turn it around to ask you a question, what would it take to  
allow a Lucene document to be updatable at the field granularity,  
such that no other fields need to be specified again?


The idea of using HTTP 1.1 PUT/DELETE methods has been discussed for  
Solr before, and I think it'd be a great idea to support Atom and  
GData, and perhaps even legacy RSS.  Currently Solr's request and  
response handling are pretty intertwined with the rest of the system  
and some decoupling needs to take place to facilitate plug-ability in  
the external interfaces.  Nothing too awfully difficult I don't  
think, but not something that is currently possible out of the box.


Erik







Re: GData

2006-04-20 Thread Yonik Seeley
On 4/20/06, Erik Hatcher [EMAIL PROTECTED] wrote:
> So to turn it around to ask you a question, what would it take to
> allow a Lucene document to be updatable at the field granularity,
> such that no other fields need to be specified again?

That sounds like quite a job in Lucene... one thing for a stored
field, but quite another for indexed fields.   Even if you could
update things like TermDocs, you don't know what terms are currently
pointing to your document.  I personally don't see an easy (or
remotely practical) way.

The easiest way I can think of to get that effect is to store all the
fields so you can re-create the Document and change the field being
updated.

-Yonik
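
A minimal sketch of the "store every field, then re-create the document" approach described above. It does not use Lucene: ToyIndex, its methods, and the whitespace tokenizer are hypothetical stand-ins chosen only to show why the stored copy makes a single-field update possible as delete + re-add.

```python
from collections import defaultdict


class ToyIndex:
    """Tiny in-memory stand-in for an index (hypothetical, not the Lucene API)."""

    def __init__(self):
        self.stored = {}                  # doc_id -> {field: text}, i.e. stored fields
        self.postings = defaultdict(set)  # (field, term) -> set of doc_ids

    def add(self, doc_id, fields):
        # Keep a stored copy of every field and index each whitespace-separated term.
        self.stored[doc_id] = dict(fields)
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def delete(self, doc_id):
        # In this toy, the stored copy tells us which terms point at the document.
        fields = self.stored.pop(doc_id)
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].discard(doc_id)

    def update_field(self, doc_id, field, text):
        # Re-create the whole document from its stored fields, change one
        # field, then delete the old version and add the new one.
        fields = dict(self.stored[doc_id])
        fields[field] = text
        self.delete(doc_id)
        self.add(doc_id, fields)

    def search(self, field, term):
        return sorted(self.postings.get((field, term.lower()), set()))
```

The caller only supplies the changed field; every other field survives because it was stored. The cost is that all fields must be stored, which is the trade-off raised in the thread.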


Re: GData

2006-04-20 Thread Chris Hostetter

: The easiest way I can think of to get that effect is to store all the
: fields so you can re-create the Document and change the field being
: updated.

My brief reading of the GData URL Doug sent suggests that the overall
theme is content storage -- if that's the goal, mandating that modify
operations require all fields be stored wouldn't sacrifice functionality.

As far as the output format -- it seems to me that, just like A9's
OpenSearch, this could probably be done entirely with an XSLT.

The input format is the trickier part ... that's where we'd definitely need
a more pluggable parser for dealing with incoming data ... but we're
going to need that anyway if we want to support posting CSV files and
things like that.


-Hoss
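
One way to picture the "more pluggable parser" for incoming data mentioned above is a registry keyed by content type. This is only a sketch under assumptions: PARSERS, register_parser, parse_csv, and parse_body are hypothetical names, not anything in Solr, and only a CSV parser is registered.

```python
import csv
import io

# Hypothetical registry: content type -> function turning a request body
# into a list of documents (plain dicts of field name -> value).
PARSERS = {}


def register_parser(content_type):
    """Decorator that registers a parser function for one content type."""
    def register(fn):
        PARSERS[content_type] = fn
        return fn
    return register


@register_parser("text/csv")
def parse_csv(body):
    # The first CSV row is treated as the field names.
    return [dict(row) for row in csv.DictReader(io.StringIO(body))]


def parse_body(content_type, body):
    """Dispatch an incoming body to whichever parser is registered for it."""
    fn = PARSERS.get(content_type)
    if fn is None:
        raise ValueError(f"no parser registered for {content_type}")
    return fn(body)
```

Supporting another input format (Atom entries, for example) would then mean registering one more function rather than changing the update path.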



[jira] Commented: (SOLR-3) create test harness and port TestApp to junit

2006-04-20 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/SOLR-3?page=comments#action_12375416 ]

Yonik Seeley commented on SOLR-3:
---------------------------------

Yes, I suppose SolrTest can be removed (nostalgic sniff)...

 create test harness and port TestApp to junit
 ---------------------------------------------

  Key: SOLR-3
  URL: http://issues.apache.org/jira/browse/SOLR-3
  Project: Solr
 Type: Task

 Reporter: Hoss Man
 Assignee: Hoss Man
 Priority: Minor
  Attachments: SOLR-3.zip, TestBasicFunctionality.java, TestHarness.java

 To both encourage good internal development and make it easy for plugin
 developers to write unit tests of their own code, I think we need a harness
 that makes it easy to unit test updates and queries against Solr (without
 needing a servlet container).
 Once we have this, I think we can/should also retire TestApp in favor of
 some JUnit tests (which would probably make more sense for other developers).
 I've already started on this. I thought I'd have something to commit tonight,
 but I got distracted ... filing this bug as a tracker for now.




Re: summer of code: solr for apache mail archives

2006-04-20 Thread Yonik Seeley
On 4/19/06, Doug Cutting [EMAIL PROTECTED] wrote:
> http://wiki.apache.org/general/SummerOfCode2006
>
> I'd be happy to co-mentor this with someone from Solr.

OK, I added solr-mail-archive to the Wiki (pretty much copied your description)

-Yonik


Re: summer of code: solr for apache mail archives

2006-04-20 Thread Ian Holsman
If you could get it so that it interfaces with mod_mbox (what they
currently use), that would be a better solution for the ASF
infrastructure, I think.




--
[EMAIL PROTECTED] -- blog: http://feh.holsman.net/ -- PH: ++61-3-9818-0132

If everything seems under control, you're not going fast enough. -
Mario Andretti


Re: GData

2006-04-20 Thread Doug Cutting

Yoav Shapira wrote:
> Getting back to Doug's original point about this as a possible SoC
> project: it seems a little too big from the technical discussion so
> far.


It might actually be a simpler project if it were standalone: not built 
into Solr, but rather a Lucene contrib project.  One only has to write a 
few servlets that translate each request into Lucene events: add, 
delete, delete+add, or query.  It wouldn't have lots of Solr's fancy 
features (faceted searching, replication, etc.) but could still be a 
very useful thing.  Do folks think that would be a tractable SoC project?


Doug
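
A rough sketch of the request-to-event translation described above, written as a plain function rather than a servlet. It assumes the usual Atom-style mapping (GET to query, POST to insert, PUT to update, DELETE to delete); the function name translate and the tuple-based event format are hypothetical, chosen only for illustration.

```python
def translate(method, entry_id=None, entry=None, query=None):
    """Map one HTTP request to a list of index events (hypothetical format)."""
    method = method.upper()
    if method == "GET":
        return [("query", query)]
    if method == "POST":
        return [("add", entry)]
    if method == "PUT":
        # An update is a delete of the old entry followed by an add of the new one.
        return [("delete", entry_id), ("add", entry)]
    if method == "DELETE":
        return [("delete", entry_id)]
    raise ValueError(f"unsupported method: {method}")
```

For example, translate("PUT", entry_id="7", entry={"title": "x"}) yields a delete event for entry 7 followed by an add event for the new entry, which is the delete+add case from the list above.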


Re: GData

2006-04-20 Thread Yoav Shapira
Hola,

> It might actually be a simpler project if it were standalone: not built
> into Solr, but rather a Lucene contrib project.  One only has to write a
> few servlets that translate each request into Lucene events: add,
> delete, delete+add, or query.  It wouldn't have lots of Solr's fancy
> features (faceted searching, replication, etc.) but could still be a
> very useful thing.  Do folks think that would be a tractable SoC project?
>
> Doug

Yeah, and a cool one at that, +1.

Yoav


Re: summer of code: solr for apache mail archives

2006-04-20 Thread Doug Cutting

Ian Holsman wrote:
> If you could get it so that it interfaces with mod_mbox (what they
> currently use), that would be a better solution for the ASF
> infrastructure, I think.


I assume that search results would be displayed with mod_mbox, i.e., 
links in the hit list would be links to mail-archive.a.o.  Is that what 
you mean?


Also, in my original message I said: "We can setup a notification 
mechanism for new messages with Apache infrastructure."  I now note that 
mod_mbox provides Atom feeds for each list.  So we can just poll those 
to index new messages.  We could generate the current list of feeds by 
scraping http://mail-archives.apache.org/mod_mbox/.


Doug
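
The polling idea above could look roughly like the sketch below. It assumes the feeds use the Atom 1.0 namespace and that each entry carries an id element; the names new_entries and SAMPLE_FEED and the sample feed content are made up for illustration. Fetching the feed over HTTP and the indexing step itself are left out.

```python
import xml.etree.ElementTree as ET

# Assumption: feeds use the Atom 1.0 namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def new_entries(feed_xml, seen_ids):
    """Return (id, title) pairs for entries not seen before, recording their ids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.iter(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        if entry_id is None or entry_id in seen_ids:
            continue
        seen_ids.add(entry_id)
        fresh.append((entry_id, entry.findtext(ATOM + "title", default="")))
    return fresh


# Made-up feed standing in for one list's Atom feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>example list</title>
  <entry><id>urn:example:msg-1</id><title>GData</title></entry>
  <entry><id>urn:example:msg-2</id><title>Re: GData</title></entry>
</feed>"""
```

Each poll passes the same seen_ids set, so a message is handed to the indexer once; polling an unchanged feed a second time returns an empty list.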


Re: GData

2006-04-20 Thread Yonik Seeley
On 4/20/06, Doug Cutting [EMAIL PROTECTED] wrote:
> It might actually be a simpler project if it were standalone: not built
> into Solr, but rather a Lucene contrib project. One only has to write a
> few servlets that translate each request into Lucene events: add,
> delete, delete+add, or query.

At first blush, that's the approach I would take with Solr too (a
gdata specific Servlet that interfaced to Solr).  So I don't see a big
difference in difficulty level.

It shouldn't be hard to take a straight lucene-servlet version and
adapt it to Solr later, so it would be a benefit regardless.

-Yonik


Re: GData

2006-04-20 Thread Yoav Shapira
Good.  I'll be available on a time-permitting basis as always, but I
don't want to commit as a mentor for this, so having you two makes me
feel at ease ;)

Yoav

On 4/20/06, Yonik Seeley [EMAIL PROTECTED] wrote:
> On 4/20/06, Doug Cutting [EMAIL PROTECTED] wrote:
> > Would you (or someone else) be willing to co-mentor this one with me?
> > I'm travelling the month of July, so I'm hesitant to be the sole mentor.
> > (I'll be online, but at reduced capacity.)
> >
> > If I have a co-mentor, then I'd be happy to write up the proposal.
>
> OK, I'm up for it...
> I don't really know my summer schedule, but I doubt I would be out
> more than a week at a time.
>
> -Yonik



--
Yoav Shapira
Nimalex LLC
1 Mifflin Place, Suite 310
Cambridge, MA, USA
[EMAIL PROTECTED] / www.yoavshapira.com