Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-17 Thread Ryan McKinley

data and wrote it out in the current update response format .. so the
current SolrUpdateServlet could be completely replaced with a simple url
mapping...

   /update --> /select?qt=xmlupdate&wt=legacyxmlupdate



Using the filter method above, it could (and i think should) be mapped to:
/update


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-17 Thread Chris Hostetter

talking about the URL structure made me realize that the Servlet should
dictate the URL structure and the param parsing, but it should do it after
giving the RequestParser a crack at any streams it wants (actually i think
that may be a direct quote from JJ ... can't remember now) ... *BUT* the
RequestParser may not want to provide a list of streams until the params
have been parsed (if, for example, one of the params is the name of a file)

so what if the interface for RequestParser looked like this...

  interface RequestParser {
    public void init(NamedList nl); // the usual
    /** will be passed the raw input stream from the
     * HttpServletRequest, ... may need other HttpServletRequest info as
     * SolrParams (ie: method, content-type/content-length, ...) but we use
     * a SolrParams instance instead of the HttpServletRequest to
     * maintain an abstraction.
     */
    public Iterable<ContentStream> preProcess(SolrParams headers,
                                              InputStream s);
    /** guaranteed that the second arg will be the result from
     * a previous call to preProcess, and that the Iterable from
     * preProcess will not have been inspected or touched in any way, nor
     * will any references to it be maintained after this call.
     * this method is responsible for calling
     * request.setContentStreams(Iterable<ContentStream>) as it sees fit
     */
    public void process(SolrRequest request, Iterable<ContentStream> i);
  }

...the idea being that many RequestParsers will choose to implement one or
both of those methods as a NOOP that just returns null, but if they want
to implement both, they have the choice of obliterating the Iterable
returned by preProcess and completely replacing it once they see the
SolrParams in the request.
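
For a concrete picture, here is a minimal sketch of a parser that treats
the raw POST body as the one and only ContentStream and leaves process()
as a simple pass-through. Everything here uses the hypothetical types from
the interface above; the InputStreamContentStream wrapper is an assumption,
not existing code:

  import java.io.InputStream;
  import java.util.Collections;

  public class RawPostBodyRequestParser implements RequestParser {
    public void init(NamedList nl) { /* no configuration needed */ }

    // wrap the raw servlet stream as a single ContentStream
    public Iterable<ContentStream> preProcess(SolrParams headers,
                                              InputStream s) {
      ContentStream c = new InputStreamContentStream(s,
          headers.get("content-type"));  // assumed wrapper class
      return Collections.singletonList(c);
    }

    // pass-through: keep whatever preProcess produced
    public void process(SolrRequest request, Iterable<ContentStream> i) {
      request.setContentStreams(i);
    }
  }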

: specifically what i had in mind was something like this...
:
:   class SolrUberServlet extends HttpServlet {
: public service(HttpServletRequest req, HttpServletResponse response) {
:   SolrCore core = getCore();
:   Solr(Query)Response solrRsp = new Solr(Query)Response();
:
:   // servlet specific method which does minimal inspection of
:   // req to determine the parser name
:   String p = pickRequestParser(req);
:
:   // looks up a registered instance (from solrconfig.xml)
:   // matching that name
:   RequestParser solrParser = coreGetParserByName(p);
:

// let the parser preprocess the streams if it wants...
Iterable<ContentStream> s = solrParser.preProcess(req.getInputStream());

// build the request using servlet specific URL rules
Solr(Query)Request solrReq = makeSolrRequest(req);

// let the parser decide what to do with the existing streams,
// or provide new ones
solrParser.process(solrReq, s);

:   // does exactly what it does now: picks the RequestHandler to
:   // use based on the params, calls its handleRequest method
:   core.execute(solrReq, solrRsp)
:
:   // the rest of this is cut/paste from the current SolrServlet.
:   // use SolrParams to pick OutputWriter name, ask core for instance,
:   // have that writer write the results.
:   QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
:   response.setContentType(responseWriter.getContentType(solrReq, solrRsp));
:   PrintWriter out = response.getWriter();
:   responseWriter.write(out, solrReq, solrRsp);
:
: }
:   }
:
:
: -Hoss
:



-Hoss



Re: Can this be achieved? (Was: document support for file system crawling)

2007-01-17 Thread Chris Hostetter

:  2) contrib code that runs as it's own process to crawl documents and
:  send them to a Solr server. (maybe it parses them, or maybe it relies on
:  the next item...)
:
: Do you know FAST? It uses a step-by-step approach (pipeline) in which
: all of these tasks are done. Much of it is tuned in an easy web tool.
:
: The point I'm trying to make is that contrib code is nice, but a
: complete package with these possibilities could broaden Solr's appeal
: somewhat.

in my limited experience, commercial applications tend to be all-in-one
solutions not so much because it really adds value that they are all in
one, but because it helps with vendor lock-in -- companies tend to want
to give you a single monolithic product, because if they gave you lots of
little products that tried to do just one thing very well, you might
decide that one of their little products is crap, and write your own
replacement for just that piece using a great open-source library you
found .. and then you might realize that this *other* open-source library
would make it really easy for you to replace this other little piece of
their system and would be a lot more efficient ... etc.  the point being
that once they've got you using a monolithic application, it's a lot
harder to stop using the whole thing all at once than it would be for you
to stop using 1 of N mini-applications they provide.

open source projects on the other hand, tend to work well when they are
composed of lots of little pieces -- because little pieces are easier to
work on when you have a finite number of developers working in their spare
time, because each developer can work on a few pieces at a time, and those
pieces can be reviewed/used by other people even if the system as a whole
isn't finished.

I ramble about this to try and explain why Solr may not be what you would
consider a complete package at the moment ... and why it may never
reach the state you think would make it a complete package ... because
there are a lot of people out there who don't need it to be -- it would be
hard to be a full-blown GUI-configurable, web-crawling, document-detecting,
customizable-schema-based application and still allow for
people to use small pieces of it.

To put it another way: it's a lot easier for people to put reusable
components with clean APIs together in interesting ways than it is for
people to extract reusable components with clean APIs from a monolithic
application.

: Exactly, this sounds more like it. But if similar inputstreams can be
: handled by Nutch, what's the point in using Solr at all? The http APIs?
:   In other words, both Nutch and Solr seem to have functionality that
: enterprises would want. But neither gives you the total solution.

if what you care about is extracting text from arbitrary documents, that's
what Nutch does well -- it doesn't worry about trying to extract
complex structure from the documents, so it can parse/index lots of
document formats into the same index.  Solr's goal is to let *you* define
the index format, but that requires you defining what data goes into which
fields as well, and that makes generic reusable document crawlers/parsers
harder to get right in a way that can work for anyone.





-Hoss



Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-17 Thread Alan Burlison

Chris Hostetter wrote:


i'm totally on board now ... the RequestParser decides where the streams
come from if any (post body, file upload, local file, remote url, etc...);
the RequestHandler decides what it wants to do with those streams, and has
a library of DocumentProcessors it can pick from to help it parse them if
it wants to, then it takes whatever actions it wants, and puts the
response information in the existing Solr(Query)Response class, which the
core hands off to any of the various OutputWriters to format according to
the user's wishes.


+1

--
Alan Burlison
--


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-17 Thread Alan Burlison

Ryan McKinley wrote:


In addition, consider the case where you want to index a SVN
repository.  Yes, this could be done in SolrRequestParser that logs in
and returns the files as a stream iterator.  But this seems like more
'work' than the RequestParser is supposed to do.  Not to mention you
would need to augment the Document with svn specific attributes.


This is indeed one of the things I'd like to do - use Solr as a back-end
for OpenGrok (http://www.opensolaris.org/os/project/opengrok/)

--
Alan Burlison
--



Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-17 Thread J.J. Larrea
At 11:48 PM -0800 1/16/07, Chris Hostetter wrote:
yeah ... once we have a RequestHandler doing that work, and populating a
SolrQueryResponse with its result info, it
would probably be pretty trivial to make an extremely bare-bones
LegacyUpdateOutputWriter that only expected that simple amount of response
data and wrote it out in the current update response format .. so the
current SolrUpdateServlet could be completely replaced with a simple url
mapping...

   /update --> /select?qt=xmlupdate&wt=legacyxmlupdate

Yah!  But in my vision it would be

/update --> qt=update

because pathInfo is "update".  There's no need to remap anything in the URL;
the existing SolrServlet is ready for dispatch once it:
  - Prepares request params into SolrParams
  - Sets params("qt") to pathInfo
  - Somehow (perhaps with StreamIterator) prepares streams for RequestParser use

I'm still trying to conceptually maintain a separation of concerns between 
handling the details of HTTP (servlet-layer) and handling different payload 
encodings (a different layer, one I believe can be invoked after config is 
read).

The following is vision more than proposal or suggestion...

<requestHandler name="update" class="lets.write.this.UpdateRequestHandler">
  <lst name="invariants">
    <str name="wt">legacyxml</str>
  </lst>
  <lst name="defaults">
    <!-- "rp" matches queryRequestParser -->
    <str name="rp">xml</str>
  </lst>
</requestHandler>

<!-- only if standard responseWriter is not up to the task -->
<queryResponseWriter name="legacyxml"
    class="do.we.really.need.LegacyUpdateOutputWriter"/>

<queryRequestParser name="xml" class="solr.XMLStreamRequestParser"/>

<queryRequestParser name="json" class="solr.JSONStreamRequestParser"/>

So when an incoming URL comes in:

/update?rp=json

the pipeline which is established is:

SolrServlet ->
    solr.JSONStreamRequestParser
        |
        |-> request data carrier, e.g. SolrQueryRequest
        |
    lets.write.this.UpdateRequestHandler
        |
        |-> response data carrier, e.g. SolrQueryResponse
        |
    do.we.really.need.LegacyUpdateOutputWriter

I expect this is all fairly straightforward, except for one sticky question:

Is there a universal format which can efficiently (e.g. lazily, for stream 
input) convey all kinds of different request body encodings, such that the 
RequestHandler has no idea how it was dispatched?

Something to think about...

- J.J.


Re: Solr graduates and joins Lucene as sub-project

2007-01-17 Thread Thorsten Scherler
On Wed, 2007-01-17 at 10:07 -0500, Yonik Seeley wrote:
 Solr has just graduated from the Incubator, and has been accepted as a
 Lucene sub-project!
 Thanks to all the Lucene and Solr users, contributors, and developers
 who helped make this happen!
 

Yeah, congrats to the whole community and especially to the incubator
mentors and first-minute Solr project members.

Thanks for this awesome project.

 I have a feeling we're just getting started :-)

+1

salu2

 -Yonik



Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-17 Thread Ryan McKinley

I'm not sure I understand preProcess() and what it gets us.

I like the model that

1. The URL path selects the RequestHandler
2. RequestParser = RequestHandler.getRequestParser()  (typically from
its default params)
3. SolrRequest = RequestParser.parse( HttpServletRequest )
4. handler.handleRequest( req, res );
5. write the response

If anyone needs to customize this chain of events, they could easily
write their own Servlet/Filter
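
As a sketch, that whole chain as a servlet Filter body might look like the
following. The getRequestParser()/parse() calls are the proposed API (they
don't exist yet), SolrDispatchFilter is a made-up name, the path-to-handler
name mapping is glossed over, and only the response-writing tail is today's
actual code:

  import java.io.IOException;
  import javax.servlet.*;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  public class SolrDispatchFilter implements Filter {
    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
      HttpServletRequest req = (HttpServletRequest) request;
      HttpServletResponse rsp = (HttpServletResponse) response;
      SolrCore core = SolrCore.getSolrCore();

      // 1. the URL path selects the RequestHandler
      SolrRequestHandler handler = core.getRequestHandler(req.getServletPath());
      if (handler == null) { chain.doFilter(request, response); return; }

      // 2 + 3. the handler names its parser, which builds the SolrRequest
      RequestParser parser = handler.getRequestParser();  // proposed API
      SolrQueryRequest solrReq = parser.parse(req);       // proposed API
      try {
        // 4. handle the request
        SolrQueryResponse solrRsp = new SolrQueryResponse();
        handler.handleRequest(solrReq, solrRsp);

        // 5. write the response, exactly as SolrServlet does today
        QueryResponseWriter writer = core.getQueryResponseWriter(solrReq);
        rsp.setContentType(writer.getContentType(solrReq, solrRsp));
        writer.write(rsp.getWriter(), solrReq, solrRsp);
      } finally {
        solrReq.close();
      }
    }
  }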


On 1/17/07, Chris Hostetter [EMAIL PROTECTED] wrote:


Actually, i have to amend that ... it occurred to me in my sleep last night
that calling HttpServletRequest.getInputStream() wasn't safe unless we
*know* the RequestParser wants it, and will close it if it's non-null, so
the API for preProcess would need to look more like this...

 interface Pointer<T> {
   T get();
 }
 interface RequestParser {
   ...
   /** this will be passed a Pointer to the raw input stream from the
    * HttpServletRequest, ... if this method accesses the InputStream
    * from the pointer, it is required to close it if it is non-null.
    */
   public Iterable<ContentStream> preProcess(SolrParams headers,
                                             Pointer<InputStream> s);
   ...
 }
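
One way the servlet side could satisfy that contract is a lazy Pointer that
only opens the stream on the first get() -- a sketch under that assumption,
using the Pointer type from above:

 import java.io.IOException;
 import java.io.InputStream;
 import javax.servlet.http.HttpServletRequest;

 class ServletStreamPointer implements Pointer<InputStream> {
   private final HttpServletRequest req;
   private InputStream opened;  // non-null once get() has been called

   ServletStreamPointer(HttpServletRequest req) { this.req = req; }

   public InputStream get() {
     if (opened == null) {
       try {
         opened = req.getInputStream();
       } catch (IOException e) {
         throw new RuntimeException(e);
       }
     }
     return opened;  // per the contract, the caller must close this
   }
 }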



-Hoss




Re: Java version for solr development (was Re: Update Plugins)

2007-01-17 Thread Bill Au

I also think it is too early to move to 1.6.  Only Sun has released their
1.6 JVM.

Bill


On 1/17/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote:


On 1/17/07, Thorsten Scherler [EMAIL PROTECTED] wrote:

 ...Should I use 1.6 for a patch or above mentioned libs?...

IMHO moving to 1.6 is way too soon, and if it's only to save two jars
it's not worth it.

-Bertrand



[jira] Updated: (SOLR-104) Update Plugins

2007-01-17 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-104:
---

Attachment: DispatchFilter.patch

 Update Plugins
 --

 Key: SOLR-104
 URL: https://issues.apache.org/jira/browse/SOLR-104
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.2
Reporter: Ryan McKinley
 Fix For: 1.2

 Attachments: DispatchFilter.patch, HandlerRefactoring-DRAFT-SRC.zip, 
 HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring.DRAFT.patch, 
 HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.zip


 The plugin framework should work for 'update' actions in addition to 'search' 
 actions.
 For more discussion on this, see:
 http://www.nabble.com/Re%3A-Handling-disparate-data-sources-in-Solr-tf2918621.html#a8305828

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: graduation todo list

2007-01-17 Thread Yonik Seeley

On 1/17/07, Chris Hostetter [EMAIL PROTECTED] wrote:


: OK, here's the TODO list I can think of.

i added this as a new section on the TaskList (like we did for the first
release) so it can evolve as people think of other things that need to be
done (or do things on the list)


Hopefully it won't take as long as the last release :-)

-Yonik


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-17 Thread Ryan McKinley

On 1/17/07, Chris Hostetter [EMAIL PROTECTED] wrote:


: I'm not sure I understand preProcess() and what it gets us.

it gets us the ability for a RequestParser to be able to pull out the raw
InputStream from the HTTP POST body, and make it available to the
RequestHandler as a ContentStream, and/or it can wait until the servlet
has parsed the URL to get the params and *then* it can generate
ContentStreams based on those param values.

 - preProcess is necessary to write a RequestParser that can handle the
   current POST-raw-XML model,
 - process is necessary to write RequestParsers that can get file names
   or URLs out of escaped query params and fetch them as streams
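
To illustrate that second case, a sketch of a parser whose process()
ignores the raw body entirely and instead fetches streams named by a query
param -- the "url" param name, the URLContentStream wrapper, and the
getParams() accessor are all assumptions, not existing code:

 import java.io.InputStream;
 import java.net.MalformedURLException;
 import java.net.URL;
 import java.util.ArrayList;
 import java.util.List;

 public class RemoteStreamRequestParser implements RequestParser {
   public void init(NamedList nl) {}

   // never touches (or opens) the raw POST body
   public Iterable<ContentStream> preProcess(SolrParams headers, InputStream s) {
     return null;
   }

   // by now the servlet has parsed the URL, so params are available
   public void process(SolrRequest request, Iterable<ContentStream> ignored) {
     List<ContentStream> streams = new ArrayList<ContentStream>();
     String[] urls = request.getParams().getParams("url");  // assumed accessor
     if (urls != null) {
       for (String u : urls) {
         try {
           streams.add(new URLContentStream(new URL(u)));  // assumed wrapper
         } catch (MalformedURLException e) {
           throw new RuntimeException(e);
         }
       }
     }
     request.setContentStreams(streams);
   }
 }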



I think the confusion is that (in my view) the RequestParser is the
*only* object able to touch the stream.  I don't think anything should
happen between preProcess() and process().  A RequestParser converts an
HttpServletRequest to a SolrRequest.  Nothing else will touch the
servlet request.



: 1. The URL path selects the RequestHandler
: 2. RequestParser = RequestHandler.getRequestParser()  (typically from
: its default params)
: 3. SolrRequest = RequestParser.parse( HttpServletRequest )
: 4. handler.handleRequest( req, res );
: 5. write the response

the problem i see with that is that the RequestHandler shouldn't have any
say in what RequestParser is used -- ...



got it.  Then i vote we use a syntax like:

/path/registered/in/solr/config:requestparser?params

If no ':' is in the URL, use the 'standard' parser.

1. The URL path determines the RequestHandler
2. The URL path determines the RequestParser
3. SolrRequest = RequestParser.parse( HttpServletRequest )
4. handler.handleRequest( req, res );
5. write the response
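
Splitting the path would then be trivial -- a sketch, where the core lookup
calls are the proposed API rather than anything that exists today:

// e.g. servletPath = "/update:json"
String path = req.getServletPath();
String handlerName = path;
String parserName = "standard";   // default when no ':' present
int colon = path.indexOf(':');
if (colon >= 0) {
  handlerName = path.substring(0, colon);
  parserName = path.substring(colon + 1);
}
SolrRequestHandler handler = core.getRequestHandler(handlerName);
RequestParser parser = core.getRequestParser(parserName);  // proposed API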



: If anyone needs to customize this chain of events, they could easily
: write their own Servlet/Filter

this is why i was confused about your Filter comment earlier: if the only
way a user can customize behavior is by writing a Servlet, they can't
specify that servlet in a solr config file -- they'd have to unpack the
war and manually edit the web.xml ... which makes upgrading a pain.



I don't *think* this would happen often, and people would only do
it if they are unhappy with the default URL-structure-to-behavior
mapping.  I am not suggesting this would be the normal way to
configure solr.

The main case where I imagine someone would need to write their own
servlet/filter is if they insist the parameters need to be in the URL.
For example:

 /delete/<id>/

The URL structure I am proposing could not support this (unless you
had a handler mapped to each id :)

ryan


RE: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-17 Thread Cook, Jeryl
Sorry for the flame, but I've used Spring on 2 large projects and it
worked out great.  You should check out some of the GUIs that help manage
the XML configuration files, if the configuration is the reason your team
thought it was a nightmare (we broke ours up to help)..

Jeryl Cook

-Original Message-
From: Alan Burlison [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 16, 2007 10:52 AM
To: solr-dev@lucene.apache.org
Subject: Re: Update Plugins (was Re: Handling disparate data sources in
Solr)

Bertrand Delacretaz wrote:

 With all this talk about plugins, registries etc., /me can't help
 thinking that this would be a good time to introduce the Spring IoC
 container to manage this stuff.
 
 More info at http://www.springframework.org/docs/reference/beans.html
 for people who are not familiar with it. It's very easy to use for
 simple cases like the ones we're talking about.

Please, no.  I work on a big webapp that uses spring - it's a complete 
nightmare to figure out what's going on.

-- 
Alan Burlison
--


[jira] Updated: (SOLR-104) Update Plugins

2007-01-17 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-104:
---

Attachment: DispatchFilter.patch

 Update Plugins
 --

 Key: SOLR-104
 URL: https://issues.apache.org/jira/browse/SOLR-104
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.2
Reporter: Ryan McKinley
 Fix For: 1.2

 Attachments: DispatchFilter.patch, DispatchFilter.patch, 
 HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring-DRAFT-SRC.zip, 
 HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.patch, 
 HandlerRefactoring.DRAFT.zip


 The plugin framework should work for 'update' actions in addition to 'search' 
 actions.
 For more discussion on this, see:
 http://www.nabble.com/Re%3A-Handling-disparate-data-sources-in-Solr-tf2918621.html#a8305828

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (SOLR-104) Update Plugins

2007-01-17 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465563
 ] 

Ryan McKinley commented on SOLR-104:


removed getRequestParser() from Handler interface.

using ':' in the URL to specify the request parser.

 http://localhost:8983/solr/standard:requestparser?q=video

NOTE:  it still uses a default request parser.

 Update Plugins
 --

 Key: SOLR-104
 URL: https://issues.apache.org/jira/browse/SOLR-104
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.2
Reporter: Ryan McKinley
 Fix For: 1.2

 Attachments: DispatchFilter.patch, DispatchFilter.patch, 
 HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring-DRAFT-SRC.zip, 
 HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.patch, 
 HandlerRefactoring.DRAFT.zip


 The plugin framework should work for 'update' actions in addition to 'search' 
 actions.
 For more discussion on this, see:
 http://www.nabble.com/Re%3A-Handling-disparate-data-sources-in-Solr-tf2918621.html#a8305828

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Can this be achieved? (Was: document support for file system crawling)

2007-01-17 Thread Zaheed Haque

On 1/17/07, Eivind Hasle Amundsen [EMAIL PROTECTED] wrote:

 (...) the point being
 that once they've got you using a monolithic application, it's a lot
 harder to stop using the whole thing all at once than it would be for you
 to stop using 1 of N mini-applications they provide.

Well, FAST is composed of many small, modular products that can be
replaced by other (open source) parts. It is not monolithic. The first
time you install it, it might appear to be just one giant beast; however,
it is not.

 I ramble about this to try and explain why Solr may not be what you would
 consider a complete package at the moment ... and why it may never
 reach the state you think would make it a complete package ... because
 there are a lot of people out there who don't need it to be -- it would be
 hard to be a full-blown GUI-configurable, web-crawling, document-detecting,
 customizable-schema-based application and still allow for
 people to use small pieces of it.

I am not arguing with this. I think my point didn't get through, then.

Compare this to Linux distributions. People still use them, right? What
about an enterprise search distro? That is exactly what some
commercial vendors offer, only far less elegant than anything containing
Lucene et al would probably be.

 To put it another way: it's a lot easier for people to put reusable
 components with clean APIs together in interesting ways than it is for
 people to extract reusable components with clean APIs from a monolithic
 application.

Yes, I agree completely, and the strength is exactly what you say - they
focus on doing a small thing very well. I believe this fact would make
such a search distribution even more appealing.


I am not sure I follow. "Enterprise search distro"? Anyway, any
enterprise interested in having a serious search solution (i.e. buy FAST,
Autonomy, or do open source Lucene) will want a custom solution, i.e.
pick and choose the modules/features they need/want and then let an
integrator/consultancy-firm/IT department do the actual implementation.
So a search distribution, as pointed out, is somewhat meaningless if
customization is important.

Now there are organizations that will want to have a black-box solution,
i.e. Google-mini or Searchblox or the new IBM/Yahoo/Lucene search
solution (sorry, I can't remember the name). These are pre-packaged
solutions and low-cost alternatives, in some cases free, that offer no
customization, and I am 100% sure those organizations do not even want
customization.

So having the possibility to pick and choose and make a custom
solution from Lucene, Solr, Nutch, Hadoop is super perfect.. You can do
more cool things than if all of these were bundled.

Just some thoughts.
Cheers
Zaheed


Re: [Solr Wiki] Update of TaskList by YonikSeeley

2007-01-17 Thread Doug Cutting

Apache Wiki wrote:

 * have everyone update their subversion working directories (remember to 
update SVN paths in IDEs too, etc)


Note that 'svn switch' makes this easy.

Doug


Bucketing result set (Dev list posting)...

2007-01-17 Thread escher2k

I have a requirement wherein the documents that are retrieved based on
the similarity computation are bucketed and re-sorted based on user score.
An example -
An example -

Let us say a search returns the following data set -

Doc ID   Lucene score   User score
1000     1000           125
1000     900            225
1000     800            25
1000     700            525
1000     50             25
1000     40             125
100040  125

Assuming two buckets are created, the expected result is -

Doc ID   Lucene score   User score
1000     900            225
1000     1000           125
1000     800            25
---------------------------------
1000     700            525
1000     40             125
1000     50             25

I am assuming that the only way to do this is to change some of the Solr
internals.  Any pointers on the best way to go about it would be most
helpful.
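
For what it's worth, the post-processing itself (independent of where it
would hook into Solr) could look like this sketch: sort the page of hits
by Lucene score, split it into equal rank buckets, then re-sort by
(bucket, user score descending). The Doc holder type is made up for the
example:

import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

class Doc {
  int id; float luceneScore; int userScore;
  Doc(int id, float l, int u) { this.id = id; luceneScore = l; userScore = u; }
}

class Bucketer {
  /** split docs into numBuckets equal rank ranges by Lucene score,
   *  then order by (bucket ascending, user score descending) */
  static void bucketSort(Doc[] docs, int numBuckets) {
    Arrays.sort(docs, new Comparator<Doc>() {      // Lucene score desc
      public int compare(Doc a, Doc b) {
        return Float.compare(b.luceneScore, a.luceneScore);
      }
    });
    int per = (docs.length + numBuckets - 1) / numBuckets;
    final Map<Doc, Integer> bucket = new HashMap<Doc, Integer>();
    for (int i = 0; i < docs.length; i++) {
      bucket.put(docs[i], i / per);                // bucket = rank range
    }
    Arrays.sort(docs, new Comparator<Doc>() {      // bucket asc, user desc
      public int compare(Doc a, Doc b) {
        int c = bucket.get(a).compareTo(bucket.get(b));
        return (c != 0) ? c : (b.userScore - a.userScore);
      }
    });
  }
}

Running it over the six example docs with numBuckets=2 reproduces the
expected ordering above.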

Thanks.

-- 
View this message in context: 
http://www.nabble.com/Bucketing-result-set-%28Dev-list-posting%29...-tf3031130.html#a8421969
Sent from the Solr - Dev mailing list archive at Nabble.com.



subversion move

2007-01-17 Thread Yonik Seeley

Solr's source in subversion has moved within the ASF repository
to https://svn.apache.org/repos/asf/lucene/solr/
(Thanks Doug!)

The easiest way to change your working directories is to use svn switch.
For example, if you have the trunk of solr checked out, cd to that
directory and execute
svn switch https://svn.apache.org/repos/asf/lucene/solr/trunk

Don't forget to change any SVN paths that may be configured in your IDEs too.

-Yonik


Re: Can this be achieved? (Was: document support for file system crawling)

2007-01-17 Thread Eivind Hasle Amundsen

(...) any enterprise interested
in having a serious search solution (i.e. buy FAST, Autonomy, or do
open source Lucene) will want a custom solution (...) then
let an integrator/consultancy-firm/IT department do the actual
implementation.  So a search distribution, as pointed out, is somewhat
meaningless if customization is important.


I'm talking about creating something that works much more easily out of 
the box, and that can be customized as much as now - at the same time.


Of course serious search solutions would be completely customized,
always. And there are out-of-the-box solutions (Google Appliance
etc.). But is there no market for a middle ground here?



Now there are organizations that will want to have a black-box solution,
i.e. Google-mini or Searchblox or the new IBM/Yahoo/Lucene search
solution (sorry, I can't remember the name). These are pre-packaged
solutions and low-cost alternatives, in some cases free, that offer no
customization, and I am 100% sure those organizations do not even want
customization.


Which ones are free? Are there any FLOSS alternatives to these black box 
solutions? (IANAL, but the Apache license is more like LGPL than GPL, 
right?)



So having the possibility to pick and choose and make a custom
solution from Lucene, Solr, Nutch, Hadoop is super perfect.. You can do
more cool things than if all of these were bundled.


What I am really talking about is this: There is a growing market for
simple search solutions that can work out of the box, and that can still 
be customized. Something that:

- organizations can use on their network, out of the box
- on their intraweb, out of the box, just give credentials
- can handle user access out of the box (LDAP/NIS/AD)
- is FLOSS(!)
- can be fully customized, if desired
- modularized for even more customization if needed

Sure, one can argue like you have done so far by saying that they could
just compose their own solution completely... But then we are falling
outside the market again - which I hypothesize exists.


I am not looking to change Solr in that direction. But take a look at
Solr. Or Nutch. They are already built on Lucene and many other
projects. Why not build something on top of this? Something more/else?


Thanks for all the feedback :) Please keep it coming.


Re: [Solr Wiki] Update of TaskList by YonikSeeley

2007-01-17 Thread Doug Cutting

Apache Wiki wrote:

   * move website
 * checkout in new location (from the new svn location too)


Note that you can update the .htaccess file in 
/www/incubator.apache.org/solr to redirect the old site to the new site.


http://svn.apache.org/repos/asf/incubator/public/trunk/site-publish/.htaccess

Doug


Re: Can this be achieved? (Was: document support for file system crawling)

2007-01-17 Thread Mike Klaas

On 1/17/07, Eivind Hasle Amundsen [EMAIL PROTECTED] wrote:


What I am really talking about is this: There is a growing market for
simple search solutions that can work out of the box, and that can still
be customized. Something that:
- organizations can use on their network, out of the box
- on their intraweb, out of the box, just give credentials
- can handle user access out of the box (LDAP/NIS/AD)
- is FLOSS(!)
- can be fully customized, if desired
- modularized for even more customization if needed




I am not looking to change Solr in that direction. But take a look at
Solr. Or Nutch. They are already built on Lucene and many other
projects. Why/not build something on top of this? Something more/else?


I don't think that anyone is arguing that this product shouldn't exist
in the open-source world, just that it shouldn't be part of Solr's
mandate.  It sounds like a cool project (though the closer you get to a
commercial product, the more important support, packaging, marketing,
etc. become -- some of which are very difficult to achieve in a purely
open-source setting).

-Mike