Make ant example faster

2009-04-13 Thread Shalin Shekhar Mangar
Hello,

As part of SOLR-934, I'd like to set up an example for indexing mailboxes
with the existing example/example-DIH demo. I see that ant example has a
dependency on example-contrib. Do we want to do that? I vaguely remember
Yonik complaining about the time ant example takes.

For setting up the MailEntityProcessor, I'd have to copy mail, activation
and tika jars to example-DIH/solr/mail/lib, which will make it extra slow.
How about we remove the dependency on example-contrib and keep it as an
independent target?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Make ant example faster

2009-04-13 Thread Shalin Shekhar Mangar
On Tue, Apr 14, 2009 at 12:33 AM, Grant Ingersoll wrote:

>
> Instead of a kitchen-sink example directory, we "revert" it back to being
> the tutorial example.  It still can get built by ant example, but ultimately
> we "deprecate" it (more later).
>
> Then, as a replacement, we create a directory containing what I would call
> Solr Templates, which contain subdirectories named appropriately for the
> kind of example.  Rather than explain, I'll give an example:
>
> The templates directory would contain the configurations (i.e. schema.xml
> and solrconfig.xml) and any sample docs (but not the libraries) for:
>   tutorial - The current tutorial example
>   dih - The DIH example
>   extraction - Solr Cell example
>   geo - geo spatial example (once 773 is committed)
>   clustering - once SOLR-769 is committed
>   simple - A barebones schema and config (mainly used for
> bootstrapping a new project for experienced users)
>   exploratory - Basically, the same as simple, but the schema defines
> a single dynamic field -  Think of Hoss's Solr Out of the Box talk from
> ApacheCon whereby you want to quickly explore a new data set without having
> to define a schema.
>   [other] -
>
> Note, the templates directory could also live under each contrib, but it
> isn't necessarily a 1-1 thing (e.g. simple and exploratory templates are not
> contrib-specific).
>
> Then, typing "ant example" would copy the necessary tutorial stuff to the
> example directory (which still contains the Jetty stuff) but would not have
> to recurse into any of the contribs.
>
> Typing "ant example -Dtype=clustering"  would copy the clustering
> requirements, plus go to contrib/clustering (or whatever) and get the
> appropriate material such that the example directory.  Similarly for any of
> the other "templates"
>

Isn't this the same as the current setup with the name of the directory
changed and different ant targets to set them up? The new ant target will
set up the default solr instance to be 'extraction' or 'dih' or 'clustering'
and avoid the need to type -Dsolr.solr.home.


>
> Additionally, you could also define -DoutputDir such that it would take and
> copy the whole example directory (including the appropriate type) to some
> output dir.  This would allow one to quickly bootstrap a Solr project
> without having to do a lot of schema editing.
>

I like this idea. I have myself needed to do this a couple of times.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Make ant example faster

2009-04-13 Thread Grant Ingersoll
Funny you should mention it, b/c I had an idea the other day of how to
speed all this up, which would also address one of my other annoyances
with the example and make it easier for people to get started (I think).
So, here goes:


Instead of a kitchen-sink example directory, we "revert" it back to  
being the tutorial example.  It still can get built by ant example,  
but ultimately we "deprecate" it (more later).


Then, as a replacement, we create a directory containing what I would  
call Solr Templates, which contain subdirectories named appropriately  
for the kind of example.  Rather than explain, I'll give an example:


The templates directory would contain the configurations (i.e.  
schema.xml and solrconfig.xml) and any sample docs (but not the  
libraries) for:

  tutorial - The current tutorial example
  dih - The DIH example
  extraction - Solr Cell example
  geo - geo spatial example (once 773 is committed)
  clustering - once SOLR-769 is committed
  simple - A barebones schema and config (mainly used for bootstrapping
a new project for experienced users)
  exploratory - Basically, the same as simple, but the schema defines a
single dynamic field -  Think of Hoss's Solr Out of the Box talk from
ApacheCon whereby you want to quickly explore a new data set without
having to define a schema.
  [other] -

Note, the templates directory could also live under each contrib, but  
it isn't necessarily a 1-1 thing (e.g. simple and exploratory  
templates are not contrib-specific).


Then, typing "ant example" would copy the necessary tutorial stuff to  
the example directory (which still contains the Jetty stuff) but would  
not have to recurse into any of the contribs.


Typing "ant example -Dtype=clustering"  would copy the clustering  
requirements, plus go to contrib/clustering (or whatever) and get the  
appropriate material such that the example directory.  Similarly for  
any of the other "templates"


Additionally, you could also define -DoutputDir such that it would  
take and copy the whole example directory (including the appropriate  
type) to some output dir.  This would allow one to quickly bootstrap a  
Solr project without having to do a lot of schema editing.
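
To make the proposal concrete, here is a minimal Python sketch of what
such a target might do. It is not the actual build logic: the function
name build_example, the templates/<type>/conf layout, and the
example/solr/conf destination are all assumptions made for illustration,
standing in for "ant example -Dtype=..." and -DoutputDir.

```python
import shutil
from pathlib import Path


def build_example(templates_dir, example_dir, example_type="tutorial", output_dir=None):
    """Activate one template in the example directory; optionally export it.

    Assumed (hypothetical) layout:
      <templates_dir>/<example_type>/conf  holds schema.xml, solrconfig.xml
      <example_dir>/solr/conf              is the single active configuration
    """
    template_conf = Path(templates_dir) / example_type / "conf"
    if not template_conf.is_dir():
        raise ValueError(f"unknown example type: {example_type}")

    active_conf = Path(example_dir) / "solr" / "conf"
    # Drop the previously active template so only one example is ever live.
    if active_conf.exists():
        shutil.rmtree(active_conf)
    shutil.copytree(template_conf, active_conf)

    # Rough equivalent of -DoutputDir: copy the whole prepared example.
    # output_dir must not exist yet and must not be inside example_dir.
    if output_dir is not None:
        shutil.copytree(example_dir, output_dir)
        return Path(output_dir)
    return Path(example_dir)
```

Calling build_example("templates", "example", "clustering") would switch
the active example; adding output_dir="my-project" would also copy the
result somewhere to start a new project from.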


WDYT?

-Grant




On Apr 13, 2009, at 1:56 PM, Shalin Shekhar Mangar wrote:

> Hello,
>
> As part of SOLR-934, I'd like to set up an example for indexing mailboxes
> with the existing example/example-DIH demo. I see that ant example has a
> dependency on example-contrib. Do we want to do that? I vaguely remember
> Yonik complaining about the time ant example takes.
>
> For setting up the MailEntityProcessor, I'd have to copy mail, activation
> and tika jars to example-DIH/solr/mail/lib, which will make it extra slow.
> How about we remove the dependency on example-contrib and keep it as an
> independent target?
>
> --
> Regards,
> Shalin Shekhar Mangar.





Re: Make ant example faster

2009-04-13 Thread Grant Ingersoll


On Apr 13, 2009, at 3:44 PM, Shalin Shekhar Mangar wrote:

> Isn't this the same as the current setup with the name of the directory
> changed and different ant targets to set them up? The new ant target will
> set up the default solr instance to be 'extraction' or 'dih' or 'clustering'
> and avoid the need to type -Dsolr.solr.home.



It is similar, indeed, but I think it results in there only ever being  
one active Solr example and the user need not worry about setting solr  
home.


-Grant


Re: Make ant example faster

2009-04-13 Thread Shalin Shekhar Mangar
On Tue, Apr 14, 2009 at 4:25 AM, Grant Ingersoll wrote:

>
> On Apr 13, 2009, at 3:44 PM, Shalin Shekhar Mangar wrote:
>
>>
>> Isn't this the same as the current setup with the name of the directory
>> changed and different ant targets to set them up? The new ant target will
>> set up the default solr instance to be 'extraction' or 'dih' or
>> 'clustering'
>> and avoid the need to type -Dsolr.solr.home.
>>
>
>
> It is similar, indeed, but I think it results in there only ever being one
> active Solr example and the user need not worry about setting solr home.
>

+1

Let's do it.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Make ant example faster

2009-04-16 Thread Chris Hostetter

: It is similar, indeed, but I think it results in there only ever being one
: active Solr example and the user need not worry about setting solr home.

Hmmm... this seems like a bad idea.

we want to make sure that *users* who have downloaded Solr can run all of 
the examples without needing ant ... having a single "active" example and 
using an ant target to change it would mean that if i install solr and 
then go through the tutorial (using the tutorial example), i would need to 
(understand and run) ant to see the DIH example.

It seems like it would make a lot more sense to have lots of examples and 
let the user set the solr home to try them out -- that's very easy to do.

I'm not sure i really understand the concern about how long "ant example" 
takes ... it's a build time task, and it only takes ~15 seconds on my box 
if everything is up to date (if everything isn't up to date then 
compilation is going to take much longer than what "example" does) ... the 
longest contributor to the time seems to be contrib/javascript's "docs" 
target -- but i'm guessing some ant tricks to check directory mod times 
before running the jsrun.jar could shave that off as well.
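
For illustration only, the modification-time check could look roughly
like the Python sketch below. In the real build this would more likely
be done with Ant's own uptodate task; the function names and the
"newest source vs. newest output" rule here are assumptions, not what
the build does today.

```python
import os


def newest_mtime(directory):
    """Latest modification time of any file under directory, or 0.0 if none."""
    latest = 0.0
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            latest = max(latest, os.path.getmtime(os.path.join(dirpath, name)))
    return latest


def needs_rebuild(source_dir, output_dir):
    """True when the docs should be regenerated.

    Rebuild if the output directory is missing or empty, or if any source
    file is newer than the newest generated file.
    """
    if not os.path.isdir(output_dir):
        return True
    newest_output = newest_mtime(output_dir)
    if newest_output == 0.0:
        return True  # output directory exists but holds no files
    return newest_mtime(source_dir) > newest_output
```

The expensive step (running jsrun.jar) would then be skipped whenever
needs_rebuild returns False.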

-Hoss



Re: Make ant example faster

2009-04-20 Thread Grant Ingersoll


On Apr 16, 2009, at 7:37 PM, Chris Hostetter wrote:

> : It is similar, indeed, but I think it results in there only ever being one
> : active Solr example and the user need not worry about setting solr home.
>
> Hmmm... this seems like a bad idea.
>
> we want to make sure that *users* who have downloaded Solr can run all of
> the examples without needing ant ... having a single "active" example and
> using an ant target to change it would mean that if i install solr and
> then go through the tutorial (using the tutorial example), i would need to
> (understand and run) ant to see the DIH example.
>
> It seems like it would make a lot more sense to have lots of examples and
> let the user set the solr home to try them out -- that's very easy to do.
>
> I'm not sure i really understand the concern about how long "ant example"
> takes ... it's a build time task, and it only takes ~15 seconds on my box
> if everything is up to date (if everything isn't up to date then
> compilation is going to take much longer than what "example" does) ... the
> longest contributor to the time seems to be contrib/javascript's "docs"
> target -- but i'm guessing some ant tricks to check directory mod times
> before running the jsrun.jar could shave that off as well.



Fair enough.  FWIW, I'd still like to be able to generate a Solr  
container from an example (i.e. "minimal" or "DIH" or whatever)


Re: Make ant example faster

2009-04-20 Thread Chris Hostetter

: Fair enough.  FWIW, I'd still like to be able to generate a Solr container
: from an example (i.e. "minimal" or "DIH" or whatever)

by "container" do you mean a Solr home with configs and necessary libs 
ready to be tweaked to suit your purposes?

assuming we have more use-case specific examples, wouldn't that just be 
something that copies one of them to a target directory?



-Hoss



Re: Make ant example faster

2009-04-22 Thread Erik Hatcher
Wouldn't one solution to this bundling and aggregating/separating of  
examples and plugins be made a lot less painful if SolrResourceLoader  
could load from a list of directories rather than only a single  
directory?   What are the negatives to adding that support?  Let's  
keep solr.war lean and mean, with all extensions simply appended to a  
list of JAR containing directories?


I know, we're recreating a container of sorts, but we already got  
SolrResourceLoader, so maybe just some tweaks there can make example  
bundling a lot more pleasurable?
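
As a rough sketch of the idea (in Python for brevity, not the actual
SolrResourceLoader code), gathering JARs from a list of directories
might look like this. The function name collect_jars and the rule that
the first directory wins on a duplicate file name are assumptions for
illustration.

```python
from pathlib import Path


def collect_jars(lib_dirs):
    """Return *.jar paths found directly in each directory, in list order.

    Directories that do not exist are skipped, so optional plugin
    directories can simply be absent.
    """
    jars = []
    seen_names = set()
    for lib_dir in lib_dirs:
        directory = Path(lib_dir)
        if not directory.is_dir():
            continue
        for jar in sorted(directory.glob("*.jar")):
            # Assumed policy: a file name seen in an earlier directory wins.
            if jar.name in seen_names:
                continue
            seen_names.add(jar.name)
            jars.append(jar)
    return jars
```

With something like this, each contrib could keep its JARs in its own
lib directory and an example would only need to list the directories it
wants, with no copying.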


Erik



Re: Make ant example faster

2009-04-22 Thread Grant Ingersoll


On Apr 20, 2009, at 5:45 PM, Chris Hostetter wrote:

> : Fair enough.  FWIW, I'd still like to be able to generate a Solr container
> : from an example (i.e. "minimal" or "DIH" or whatever)
>
> by "container" do you mean a Solr home with configs and necessary libs
> ready to be tweaked to suit your purposes?
>
> assuming we have more use-case specific examples, wouldn't that just be
> something that copies one of them to a target directory?


I guess what I really want is a way to be able to say:  Give me a Solr  
home that has these X features (DIH, Solr Cell, spell checking,  
highlighting, plus whatever libs are needed) with some basic  
configuration + my choice of a schema ranging from one that is  
barebones (maybe just an "id" field defined) to a "full fledged" one  
(the current example) and I want to be able to do it as simply as
possible (i.e. as few commands as possible).


-Grant


Re: Make ant example faster

2009-04-22 Thread Grant Ingersoll
Even better is probably something like OSGi, where we can make sure
that we have some level of isolation between the class loaders so that  
we can have different versions of different JARs w/o breaking the  
application.  Since it is clear that Solr is entering into a "contrib"  
phase, it is only a matter of time before we start having version  
clashes between libraries.


On Apr 22, 2009, at 12:05 PM, Erik Hatcher wrote:

> Wouldn't one solution to this bundling and aggregating/separating of
> examples and plugins be made a lot less painful if SolrResourceLoader
> could load from a list of directories rather than only a single
> directory?   What are the negatives to adding that support?  Let's
> keep solr.war lean and mean, with all extensions simply appended to a
> list of JAR containing directories?
>
> I know, we're recreating a container of sorts, but we already got
> SolrResourceLoader, so maybe just some tweaks there can make example
> bundling a lot more pleasurable?
>
> Erik






Re: Make ant example faster

2009-04-22 Thread Erik Hatcher
I was aiming simple... like some simple tweaks to SolrResourceLoader,  
at least a way to allow plugins to all live separately and wired into  
a single Solr instance without copying files and such.


What would it take to wire in OSGI (I know nothing about it)?

Erik


On Apr 22, 2009, at 12:18 PM, Grant Ingersoll wrote:

> Even better is probably something like OSGi, where we can make sure
> that we have some level of isolation between the class loaders so
> that we can have different versions of different JARs w/o breaking
> the application.  Since it is clear that Solr is entering into a
> "contrib" phase, it is only a matter of time before we start having
> version clashes between libraries.



Re: Make ant example faster

2009-04-22 Thread Ryan McKinley


On Apr 22, 2009, at 12:20 PM, Erik Hatcher wrote:

> I was aiming simple... like some simple tweaks to SolrResourceLoader,
> at least a way to allow plugins to all live separately and wired into
> a single Solr instance without copying files and such.
>
> What would it take to wire in OSGI (I know nothing about it)?



From my brief experience with OSGi, I don't think it is something we
can easily tack on to our existing structure.  However, it is something
we should definitely consider for 2.0.


I think extending SolrResourceLoader is a good option for 1.4

ryan




Re: Make ant example faster

2009-04-28 Thread Chris Hostetter
: Wouldn't one solution to this bundling and aggregating/separating of examples
: and plugins be made a lot less painful if SolrResourceLoader could load from a
: list of directories rather than only a single directory?   What are the

I'm not understanding how that would help the example situation.  what 
are you envisioning that the instanceDir would look like?  how 
would SolrResourceLoader know which directories to use?

right now SolrResourceLoader assumes (instanceDir + "lib/") will contain a 
bunch of jars ... i can imagine that we could let that directory contain 
other directories and walk it recursively looking for jars, and then 
people could put symlinks in it to other lib directories -- but how would 
that help us with the example?  would we create the symlinks via ant? can 
tgz/zip files store symlinks efficiently?

Or are you thinking that we would add a new way to specify additional lib 
dir paths in the solrconfig.xml? ... i suppose that would be possible, but 
i think it would require some funky changes to SolrConfig and Config to 
parse out the lib dirs before parsing anything else (that would need to 
know about the SolrResourceLoader)
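
To illustrate the "parse out the lib dirs first" step, here is a small
Python sketch. It assumes a hypothetical <lib dir="..."/> element
directly under the config root, used here only for illustration, and it
is not how SolrConfig or Config work today. The point is just that the
lib dirs can be read in a cheap first pass, before any plugin classes
need to be resolved.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_lib_dirs(solrconfig_text, instance_dir):
    """First pass over a config: return the extra lib directories it names.

    Assumes hypothetical <lib dir="..."/> children of the root element.
    Relative paths are resolved against instance_dir; everything else in
    the config is ignored at this stage.
    """
    root = ET.fromstring(solrconfig_text)
    lib_dirs = []
    for element in root.findall("lib"):
        raw = element.get("dir")
        if raw is None:
            continue  # ignore malformed entries in this sketch
        path = Path(raw)
        lib_dirs.append(path if path.is_absolute() else Path(instance_dir) / path)
    return lib_dirs
```

The resulting list would be handed to the resource loader, and only then
would the rest of the config be parsed.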
 


-Hoss



Re: Make ant example faster

2009-04-28 Thread Chris Hostetter

: > assuming we have more use-case specific examples, wouldn't that just be
: > something that copies one of them to a target directory?
: 
: I guess what I really want is a way to be able to say:  Give me a Solr home
: that has these X features (DIH, Solr Cell, spell checking, highlighting, plus
: whatever libs are needed) with some basic configuration + my choice of a
: schema ranging from one that is barebones (maybe just an "id" field defined)
: to a "full fledged" one (the current example) and I want to be able to do it
: as simply as possible (i.e. as few commands as possible).

Ah i'm understanding now.  you don't just want a lot of good 
micro-examples of each feature, you want an easy way to generate "default" 
configs that work for an arbitrary set of features specified by the user.

That seems like a hard problem to get right in a generic way.

The simplest method i can think of for achieving that would be to start 
with a kitchen-sink type example that includes *everything* (because then 
it's easy to test that all of the pieces work well together and don't 
collide -- duplicate fieldnames or handler names etc...) and then use xml 
comments or some other templating to be able to split that kitchen sink 
file up into snippets -- which could then be combined again in lots of 
combinations.

(Or ... I suppose the snippets could be maintained by hand and then the 
build system could generate the kitchen sink and run tests to ensure that 
none of them collide ... but maintaining the kitchen-sink by hand seems 
easier in a weird way)
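
A minimal Python sketch of the comment-marker approach, under stated
assumptions: the BEGIN/END/SLOT comment syntax and the function names
are invented for illustration, and nothing like this exists in the
build. It cuts the marked blocks out of a kitchen-sink file and then
reassembles a config with only the requested features.

```python
import re

# Hypothetical markers: <!-- BEGIN feature:NAME --> ... <!-- END feature:NAME -->
_SNIPPET = re.compile(
    r"<!-- BEGIN feature:(?P<name>[\w-]+) -->\n"
    r"(?P<body>.*?)"
    r"<!-- END feature:(?P=name) -->\n?",
    re.DOTALL,
)
_SLOT = re.compile(r"<!-- SLOT feature:([\w-]+) -->\n")


def split_snippets(kitchen_sink):
    """Return (skeleton, snippets): config with slots, and blocks by feature name."""
    snippets = {}

    def cut(match):
        snippets[match.group("name")] = match.group("body")
        return "<!-- SLOT feature:%s -->\n" % match.group("name")

    return _SNIPPET.sub(cut, kitchen_sink), snippets


def assemble(skeleton, snippets, features):
    """Fill the slots of the wanted features and drop all other slots."""

    def fill(match):
        name = match.group(1)
        return snippets[name] if name in features else ""

    return _SLOT.sub(fill, skeleton)
```

Because the kitchen-sink file stays the single source, it can still be
loaded and tested as a whole, while assemble(skeleton, snippets,
{"dih", "spellcheck"}) would produce a trimmed config for one
combination.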


-Hoss