Re: Can we use Berkley DB java in Solr

2008-12-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Another persistence solution is ehcache with diskstore. It even has replication.

I have never used ehcache, so I cannot comment on it.

any comments?

--Noble

On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
[EMAIL PROTECTED] wrote:
 On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:

 On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 The code can be written against JDBC. But we need to test the DDL and
 data types on all the supported DBs.

 But, which one would we like to ship with Solr as a default option?

 Why do we need a default option?  Is this something that is intended to be
 on by default?  Or, do you mean just to have one for unit tests to work?
 Default does not mean that it is enabled by default. But if it is
 enabled I can have defaults for stuff like the driver, URL, DDL, etc., and
 the user may not need to provide an extra jar.
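
 To make that concrete, a minimal sketch of what such defaulting might
 look like, with H2 embedded as the fallback (the property names, table
 layout, and H2 URL here are hypothetical, not from any patch):

   import java.sql.*;
   import java.util.Properties;

   // hypothetical sketch: fall back to H2 embedded when nothing is configured
   public class DefaultedStore {
     public static void main(String[] args) throws Exception {
       Properties conf = new Properties(); // imagine this loaded from solrconfig.xml
       String driver = conf.getProperty("driver", "org.h2.Driver");
       String url = conf.getProperty("url", "jdbc:h2:./solr/data/docstore");
       Class.forName(driver);
       Connection con = DriverManager.getConnection(url, "sa", "");
       con.createStatement().execute(
           "CREATE TABLE IF NOT EXISTS docs (id VARCHAR(256) PRIMARY KEY, doc BLOB)");
       con.close();
     }
   }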

 I don't know if it is still the case, but I often find embedded dbs to be
 quite annoying since you often can't connect to them from other clients
 outside of the JVM which makes debugging harder.  Of course, maybe I just
 don't know the tricks to do it.  Derby is one DB that you can still connect
 to even when it is embedded.
 Embedded is the best bet for us for performance reasons and
 zero management.
 The users can still read the data through Solr itself.
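
 (A sketch of the Derby trick Grant mentions, for what it's worth -- the
 network server can run inside the same JVM that uses the embedded driver,
 so external clients can attach for debugging; assumes derbynet.jar is on
 the classpath:)

   import java.net.InetAddress;
   import org.apache.derby.drda.NetworkServerControl;

   // sketch: expose an embedded Derby DB to external clients for debugging
   public class DebugServer {
     public static void main(String[] args) throws Exception {
       NetworkServerControl server =
           new NetworkServerControl(InetAddress.getByName("localhost"), 1527);
       server.start(null); // clients: jdbc:derby://localhost:1527/<dbname>
     }
   }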

 Also, whatever is chosen needs to scale to millions of documents, and I
 wonder about an embedded DB doing that.  I also have a hard time believing
 that both a DB w/ millions of docs and Solr can live on the same machine,
 which is presumably what an embedded DB must do.  Presumably, it also needs
 to be able to be replicated, right?
 Millions of docs?
 Then you must configure a remote DB for storage reasons
 and manage the replication separately.




 H2 looks impressive. The jar is just 667KB and the memory
 footprint is small too.
 --Noble

 On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley [EMAIL PROTECTED] wrote:

 check http://www.h2database.com/ -- in my view the best embedded DB out
 there.

 from the maker of HSQLDB... it is a second round.

 However, for anything Solr, I would hope it would just rely on JDBC.


 On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:

 HSQLDB has a limit of up to 8GB of data. In Solr, you might want to go
 beyond
 that without a commit.

 On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
 [EMAIL PROTECTED] wrote:


 Isn't HSQLDB an option? Its performance ranges a lot depending on the
 volume of data and queries, but otherwise the license looks BSDish.

 http://hsqldb.org/web/hsqlLicense.html

 Dawid




 --
 Regards,
 Shalin Shekhar Mangar.





 --
 --Noble Paul

 --
 Grant Ingersoll

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ














 --
 --Noble Paul




-- 
--Noble Paul


[jira] Updated: (SOLR-893) Unable to delete documents via SQL and deletedPkQuery with deltaimport

2008-12-04 Thread Dan Rosher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Rosher updated SOLR-893:


Attachment: SOLR-893.patch

Thanks Paul ... I've made that change. Additionally, I've noticed that during 
any particular delta import you might have both an update/create AND a delete; 
the current code would not honor the delete, hence I've added something to cater 
for this, and updated the test to confirm.

 Unable to delete documents via SQL and deletedPkQuery with deltaimport
 --

 Key: SOLR-893
 URL: https://issues.apache.org/jira/browse/SOLR-893
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Dan Rosher
 Fix For: 1.3

 Attachments: SOLR-893.patch, SOLR-893.patch


 DocBuilder calls entityProcessor.nextModifiedRowKey, which sets up rowIterator 
 for the modified rows, but when it comes time to call 
 entityProcessor.nextDeletedRowKey, this is skipped: although no rows are 
 returned from nextModifiedRowKey, rowIterator in SqlEntityProcessor.java is 
 still not null.
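
 A hypothetical sketch of the kind of guard the fix needs (method and field
 names follow the description above; the real SqlEntityProcessor code and
 the attached patch may differ):

   // hypothetical sketch, not the actual patch
   public Map<String, Object> nextDeletedRowKey() {
     // rowIterator may still hold the exhausted modified-rows iterator left
     // behind by nextModifiedRowKey(); clear it so the deletedPkQuery runs
     if (rowIterator != null && !rowIterator.hasNext()) {
       rowIterator = null;
     }
     if (rowIterator == null) {
       initQuery(deletedPkQuery); // hypothetical helper
     }
     return getNext();
   }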

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-893) Unable to delete documents via SQL and deletedPkQuery with deltaimport

2008-12-04 Thread Dan Rosher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653244#action_12653244
 ] 

rosher edited comment on SOLR-893 at 12/4/08 2:23 AM:
--

Thanks Noble ... I've made that change. Additionally, I've noticed that 
during any particular delta import you might have both an update/create AND a 
delete; the current code would not honor the delete, hence I've added something 
to cater for this, and updated the test to confirm.

  was (Author: rosher):
Thanks Paul ... I've made that change. Additionally, I've noticed that 
during any particular delta import you might have both an update/create AND a 
delete; the current code would not honor the delete, hence I've added something 
to cater for this, and updated the test to confirm.
  
 Unable to delete documents via SQL and deletedPkQuery with deltaimport
 --

 Key: SOLR-893
 URL: https://issues.apache.org/jira/browse/SOLR-893
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Dan Rosher
 Fix For: 1.3

 Attachments: SOLR-893.patch, SOLR-893.patch


 DocBuilder calls entityProcessor.nextModifiedRowKey, which sets up rowIterator 
 for the modified rows, but when it comes time to call 
 entityProcessor.nextDeletedRowKey, this is skipped: although no rows are 
 returned from nextModifiedRowKey, rowIterator in SqlEntityProcessor.java is 
 still not null.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-894) Distributed Search in combination with fl=score returns inconsistent number of fields

2008-12-04 Thread Mario Klaver (JIRA)
Distributed Search in combination with fl=score returns inconsistent number of 
fields
-

 Key: SOLR-894
 URL: https://issues.apache.org/jira/browse/SOLR-894
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
 Environment: Setup distributed search
Reporter: Mario Klaver
Priority: Minor


1) http://localhost:8983/solr/select?indent=true&q=ipod+solr
== Returns all configured fields

2) http://localhost:8983/solr/select?indent=true&q=ipod+solr&fl=score
== Returns all configured fields + score

3) 
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr
== Returns all configured fields

4) 
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr&fl=score
== Returns unique ID and score field

Result 4) is inconsistent with result 2).

Solutions:
1) Request 2) will only return the score (in this case the Java client also 
needs to be updated, e.g. query.addScoreField(true))
2) Request 4) will return all configured fields including score
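
For reference, a rough SolrJ sketch of the request in case 4) (assuming 
SolrQuery.setIncludeScore adds score to the fl param, as its name suggests):

  import org.apache.solr.client.solrj.SolrQuery;

  public class ScoreFlExample {
    public static void main(String[] args) {
      SolrQuery q = new SolrQuery("ipod solr");
      q.setIncludeScore(true); // fl=score
      q.set("shards", "localhost:8983/solr,localhost:7574/solr");
      System.out.println(q); // prints the encoded request params
      // whether the response carries all fields + score (case 2) or only
      // the unique id + score (case 4) is exactly the inconsistency above
    }
  }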







-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: putting UnInvertedField instances in a SolrCache?

2008-12-04 Thread Yonik Seeley
Right.
  - we need a blocking cache to avoid more than one thread
attempting to generate, but that can be done outside the SolrCache for
now.
  - prob want to expose the statistics collected... (see logging
output of new faceting stuff)
  - might want a way to dynamically add caches... but for now adding a
magic facetCache that exists even when not in solrconfig.xml is prob
easiest (the current solr caches do not get instantiated if they are
not in solrconfig.xml - they are seen as optional).
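
(The blocking behavior in the first point is essentially the standard
memoizer pattern -- a sketch, with UnInvertedField construction standing in
for the expensive generate step; field and searcher are assumed to be final
variables in scope, inside a method that may throw Exception:)

  import java.util.concurrent.*;

  // sketch: the first thread to ask for a field builds it; later threads
  // block on the same Future instead of uninverting the field again
  ConcurrentMap<String, Future<UnInvertedField>> cache =
      new ConcurrentHashMap<String, Future<UnInvertedField>>();

  Future<UnInvertedField> f = cache.get(field);
  if (f == null) {
    FutureTask<UnInvertedField> task = new FutureTask<UnInvertedField>(
        new Callable<UnInvertedField>() {
          public UnInvertedField call() throws Exception {
            return new UnInvertedField(field, searcher); // expensive generate
          }
        });
    f = cache.putIfAbsent(field, task);
    if (f == null) { f = task; task.run(); }
  }
  UnInvertedField uif = f.get(); // blocks until generation completes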

-Yonik

On Tue, Dec 2, 2008 at 6:27 PM, Chris Hostetter
[EMAIL PROTECTED] wrote:

 recent wiki updates have me looking at UnInvertedField for the first time (i
 haven't been very good at keeping up with commits the last few months) and
 i'm wondering about the use of a static Cache multiValuedFieldCache keyed
 off of SolrIndexSearcher.

 Lucene-Java is trying to move away from this pattern in FieldCache, and in
 Solr we already have a nice and robust cache mechanism on each
 SolrIndexSearcher -- that includes the possibility of doing auto-warming via
 regenerators -- so why don't we use that for UnInvertedField?

 suggested changes...

 1) add a new special (as opposed to user) SolrCache instance named
 facetCache to SolrIndexSearcher (just like filterCache and
 queryResultCache) where the key is a field name and the value is an
 UnInvertedField instance.

 2) I think the way the special caches are initialized they exist with
 defaults even if they aren't declared in solrconfig.xml, but if i'm wrong we
 should consider making facetCache work that way.

 3) add a regenerator for facetCache (relatively trivial -- see the sketch
 below)

 4) remove all of the static caching code from UnInvertedField
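
 A rough sketch for (3), against the CacheRegenerator interface (assuming
 UnInvertedField can be rebuilt from a field name plus the new searcher;
 the exact constructor may differ):

   import java.io.IOException;
   import org.apache.solr.search.CacheRegenerator;
   import org.apache.solr.search.SolrCache;
   import org.apache.solr.search.SolrIndexSearcher;

   // sketch: rebuild each cached UnInvertedField against the new searcher
   public class FacetCacheRegenerator implements CacheRegenerator {
     public boolean regenerateItem(SolrIndexSearcher newSearcher,
                                   SolrCache newCache, SolrCache oldCache,
                                   Object oldKey, Object oldVal) throws IOException {
       String field = (String) oldKey;
       newCache.put(field, new UnInvertedField(field, newSearcher)); // assumed ctor
       return true; // keep regenerating the remaining entries
     }
   }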

thoughts?


 -Hoss


Re: Can we use Berkley DB java in Solr

2008-12-04 Thread Sami Siren

Yet another possibility: http://wiki.apache.org/incubator/Cassandra

It at least claims to be scalable, no personal experience.

--
Sami Siren



Re: Can we use Berkley DB java in Solr

2008-12-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Cassandra does not meet our requirements;
we do not need that kind of scalability.

Moreover, its future is uncertain and they are trying to incubate it into Apache.


On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren [EMAIL PROTECTED] wrote:
 Yet another possibility: http://wiki.apache.org/incubator/Cassandra

 It at least claims to be scalable, no personal experience.

 --
 Sami Siren


-- 
--Noble Paul


logo contest

2008-12-04 Thread Ryan McKinley
We have discussed this with the Apache PRC (public relations committee),
and they agree that the top choice in the logo contest should be
disqualified for its similarity to the Solaris logo.


Given the rules agreed upon in http://wiki.apache.org/solr/LogoContest,
the next step is for Solr committers to use the results of the community
poll to decide what the official logo should be.


I posted the results here:
 http://people.apache.org/~ryan/solr-logo-results.html
If we count a vote that came in 12 hours late, the results are quite  
different:

 http://people.apache.org/~ryan/solr-logo-results-late.html

Using the direct scoring method agreed upon, the logo with the most  
points is:

 
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png

However, it is tough to gauge the real intent/preference since the  
vote totals are so low.


I see two options:

1.  Have solr committers vote to accept:
 
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png

2. Have a 'runoff' poll with the top contenders:
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg


Following the rules strictly points to option #1, but I think option  
#2 may better reflect the original intent of the community poll.


Personally, I am happy with any of these options (and logos); I just  
want to make sure we have a process that everyone feels is/was fair.


ryan







Re: Can we use Berkley DB java in Solr

2008-12-04 Thread Ryan McKinley

Again, I would hope that solr builds a storage agnostic solution.

As long as we have a simple interface to load/store documents, it  
should be easy to write a JDBC/ehcache/disk/Cassandra/whatever  
implementation.


ryan


On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:



Cassandra does not meet our requirements;
we do not need that kind of scalability.

Moreover, its future is uncertain and they are trying to incubate it
into Apache.







Re: logo contest

2008-12-04 Thread Ryan McKinley


I'll add that I have a slight preference for option #1 since it would
get this process over with sooner :)

ryan


Re: logo contest

2008-12-04 Thread Mark Miller
Hoss may lay down the rules on us, but if he doesn't (or if he's in a 
good mood today), +1 on the runoff vote.




Re: Can we use Berkley DB java in Solr

2008-12-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
The solution will be an UpdateRequestProcessor (which itself is
pluggable). I am implementing a JDBC-based one. I'll test with H2 and
MySQL (and maybe Derby).

We will ship the H2 (embedded) jar.
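
A rough skeleton of such a processor (the table layout, the serialization,
and the assumption that "id" is the uniqueKey are all hypothetical; MERGE
is H2's upsert-by-primary-key syntax):

  import java.io.IOException;
  import java.sql.Connection;
  import java.sql.PreparedStatement;
  import java.sql.SQLException;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;

  // sketch: persist each incoming doc over JDBC, then continue the chain
  public class JdbcStoreProcessor extends UpdateRequestProcessor {
    private final Connection con;

    public JdbcStoreProcessor(Connection con, UpdateRequestProcessor next) {
      super(next);
      this.con = con;
    }

    public void processAdd(AddUpdateCommand cmd) throws IOException {
      try {
        PreparedStatement ps = con.prepareStatement(
            "MERGE INTO docs (id, doc) VALUES (?, ?)");
        ps.setString(1, (String) cmd.solrDoc.getFieldValue("id")); // assumes populated solrDoc
        ps.setString(2, cmd.solrDoc.toString()); // placeholder serialization
        ps.executeUpdate();
      } catch (SQLException e) {
        throw new IOException(e.toString());
      }
      super.processAdd(cmd);
    }
  }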






On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley [EMAIL PROTECTED] wrote:
 Again, I would hope that solr builds a storage agnostic solution.

 As long as we have a simple interface to load/store documents, it should be
 easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation.

 ryan



-- 
--Noble Paul


Re: logo contest

2008-12-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
I prefer taking option #1 and not delaying this any further

On Thu, Dec 4, 2008 at 9:57 PM, Mark Miller [EMAIL PROTECTED] wrote:
 Hoss may lay down the rules on us, but if he doesn't (or if he's in a good
 mood today), +1 on the runoff vote.


-- 
--Noble Paul


Re: Can we use Berkley DB java in Solr

2008-12-04 Thread Yonik Seeley
A database, just to store uncommitted documents in case they might be
updated, seems like it will have a pretty major impact on indexing
performance.  A lucene-only implementation would seem to be much
lighter on resources.

-Yonik

On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള്‍ नोब्ळ्
[EMAIL PROTECTED] wrote:
 The solution will be an UpdateRequestProcessor (which itself is
 pluggable). I am implementing a JDBC-based one. I'll test with H2 and
 MySQL (and maybe Derby).

 We will ship the H2 (embedded) jar.









Re: Can we use Berkley DB java in Solr

2008-12-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
I tried that and the solution looked so clumsy.
The need to commit before being able to read anything was making things difficult.
A DB provides me 'immediate' reads.
I am sure performance will take a hit anyway.
Is a Lucene write much faster than a DB (embedded) write?
http://www.h2database.com/html/performance.html


On Thu, Dec 4, 2008 at 10:07 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 A database, just to store uncommitted documents in case they might be
 updated, seems like it will have a pretty major impact on indexing
 performance.  A lucene-only implementation would seem to be much
 lighter on resources.

 -Yonik


-- 
--Noble Paul


Re: logo contest

2008-12-04 Thread Andrzej Bialecki


I prefer option #2 as well.



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: logo contest

2008-12-04 Thread Erik Hatcher
I'm with Noble.  #1 for me as well for the sake of making a decision  
and running with a quality logo sooner rather than later.


ObBiasTransparency: Steve Stedman, the designer of the sslogo* submissions,
is a good friend of mine.  Awesome dude.  He does really nice work.


Erik

On Dec 4, 2008, at 11:36 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

I prefer taking option #1 and not delaying this any further

On Thu, Dec 4, 2008 at 9:57 PM, Mark Miller [EMAIL PROTECTED] wrote:
Hoss may lay down the rules on us, but if he doesn't (or if he's in a good
mood today), +1 on the runoff vote.





[jira] Created: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

2008-12-04 Thread Cameron Pope (JIRA)
DataImportHandler does not import multiple documents specified in 
db-data-config.xml


 Key: SOLR-895
 URL: https://issues.apache.org/jira/browse/SOLR-895
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.3.1, 1.4
Reporter: Cameron Pope


In our system we have multiple kinds of items that need to be indexed. In the 
database, they are represented as 'one table per concrete class'. We are using 
the DataImportHandler to automatically create an index from our database. The 
db-data-config.xml file that we are using contains two 'Document' elements: one 
for each class of item that we are indexing.

Expected behavior: the DataImportHandler imports items for each 'Document' tag 
defined in the configuration file
Actual behavior: the DataImportHandler stops importing after it completes 
indexing of the first document.

I am attaching a patch with a unit test that verifies the correct behavior; it 
should apply against the trunk without problems. I can also supply a patch 
against the 1.3 branch if you would like.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

2008-12-04 Thread Cameron Pope (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cameron Pope updated SOLR-895:
--

Attachment: import-multiple-documents.patch

This is a patch to DataImporter that causes it to import all documents defined 
in the config file. There is also a unit test to verify correct behavior. It 
should apply against the svn trunk without any problems.

 DataImportHandler does not import multiple documents specified in 
 db-data-config.xml
 

 Key: SOLR-895
 URL: https://issues.apache.org/jira/browse/SOLR-895
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.3.1, 1.4
Reporter: Cameron Pope
 Attachments: import-multiple-documents.patch


 In our system we have multiple kinds of items that need to be indexed. In the 
 database, they are represented as 'one table per concrete class'. We are 
 using the DataImportHandler to automatically create an index from our 
 database. The db-data-config.xml file that we are using contains two 
 'Document' elements: one for each class of item that we are indexing.
 Expected behavior: the DataImportHandler imports items for each 'Document' 
 tag defined in the configuration file
 Actual behavior: the DataImportHandler stops importing after it completes 
 indexing of the first document.
 I am attaching a patch with a unit test that verifies the correct behavior; 
 it should apply against the trunk without problems. I can also supply a patch 
 against the 1.3 branch if you would like.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

2008-12-04 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653366#action_12653366
 ] 

Noble Paul commented on SOLR-895:
-

why can't this be achieved using multiple root-entities under the same document?

 DataImportHandler does not import multiple documents specified in 
 db-data-config.xml
 

 Key: SOLR-895
 URL: https://issues.apache.org/jira/browse/SOLR-895
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.3.1, 1.4
Reporter: Cameron Pope
 Attachments: import-multiple-documents.patch


 In our system we have multiple kinds of items that need to be indexed. In the 
 database, they are represented as 'one table per concrete class'. We are 
 using the DataImportHandler to automatically create an index from our 
 database. The db-data-config.xml file that we are using contains two 
 'Document' elements: one for each class of item that we are indexing.
 Expected behavior: the DataImportHandler imports items for each 'Document' 
 tag defined in the configuration file
 Actual behavior: the DataImportHandler stops importing after it completes 
 indexing of the first document.
 I am attaching a patch with a unit test that verifies the correct behavior; 
 it should apply against the trunk without problems. I can also supply a patch 
 against the 1.3 branch if you would like.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr nightly build failure

2008-12-04 Thread Yonik Seeley
On Wed, Dec 3, 2008 at 3:29 PM, Ryan McKinley [EMAIL PROTECTED] wrote:

   [junit] Tests run: 9, Failures: 1, Errors: 0, Time elapsed: 17.101 sec
   [junit] Test org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
 FAILED

 Any thoughts on this?

 Things are building fine on my local system and on hudson:
 http://hudson.zones.apache.org/hudson/job/Solr-trunk/

 Where is the test output stored so we can look at what is actually
 happening?

/tmp directory on the lucene zone - but it's too late now, last build succeeded.
I'll try to be quicker next time to grab the output, or
someone could hack the nightly build script to email the
failed test output (only the first 1 or 2 to avoid spamming things,
perhaps).

-Yonik


Re: logo contest

2008-12-04 Thread Chris Hostetter

: 1.  Have solr committers vote to accept:
:  
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png

The process as outlined on the wiki was that the committers should have a 
ranked preference vote, after considering the point totals from the first 
vote. (with the added caveat that a -1 veto needs to be allowed since it's 
a vote to commit a change to the project)

Considering the community preferences expressed, I suggest that the 
committers hold a vote of the high scoring entries.  Picking a score 
of 10 as the cutoff, that would give us 10 entries to vote on

https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg
https://issues.apache.org/jira/secure/attachment/12394165/solr-logo.png
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png
https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg
https://issues.apache.org/jira/secure/attachment/12394218/solr-solid.png

(given the distribution of scores, 10 just seems like a natural cutoff)


-Hoss



[jira] Commented: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml

2008-12-04 Thread Cameron Pope (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653396#action_12653396
 ] 

Cameron Pope commented on SOLR-895:
---

I tried moving both root entities under the same document element and 
specifying 'docRoot=true' for both of them and that appears to work. Thanks.

Since I am new to Solr, please forgive me for logging what is probably not a 
bug at all. Is specifying multiple 'root' entities the envisioned way to solve 
this problem, or is it a workaround? I am just curious and trying to gain a 
better understanding of the design (I noticed parts of the DataImporter assume 
multiple Document elements and other parts assume only one). If it is the 
envisioned way, I'd be happy to update the wiki to include it -- I imagine I am 
not the only one with a database schema like this who wants to create an index 
with Solr.

All in all, I have been hugely impressed with Solr and the DataImportHandler - 
both are incredible pieces of work. Thanks!


 DataImportHandler does not import multiple documents specified in 
 db-data-config.xml
 

 Key: SOLR-895
 URL: https://issues.apache.org/jira/browse/SOLR-895
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.3.1, 1.4
Reporter: Cameron Pope
 Attachments: import-multiple-documents.patch


 In our system we have multiple kinds of items that need to be indexed. In the 
 database, they are represented as 'one table per concrete class'. We are 
 using the DataImportHandler to automatically create an index from our 
 database. The db-data-config.xml file that we are using contains two 
 'Document' elements: one for each class of item that we are indexing.
 Expected behavior: the DataImportHandler imports items for each 'Document' 
 tag defined in the configuration file
 Actual behavior: the DataImportHandler stops importing after it completes 
 indexing of the first document.
 I am attaching a patch with a unit test that verifies the correct behavior; 
 it should apply against the trunk without problems. I can also supply a patch 
 against the 1.3 branch if you would like.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: logo contest

2008-12-04 Thread Yonik Seeley
The methodology will very likely determine the outcome here, with

https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg

being the likely two candidates for winning.  My guess is that
narrowing to the two most popular options first would make #2 the
winner, while voting on the top 10 (w/o any strategy for winning)
would make #1 the winner.

fun, fun.  So people who want one of these options to win should vote
only for that option, really.

-Yonik







Backwards compatibility

2008-12-04 Thread Michael Busch

Hi,

I was wondering what the backwards-compatibility rules in Solr are. Are 
they the same as in Lucene, i.e. public and protected APIs can only be 
changed in a major release (X.Y -> (X+1).0)?
I'd like to consolidate the function queries in Solr and Lucene and it's 
gonna be quite messy if we have to keep all classes in Solr's 
search/function package around.


-Michael


Re: Can we use Berkley DB java in Solr

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 11:47 AM, Noble Paul നോബിള്‍ नोब्ळ्
[EMAIL PROTECTED] wrote:
 I tried that and the solution looked so clumsy.
 The need to commit before being able to read anything was making things difficult.

In a high update environment, most documents would be exposed to an
open reader with no need to commit or reopen the index to retrieve the
stored fields.
In a way, solving the more realtime update issue removes the necessity
for this altogether.
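
(A sketch of that reader-side lookup in raw Lucene terms -- assumes "id" is
the unique key field and reader is the currently open IndexReader:)

  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;

  // sketch: fetch the stored fields of a doc already visible to the open
  // reader -- no commit or reopen needed for such docs
  TermDocs td = reader.termDocs(new Term("id", "doc42"));
  Document stored = td.next() ? reader.document(td.doc()) : null;
  td.close();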

 Is a Lucene write much faster than a DB (embedded) write?

More to the point, we're already doing the Lucene write (for the most
part) anyway, and the DB write is overhead to the indexing process.

-Yonik

 On Thu, Dec 4, 2008 at 10:07 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 A database, just to store uncommitted documents in case they might be
 updated, seems like it will have a pretty major impact on indexing
 performance.  A lucene-only implementation would seem to be much
 lighter on resources.

 -Yonik

 On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള്‍ नोब्ळ्
 [EMAIL PROTECTED] wrote:
 The solution will be an UpdateRequestProcessor (which itself is
 pluggable).I am implementing a JDBC based one. I'll test with H2 and
 MySql (and may be Derby)

 We will ship the H2 (embedded) jar






 On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley [EMAIL PROTECTED] wrote:
 Again, I would hope that solr builds a storage agnostic solution.

 As long as we have a simple interface to load/store documents, it should be
 easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation.

 ryan


 On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 Cassandra does not meet our requirements.
 we do not need that kind of scalability

 Moreover its future is uncertain and they are trying to incubate it into
 Solr


 On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren [EMAIL PROTECTED] wrote:

 Yet another possibility: http://wiki.apache.org/incubator/Cassandra

 It at least claims to be scalable, no personal experience.

 --
 Sami Siren


[jira] Assigned: (SOLR-893) Unable to delete documents via SQL and deletedPkQuery with deltaimport

2008-12-04 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-893:
--

Assignee: Shalin Shekhar Mangar

 Unable to delete documents via SQL and deletedPkQuery with deltaimport
 --

 Key: SOLR-893
 URL: https://issues.apache.org/jira/browse/SOLR-893
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Dan Rosher
Assignee: Shalin Shekhar Mangar
 Fix For: 1.3

 Attachments: SOLR-893.patch, SOLR-893.patch


 DocBuilder calls entityProcessor.nextModifiedRowKey which sets up rowIterator 
 for the modified rows, but when it comes time to call 
 entityProcessor.nextDeletedRowKey, this is skipped because, although no rows are 
 returned from nextModifiedRowKey, rowIterator in SqlEntityProcessor.java is 
 still not null

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-887) HTMLStripTransformer for DIH

2008-12-04 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-887.


Resolution: Fixed

Committed revision 723410.

Thanks Ahmed!

I didn't want to delay committing this fine contribution :) We can add more 
capabilities through another issue if needed.

 HTMLStripTransformer for DIH
 

 Key: SOLR-887
 URL: https://issues.apache.org/jira/browse/SOLR-887
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Ahmed Hammad
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.4

 Attachments: patch-887.patch, SOLR-887.patch


 A Transformer implementation for DIH which strips off HTML tags using the Solr 
 class org.apache.solr.analysis.HTMLStripReader.
 This is useful in case you don't need the HTML tags anyway.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-893) Unable to delete documents via SQL and deletedPkQuery with deltaimport

2008-12-04 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653452#action_12653452
 ] 

Shalin Shekhar Mangar commented on SOLR-893:


Thanks for the patch Dan.

{code}
if(modifiedRow.get(entity.pk) == row.get(entity.pk)){
{code}
Wouldn't this need an equals check?
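
For illustration -- a hypothetical snippet, not from the patch -- == only
passes when both sides are the very same object, which keys materialized by
two separate JDBC reads never are:

{code}
public class IdentityVsEquals {
  public static void main(String[] args) {
    Object a = new Integer(42);
    Object b = new Integer(42);
    System.out.println(a == b);       // false -- identity comparison
    System.out.println(a.equals(b));  // true  -- value comparison
  }
}
{code}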

 Unable to delete documents via SQL and deletedPkQuery with deltaimport
 --

 Key: SOLR-893
 URL: https://issues.apache.org/jira/browse/SOLR-893
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Dan Rosher
Assignee: Shalin Shekhar Mangar
 Fix For: 1.3

 Attachments: SOLR-893.patch, SOLR-893.patch


 DocBuilder calls entityProcessor.nextModifiedRowKey which sets up rowIterator 
 for the modified rows, but when it comes time to call 
 entityProcessor.nextDeletedRowKey, this is skipped because, although no rows are 
 returned from nextModifiedRowKey, rowIterator in SqlEntityProcessor.java is 
 still not null

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: logo contest

2008-12-04 Thread Ryan McKinley


On Dec 4, 2008, at 1:16 PM, Chris Hostetter wrote:



: 1.  Have solr committers vote to accept:
:  
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png

The process as outlined on the wiki was that the committers should have a
ranked preference vote, after considering the point totals from the first
vote (with the added caveat that a -1 veto needs to be allowed, since it's
a vote to commit a change to the project)

Considering the community preferences expressed, I suggest that the
committers hold a vote of the high scoring entries.  Picking a score
of 10 as the cutoff, that would give us 10 entries to vote on


right, but what should be the role of committers voting in the second  
round? Is it:


1. Rank the entries the committers like best
 or
2. Rank the entries the committers think best represent the community  
preferences.


My understanding of the purpose of the second round is to interpret
the results of the community poll and cast a binding VOTE.  I think we
should either have committers vote on what the community intent is or
re-run the poll with the full community, since deciphering the #2 choice
is unclear.


As Yonik said: fun, fun, fun.

ryan




Re: logo contest

2008-12-04 Thread Guillaume Smet
On Thu, Dec 4, 2008 at 7:34 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 The methodology will very likely determine the outcome here, with

 https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
 https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg

 Being the likely two candidates for winning.  My guess is that
 narrowing to the two most popular options first would make #2 the
 winner, while voting on the top 10 (w/o any strategy for winning)
 would make #1 the winner.

+1.

All apache_solr_c_red.jpg flavoured logos have a total score of 94.
That should be taken into account IMHO and we should reduce the number
of choices for these ones.

-- 
Guillaume


[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling

2008-12-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653484#action_12653484
 ] 

Yonik Seeley commented on SOLR-799:
---

Why not plug in an entirely new chain?  That is one of the ways it would be done 
for users of this component, right?

  <updateRequestProcessorChain name="hash"> [...]

And then in the test send in update.processor=hash as a parameter.
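
For example, a sketch of such a chain in solrconfig.xml (the factory class
names assume the ones used in the attached patch):

{code}
<updateRequestProcessorChain name="hash">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">sig</str>
  </processor>
  <processor class="org.apache.solr.update.processor.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
{code}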

 Add support for hash based exact/near duplicate document handling
 -

 Key: SOLR-799
 URL: https://issues.apache.org/jira/browse/SOLR-799
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Mark Miller
Priority: Minor
 Attachments: SOLR-799.patch, SOLR-799.patch, SOLR-799.patch


 Hash-based duplicate document detection is efficient and allows for blocking 
 as well as field collapsing. Let's put it into Solr. 
 http://wiki.apache.org/solr/Deduplication

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Cleaning up a Few things

2008-12-04 Thread Ryan McKinley

So do we want to move forward on this?

IIUC, we all agree it should happen; the issues are just what the
specific names should be.


We have a few components:
1. 'common' code that does not depend on anything (even lucene)
2. 'client' (solrj) code that depends on 'common'
3. 'server' (solr) code that depends on #1 and #2
4. webapp code that depends on everything + javax.servlet
  4.a -- embedded solrj code

While we could separate this into 4 jar files (in maven that might be  
a good idea), I think two jar files makes the most sense:


solr-{solrj/client}.jar = #1 + #2
solr-{server?}.jar = #3 + #4

In my view the most reasonable jar file names would be:
 solr-solrj-1.x.jar
 solr-1.x.jar

Alternativly, this could be:
 solr-client-1.x.jar
 solr-server-1.x.jar

I like the names that avoid using 'client' and 'server' since it gets  
a bit strange when you say the server depends on the client.
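
With either naming, the packaging step itself stays trivial -- a
hypothetical ant fragment, with all property and directory names invented
for illustration:

{code}
<jar destfile="${dist}/solr-solrj-${version}.jar">
  <fileset dir="${dest}/common"/>
  <fileset dir="${dest}/solrj"/>
</jar>
<jar destfile="${dist}/solr-${version}.jar">
  <fileset dir="${dest}/core"/>
  <fileset dir="${dest}/webapp"/>
</jar>
{code}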


Even if we package as two jar files, I think we should have 4 src  
directories to keep the dependencies clean:


Ideally this would be:
/src/main/java/common
/src/main/java/solrj
/src/main/java/solr
/src/main/java/web
/src/main/webapp/... jsp stuff here

However that may be more pain for existing patches than it is worth.
With that in mind I suggest:

/src/common
/src/solrj
/src/java (no change)
/src/webapp/src (no change)

thoughts?

ryan





On Nov 24, 2008, at 3:16 PM, Grant Ingersoll wrote:

I was wondering what people thought of the following things that  
have been bothering me, off and on, for a while.


1. Let's bring SolrJ into the core and have, as part of the release  
packaging, a target that builds a standalone SolrJ jar for  
distribution.  Right now, we have circular dependencies between the  
core and SolrJ such that I think it makes it painful to start up a
project in Eclipse or IntelliJ, which thus makes it just that little
bit more difficult for new people to understand and contribute to  
Solr.  Besides, SolrJ is used by distributed search and is thus core


2.  Likewise, let's refactor the appropriate servlet dependencies  
such that they are in the core lib, but excluded from packaging, and
then utilized/copied out to the example where needed.   I think  
these are just the servlet apis used by the webapp part of the code.


The goal of both 1 and 2 is to have the core only depend on the lib  
directory for dependencies such that people need only point their  
IDE at the core/lib directory to get up and compiling/contributing,  
etc.


I also think we could stand to simplify the example directory quite  
a bit.  Not quite sure what to do there just yet.  While the  
original example is still pretty easy to use, I think it's  
confused by the proliferation (of which I am guilty) of other  
examples that are thrown into the directory.


Thoughts?

Cheers,
Grant




logging revisited...

2008-12-04 Thread Ryan McKinley

While I'm on a roll tossing stuff out there

Since SOLR-560, solr depends on SLF4j as the logging interface.   
However since we also depend on HttpClient we *also* depend on
commons-logging.  This is strange.  Our maven artifacts now depend on two
logging frameworks!


However the good folks at SLF4j have a nice solution -- a drop in  
replacement for commons-logging that uses slf4j.


HttpClient discussed switching to SLF4j for version 4.  They decided  
not to because the slf4j drop-in replacement gives their users even
more options.  In Droids we had the same discussion, and now use the
commons-logging API.


So, with that in mind I think we should consider using the
commons-logging API and shipping the .war file with the slf4j drop-in
replacement.  The behavior will be identical and there will be one fewer
library.  The loss is the potential to use some of slf4j's more advanced
logging features, but I don't see us taking advantage of that anyway.
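
For illustration, a minimal sketch of code written against the JCL API
(class name hypothetical); with jcl-over-slf4j.jar + slf4j-jdk14.jar on the
classpath these calls behave exactly as our slf4j calls do today:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class MyComponent {
  private static final Log log = LogFactory.getLog(MyComponent.class);

  public void doWork() {
    // routed through jcl-over-slf4j into slf4j, then into java.util.logging
    log.info("same call, whatever backend the deployer drops in");
  }
}
{code}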


ryan











Re: logging revisited...

2008-12-04 Thread Erik Hatcher

LOL!















Re: logo contest

2008-12-04 Thread Chris Hostetter

: All apache_solr_c_red.jpg flavoured logos have a total score of 94.
: That should be taken into account IMHO and we should reduce the number
: of choices for these ones.

To re-iterate a comment I made in SOLR-84: that wouldn't be fair to the 
people who have been submitting ideas and then retracting them and 
resubmitting variations based on feedback.

People were told many, MANY, times throughout the process that submitting 
multiple variant entries would risk diluting the votes.  One of the 
purposes of the long period for submissions was to give people time to 
post ideas, get feedback, and then tweak submissions, and people who did 
that shouldn't be excluded from the final vote for following the rules. 
(sslogo-solr-finder2.0.png is a prime example of this)








-Hoss



Re: logo contest

2008-12-04 Thread Chris Hostetter

: Being the likely two candidates for winning.  My guess is that
: narrowing to the two most popular options first would make #2 the
: winner, while voting on the top 10 (w/o any strategy for winning)
: would make #1 the winner.

limiting to only voting for the top 2 seems unrepresentative since more
than one apache_solr_c_red.jpg variant tied for 2nd.
 
: fun, fun.  So people who want one of these options to win should vote
: only for that option, really.

Perhaps instead of just ranking the top 5, we should ask committers to
rank all of the choices on the final ballot to eliminate the 
strategy factor you are referring to ... i think we can trust all 
committers to understand this, but if someone botches it (or refuses?) 
we'll just shift the number of points each item earns down by the 
appropriate number (so if you want your 1st rank to earn 10 
points, you must list all 10; if you only list 4 then your top ranked item 
only earns 4 points)
 
that won't violate anything in the rules as originally spelled out, and 
should help take into account the variant score dilution. (even though i 
don't think we should be overly accommodating, this seems fair)

-Hoss



Re: logo contest

2008-12-04 Thread Chris Hostetter
: right, but what should be the role of committers voting in the second round?
: Is it:
: 
: 1. Rank the entries the committers like best
:  or
: 2. Rank the entries the committers think best represent the community
: preferences.
: 
: My understanding of the purpose of the second round is to interpret the
: results of the community poll and cast a binding VOTE.  I think we should

committers should cast their votes as they feel appropriate to best serve 
the interests of the community -- it's not really different than voting on 
an implementation approach for a feature, or what logging framework to 
use, or a decision to switch from java 1.5 to 1.6 ... we have to make a 
subjective decision based on the feedback we've observed from the 
community as a whole (with solr-logo-results.html serving as our cliff 
notes)


-Hoss



Re: logo contest

2008-12-04 Thread Chris Hostetter

: Hoss may lay down the rules on us, but if he doesn't (or if hes in a good mood

For the record: i'm (almost) always in a good mood -- it's just hard to 
tell because i spell like an angry unabomber wanna-be and i have a moral 
objection to using emoticons, so my email-based sarcasm is very, very, 
dry.


-Hoss



[jira] Created: (SOLR-896) Solr Query Parser Plugin for Mark Miller's Qsol Parser

2008-12-04 Thread Chris Harris (JIRA)
Solr Query Parser Plugin for Mark Miller's Qsol Parser
--

 Key: SOLR-896
 URL: https://issues.apache.org/jira/browse/SOLR-896
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Chris Harris


An extremely basic plugin to get the Qsol query parser 
(http://www.myhardshadow.com/qsol.php) working in Solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: logo contest

2008-12-04 Thread Mike Klaas

On 4-Dec-08, at 2:33 PM, Chris Hostetter wrote:



: Being the likely two candidates for winning.  My guess is that
: narrowing to the two most popular options first would make #2 the
: winner, while voting on the top 10 (w/o any strategy for winning)
: would make #1 the winner.

limiting to only voting for the top 2 seems unrepresentative since more
than one apache_solr_c_red.jpg variant tied for 2nd.

: fun, fun.  So people who want one of these options to win should vote
: only for that option, really.

Perhaps instead of just ranking the top 5, we should ask committers to
rank all of the choices on the final ballot to eliminate the
strategy factor you are referring to ... i think we can trust all
committers to understand this, but if someone botches it (or refuses?)
we'll just shift the number of points each item earns down by the
appropriate number (so if you want your 1st rank to earn 10
points, you must list all 10; if you only list 4 then your top
ranked item only earns 4 points)


Eliminating strategic voting merely biases the outcome toward the logo  
without the vote splitting problem.  That is no solution.
It is better to allow strategic voting, as that is the only way for  
voters to express certain preferences in this system.


I would personally prefer more of an elimination-style vote (i.e.,  
STV).  Each voter lists the logos they prefer, in order.  The logos  
are ranked by first place votes.  The last in the rank is eliminated  
from the contest, and anyone who had that logo as their first-place  
vote has their vote transferred to the next logo on the list, if any.   
Iterate until two logos remain.  There is no danger of vote-splitting  
and the outcome maximizes global welfare in terms of binary  
preferences (well, probably not, due to Arrow's theorem, but it does a  
good job regardless).
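
In code, the elimination loop is tiny -- a hypothetical Java sketch, with
each ballot an ordered list of logo ids:

{code}
import java.util.*;

public class StvSketch {
  /** eliminate the weakest logo each round until two finalists remain */
  public static List<String> run(List<List<String>> ballots, Set<String> logos) {
    Set<String> remaining = new HashSet<String>(logos);
    while (remaining.size() > 2) {
      Map<String, Integer> firsts = new HashMap<String, Integer>();
      for (String logo : remaining) firsts.put(logo, 0);
      for (List<String> ballot : ballots) {
        for (String choice : ballot) {   // first surviving choice gets the vote
          if (remaining.contains(choice)) {
            firsts.put(choice, firsts.get(choice) + 1);
            break;
          }
        }
      }
      String loser = null;               // fewest first-place votes drops out
      for (Map.Entry<String, Integer> e : firsts.entrySet())
        if (loser == null || e.getValue() < firsts.get(loser)) loser = e.getKey();
      remaining.remove(loser);           // its ballots transfer next round
    }
    return new ArrayList<String>(remaining);
  }
}
{code}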


-Mike


Re: logging revisited...

2008-12-04 Thread Chris Hostetter

: Subject: logging revisited...

I'm starting to think Ryan woke up today and asked himself what's the 
best way to screw with Hoss on his day off when he's only casually 
skimming email?

: So, with that in mind I think we should consider using the commons-logging API
: and shipping the .war file with the slf4j drop-in replacement.  The behavior
: will be identical and there will be one fewer library.  The loss is the
: potential to use some of slf4j's more advanced logging features, but I don't
: see us taking advantage of that anyway.

so if i'm understanding your suggestion correctly:

1) we change all of the logging calls in solr to compile against the 
commons-logging API. 
2) we do *not* ship with the commons-logging api. 
3) we ship with an slf4j-provided jar that implements the commons-logging 
api, funnels the log messages through slf4j and uses java.util.logging as 
its output by default.
4) people who want to configure solr logging via some other favorite 
logging framework (log4j, etc...) can still add another magic slf4j jar to 
make slf4j write to their framework of choice instead of 
java.util.logging.

...do i have that correctly?

I feel dirty just thinking about this.

I think i may just abstain from any and all current or future discussions 
or decisions about logging.  I'm really not that old, but I feel like I 
age 5 years every time the topic comes up.



-Hoss



[jira] Updated: (SOLR-896) Solr Query Parser Plugin for Mark Miller's Qsol Parser

2008-12-04 Thread Chris Harris (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Harris updated SOLR-896:
--

Attachment: SOLR-896.patch

I don't know if this first stab will be useful to anyone else or not, but it 
might be slightly easier to get started with than writing your own. Limitations 
include:

* No ability to configure qsol (even though qsol is highly configurable) -- 
you're stuck with the defaults
* This doesn't alter qsol itself at all, so you don't get support for certain 
Solr goodies, like function queries

Usage:

* This patch creates solrroot/contrib/qsol.
* Download qsol from the qsol home page and put the qsol jar into 
solrroot/contrib/qsol/lib
* cd solrroot/contrib/qsol
* Run ant (no args needed) to create the qsol Solr plugin 
(solrroot/contrib/qsol/build/apache-solr-qsol-1.4-dev.jar or some such)
* To deploy, copy both the qsol Solr plugin jar and qsol.jar to your solr lib 
directory. In the example jetty setup that comes with solr, that should be 
solrroot/example/solr/lib/. In a multicore setup, you can specify where the 
lib directory is in solr.xml.


 Solr Query Parser Plugin for Mark Miller's Qsol Parser
 --

 Key: SOLR-896
 URL: https://issues.apache.org/jira/browse/SOLR-896
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Chris Harris
 Attachments: SOLR-896.patch


 An extremely basic plugin to get the Qsol query parser 
 (http://www.myhardshadow.com/qsol.php) working in Solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-896) Solr Query Parser Plugin for Mark Miller's Qsol Parser

2008-12-04 Thread Chris Harris (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653535#action_12653535
 ] 

ryguasu edited comment on SOLR-896 at 12/4/08 3:06 PM:


I don't know if this first stab will be useful to anyone else or not, but it 
might be slightly easier to get started with than writing your own. Limitations 
include:

* No ability to configure qsol (even though qsol is highly configurable) -- 
you're stuck with the defaults
* This doesn't alter qsol itself at all, so you don't get support for certain 
Solr goodies, like function queries

Usage:

* This patch creates solrroot/contrib/qsol.
* Download qsol from the qsol home page and put the qsol jar into 
solrroot/contrib/qsol/lib
* cd solrroot/contrib/qsol
* Run ant (no args needed) to create the qsol Solr plugin 
(solrroot/contrib/qsol/build/apache-solr-qsol-1.4-dev.jar or some such)
* To deploy, copy both the qsol Solr plugin jar and qsol.jar to your solr lib 
directory. In the example jetty setup that comes with solr, that should be 
solrroot/example/solr/lib/. In a multicore setup, you can specify where the 
lib directory is in solr.xml.
* There are a few different ways to make qsol accessible from Solr now. One is 
to add <queryParser name="qsol" class="org.apache.solr.search.QsolQParserPlugin"/> 
to your solrconfig.xml, and then to prepend {!qsol} to your query URLs, e.g. 
...?q={!qsol}term1 | term2. See http://wiki.apache.org/solr/SolrPlugins for more info.


  was (Author: ryguasu):
I don't know if this first stab will be useful to anyone else or not, but 
it might be slightly easier to get started with than writing your own. 
Limitations include:

* No ability to configure qsol (even though qsol is highly configurable) -- 
you're stuck with the defaults
* This doesn't alter qsol itself at all, so you don't get support for certain 
Solr goodies, like function queries

Usage:

* This patch creates solrroot/contrib/qsol.
* Download qsol from the qsol home page and put the qsol jar into 
solrroot/contrib/qsol/lib
* cd solrroot/contrib/qsol
* Run ant (no args needed) to create the qsol Solr plugin 
(solrroot/contrib/qsol/build/apache-solr-qsol-1.4-dev.jar or some such)
* To deploy, copy both the qsol Solr plugin jar and qsol.jar to your solr lib 
directory. In the example jetty setup that comes with solr, that should be 
solrroot/example/solr/lib/. In a multicore setup, you can specify where the 
lib directory is in solr.xml.

  
 Solr Query Parser Plugin for Mark Miller's Qsol Parser
 --

 Key: SOLR-896
 URL: https://issues.apache.org/jira/browse/SOLR-896
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Chris Harris
 Attachments: SOLR-896.patch


 An extremely basic plugin to get the Qsol query parser 
 (http://www.myhardshadow.com/qsol.php) working in Solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: logging revisited...

2008-12-04 Thread Will Johnson
To a certain extent SLF4j makes this decision a fairly small one, namely
what API do you want to code to inside SOLR and what jars do you want to
ship as a part of the distribution.  It doesn't really matter if you pick
commons-logging, log4j or slf4j; all have drop-in replacements via SLF4j.
They also have one for java.util.logging; however, it requires custom code to
activate since you can't replace java.* classes.  End users get to do pretty
much whatever they want as far as logging goes if you use SLF4j.

SLF4j has also updated their 'legacy' page since the last time I looked
which was the ~last time this came up:

http://www.slf4j.org/legacy.html

We chose to code against slf4j APIs as it seemed like it was where things
were going (including solr) and gave us and our customers the ability to
switch to something else with minimal effort.  We also ship log4j+config
jars by default because it had the richest config/appender set at the time;
however, the logback project seems like it might be catching up.  (good thing
we can switch with no code changes)

- will















Re: logging revisited...

2008-12-04 Thread Ryan McKinley


On Dec 4, 2008, at 5:55 PM, Chris Hostetter wrote:



: Subject: logging revisited...

I'm starting to think Ryan woke up today and asked himself what's the
best way to screw with Hoss on his day off when he's only casually
skimming email?


If I'd known you had the day off, I would have asked about moving to JDK 1.6!




: So, with that in mind I think we should consider using the commons-logging API
: and shipping the .war file with the slf4j drop-in replacement.  The behavior
: will be identical and there will be one fewer library.  The loss is the
: potential to use some of slf4j's more advanced logging features, but I don't
: see us taking advantage of that anyway.

so if i'm understanding your suggestion correctly:

1) we change all of the logging calls in solr to compile against the
commons-logging API.
2) we do *not* ship with the commons-logging api.
3) we ship with an slf4j-provided jar that implements the commons-logging
api, funnels the log messages through slf4j and uses java.util.logging as
its output by default.
4) people who want to configure solr logging via some other favorite
logging framework (log4j, etc...) can still add another magic slf4j jar to
make slf4j write to their framework of choice instead of
java.util.logging.

...do i have that correctly?

I feel dirty just thinking about this.


I'm afraid so, but I'll describe it differently so it does not sound  
as crazy.


1. We compile everything against the commons-logging API (JCL)

2. We ship the .war file with a JCL implementation that behaves
identically to solr-1.3.  Currently the best option is: jcl-over-slf4j.jar
+ slf4j-jdk14.jar.


3. Anyone using the solr.jar could use JCL or SLF4j magic




I think i may just abstain from any and all current or future discussions
or decisions about logging.  I'm really not that old, but I feel like I
age 5 years every time the topic comes up.




I would have left well enough alone, but I am working with maven
dependencies now and the duplicate logging frameworks feel a bit
odd.  I am happy with any choice here, but figured I should bring it
up before it is 'cooked' into an official release.


I am happy to stuff the genie back in the bottle, but i don't think
that puts years back in the bank.


ryan



Solr 1.4-SNAPSHOT pom pointing to invalid commons-io?

2008-12-04 Thread jayson.minard

Solr 1.4-SNAPSHOT seems to now be requiring:

  3) org.apache.commons:commons-io:jar:1.4

which doesn't appear to exist on public repositories. 
commons-io:commons-io:1.4 does exist.

If you clear your repository and build using it, the build fails.

Before entering a bug, is anyone else seeing that?

-- Jayson
-- 
View this message in context: 
http://www.nabble.com/Solr-1.4-SNAPSHOT-pom-pointing-to-invalid-commons-io--tp20845138p20845138.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



Re: Solr 1.4-SNAPSHOT pom pointing to invalid commons-io?

2008-12-04 Thread Ryan McKinley

d'oh -- my fault.  I had one in my local repo, but it's non-standard

I'll make the fix in just a sec...
thanks
ryan







Re: Solr 1.4-SNAPSHOT pom pointing to invalid commons-io?

2008-12-04 Thread Ryan McKinley

fixed in rev 723554
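
For reference, the coordinates that do exist on the central repository look
like this in a pom:

{code}
<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>1.4</version>
</dependency>
{code}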







Re: Can we use Berkley DB java in Solr

2008-12-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Dec 5, 2008 at 12:57 AM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Thu, Dec 4, 2008 at 11:47 AM, Noble Paul നോബിള്‍ नोब्ळ्
 [EMAIL PROTECTED] wrote:
 I tried that and the solution looked so clumsy.
 Needing to commit before being able to read anything was making things difficult

 In a high update environment, most documents would be exposed to an
 open reader with no need to commit or reopen the index to retrieve the
 stored fields.
 In a way, solving the more realtime update issue removes the necessity
 for this altogether.

 Are Lucene writes much faster than DB (embedded) writes?

 More to the point, we're already doing the Lucene write (for the most
 part) anyway, and the DB write is overhead to the indexing process.
Considering the fact that the extra Lucene write is over and above the
normal indexing, I guess we must compare the cost of indexing one
document in Lucene vs the cost of writing one row in a DB.
A DB gives me the option of writing to a remote machine, thus freeing up my
local disk. Lucene has to write to the local disk.

In the DB I am writing a byte[] (which is quite compressed). Lucene may
end up writing more data, so more disk I/O (I am just giving a theory).
Does Lucene allow me to write a byte[]?
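
As a point of reference, Lucene's binary stored fields do take a byte[]
directly -- a minimal sketch against the 2.4 API (field names and the
wrapper class are hypothetical):

{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class BinaryStoreSketch {
  public Document wrap(String id, byte[] payload) {
    Document doc = new Document();
    doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("payload", payload, Field.Store.YES)); // stored, not indexed
    return doc;
  }
}
{code}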

The Lucene API itself is more complex for this kind of operation.
(disclaimer: I do not know a whole lot of it.)

Moreover, this is just an UpdateRequestProcessor (no changes to the
core). We can have a Lucene-based one also.

Most of the users would not use this feature (the perf-sensitive
users). The ones who do random updates will not notice it.
The only problem is for users who index heavily and still want to enable this.



 -Yonik
