Re: Can we use Berkeley DB Java in Solr
Another persistence solution is ehcache with diskstore. It even has replication. I have never used ehcache, so I cannot comment on it. Any comments? --Noble

On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote:
On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:
On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള് नोब्ळ् wrote:
The code can be written against JDBC. But we need to test the DDL and data types on all the supported DBs. But which one would we like to ship with Solr as a default option?
Why do we need a default option? Is this something that is intended to be on by default? Or do you mean just to have one for unit tests to work?
Default does not mean that it is enabled by default. But if it is enabled I can have defaults for stuff like driver, URL, DDL etc., and the user may not need to provide an extra jar.
I don't know if it is still the case, but I often find embedded DBs to be quite annoying since you often can't connect to them from other clients outside of the JVM, which makes debugging harder. Of course, maybe I just don't know the tricks to do it. Derby is one DB that you can still connect to even when it is embedded.
Embedded is the best bet for us because of performance reasons and zero management. The users can still read the data through Solr itself.
Also, whatever is chosen needs to scale to millions of documents, and I wonder about an embedded DB doing that. I also have a hard time believing that both a DB w/ millions of docs and Solr can live on the same machine, which is presumably what an embedded DB must do. Presumably, it also needs to be able to be replicated, right?
Millions of docs? Then you must configure a remote DB for storage reasons and must manage the replication separately.
H2 looks impressive. The jar is small (just 667KB) and the memory footprint is small too. --Noble

On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley [EMAIL PROTECTED] wrote:
check http://www.h2database.com/ in my view the best embedded DB out there. from the maker of HSQLDB... this is his second round. However, for anything Solr, I would hope it would just rely on JDBC.
On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
HSQLDB has a limit of up to 8GB of data. In Solr, you might want to go beyond that without a commit.
On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss [EMAIL PROTECTED] wrote:
Isn't HSQLDB an option? Its performance ranges a lot depending on the volume of data and queries, but otherwise the license looks BSDish. http://hsqldb.org/web/hsqlLicense.html Dawid
-- Regards, Shalin Shekhar Mangar.
-- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
--Noble Paul
[jira] Updated: (SOLR-893) Unable to delete documents via SQL and deletedPkQuery with deltaimport
[ https://issues.apache.org/jira/browse/SOLR-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Rosher updated SOLR-893: Attachment: SOLR-893.patch

Thanks Paul ... I've made that change. Additionally, I've noticed that during any particular delta import you might have both an update/create AND a delete; the current code would not honor the delete, hence I've added something to cater for this, and updated the test to confirm.

Unable to delete documents via SQL and deletedPkQuery with deltaimport
Key: SOLR-893 URL: https://issues.apache.org/jira/browse/SOLR-893 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3 Reporter: Dan Rosher Fix For: 1.3 Attachments: SOLR-893.patch, SOLR-893.patch

DocBuilder calls entityProcessor.nextModifiedRowKey, which sets up rowIterator for the modified rows, but when it comes time to call entityProcessor.nextDeletedRowKey, this is skipped: although no rows are returned from nextModifiedRowKey, rowIterator in SqlEntityProcessor.java is still not null.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
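The actual patch is attached to the JIRA issue, but the failure mode can be sketched in isolation. The class and method names below are hypothetical, simplified stand-ins for SqlEntityProcessor, not the real DataImportHandler code; the point is that a shared iterator left non-null (though exhausted) after the modified-rows pass starves the deleted-rows pass:

```java
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for SqlEntityProcessor: one shared rowIterator is reused
// by both the "modified" and "deleted" key queries. In buggy mode, an
// exhausted-but-non-null iterator from nextModifiedRowKey() makes
// nextDeletedRowKey() return null without ever running the delete query.
class RowKeySource {
    private Iterator<String> rowIterator;
    private final List<String> modifiedKeys;
    private final List<String> deletedKeys;
    private final boolean resetWhenExhausted; // true = patched behavior

    RowKeySource(List<String> modified, List<String> deleted, boolean patched) {
        this.modifiedKeys = modified;
        this.deletedKeys = deleted;
        this.resetWhenExhausted = patched;
    }

    String nextModifiedRowKey() {
        if (rowIterator == null) rowIterator = modifiedKeys.iterator();
        return nextFromIterator();
    }

    String nextDeletedRowKey() {
        // Buggy mode never reaches this branch: the stale, empty iterator
        // from the modified pass is still non-null.
        if (rowIterator == null) rowIterator = deletedKeys.iterator();
        return nextFromIterator();
    }

    private String nextFromIterator() {
        if (rowIterator.hasNext()) return rowIterator.next();
        if (resetWhenExhausted) rowIterator = null; // the essence of the fix
        return null;
    }
}
```

With no modified rows and one deleted row, the buggy variant silently skips the delete, while the patched variant (nulling the iterator once exhausted) picks it up.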
[jira] Issue Comment Edited: (SOLR-893) Unable to delete documents via SQL and deletedPkQuery with deltaimport
[ https://issues.apache.org/jira/browse/SOLR-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653244#action_12653244 ] rosher edited comment on SOLR-893 at 12/4/08 2:23 AM:

Thanks Noble ... I've made that change. Additionally, I've noticed that during any particular delta import you might have both an update/create AND a delete; the current code would not honor the delete, hence I've added something to cater for this, and updated the test to confirm.

was (Author: rosher): Thanks Paul ... I've made that change ... additionally I've noticed that during any particular delta import you might have both and update/create AND a delete, the current code would not honor the delete hence I've added something to cater for this, and updated the test to confirm.

Unable to delete documents via SQL and deletedPkQuery with deltaimport
Key: SOLR-893 URL: https://issues.apache.org/jira/browse/SOLR-893 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3 Reporter: Dan Rosher Fix For: 1.3 Attachments: SOLR-893.patch, SOLR-893.patch

DocBuilder calls entityProcessor.nextModifiedRowKey, which sets up rowIterator for the modified rows, but when it comes time to call entityProcessor.nextDeletedRowKey, this is skipped: although no rows are returned from nextModifiedRowKey, rowIterator in SqlEntityProcessor.java is still not null.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-894) Distributed Search in combination with fl=score returns inconsistent number of fields
Distributed Search in combination with fl=score returns inconsistent number of fields - Key: SOLR-894 URL: https://issues.apache.org/jira/browse/SOLR-894 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Environment: distributed search setup Reporter: Mario Klaver Priority: Minor

1) http://localhost:8983/solr/select?indent=true&q=ipod+solr == Returns all configured fields
2) http://localhost:8983/solr/select?indent=true&q=ipod+solr&fl=score == Returns all configured fields + score
3) http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr == Returns all configured fields
4) http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr&fl=score == Returns unique ID and score field

Result 4) is inconsistent with result 2). Solutions: 1) Request 2) will only return score (in this case the Java client also needs to be updated: query.addScoreField(true)) 2) Request 4) will return all configured fields including score

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: putting UnInvertedField instances in a SolrCache?
Right.
- we need a blocking cache to avoid more than one thread attempting to generate, but that can be done outside the SolrCache for now.
- prob want to expose the statistics collected... (see logging output of new faceting stuff)
- might want a way to dynamically add caches... but for now adding a magic facetCache that exists even when not in solrconfig.xml is prob easiest (the current Solr caches do not get instantiated if they are not in solrconfig.xml - they are seen as optional).
-Yonik

On Tue, Dec 2, 2008 at 6:27 PM, Chris Hostetter [EMAIL PROTECTED] wrote:
recent wiki updates have me looking at UnInvertedField for the first time (i haven't been very good at keeping up with commits the last few months) and i'm wondering about the use of a static Cache multiValuedFieldCache keyed off of SolrIndexSearcher. Lucene-Java is trying to move away from this pattern in FieldCache, and in Solr we already have a nice and robust cache mechanism on each SolrIndexSearcher -- that includes the possibility of doing auto-warming via regenerators -- so why don't we use that for UnInvertedField? suggested changes...
1) add a new special (as opposed to user) SolrCache instance named facetCache to SolrIndexSearcher (just like filterCache and queryResultCache) where the key is a field name and the value is an UnInvertedField instance.
2) I think the way the special caches are initialized they exist with defaults even if they aren't declared in solrconfig.xml, but if i'm wrong we should consider making facetCache work that way.
3) add a regenerator for facetCache (relatively trivial)
4) remove all of the static caching code from UnInvertedField
thoughts? -Hoss
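The "blocking cache" point above can be sketched with plain java.util.concurrent primitives. This is a hypothetical illustration, not Solr's SolrCache API: ConcurrentHashMap.computeIfAbsent already guarantees that only one thread runs the generator for a given key while concurrent callers for that key block, which is exactly the behavior wanted for expensive per-field uninversion:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a per-searcher facet cache: field name -> uninverted
// data. computeIfAbsent blocks concurrent callers on the same key, so only one
// thread ever pays the (expensive) uninversion cost for each field.
class FacetCache<V> {
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    final AtomicInteger generations = new AtomicInteger(); // stats hook

    V getOrGenerate(String field, Function<String, V> generator) {
        return cache.computeIfAbsent(field, f -> {
            generations.incrementAndGet(); // runs at most once per field
            return generator.apply(f);
        });
    }
}
```

A second lookup for the same field returns the cached value without invoking the generator again, which is also where a regenerator hook for auto-warming would naturally plug in.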
Re: Can we use Berkeley DB Java in Solr
Yet another possibility: http://wiki.apache.org/incubator/Cassandra It at least claims to be scalable; no personal experience. -- Sami Siren
Re: Can we use Berkeley DB Java in Solr
Cassandra does not meet our requirements; we do not need that kind of scalability. Moreover, its future is uncertain and they are trying to incubate it into Apache.

On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren [EMAIL PROTECTED] wrote:
Yet another possibility: http://wiki.apache.org/incubator/Cassandra It at least claims to be scalable, no personal experience. -- Sami Siren
logo contest
We have discussed with the Apache PRC (public relations committee), and they agree that the top choice in the logo contest should be disqualified for its similarity to the Solaris logo. Given the rules agreed upon in http://wiki.apache.org/solr/LogoContest, the next step is for Solr committers to use the results of the community poll to decide what the official logo should be.

I posted the results here: http://people.apache.org/~ryan/solr-logo-results.html If we count a vote that came in 12 hours late, the results are quite different: http://people.apache.org/~ryan/solr-logo-results-late.html

Using the direct scoring method agreed upon, the logo with the most points is: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png However, it is tough to gauge the real intent/preference since the vote totals are so low. I see two options:

1. Have solr committers vote to accept: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
2. Have a 'runoff' poll with the top contenders: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg

Following the rules strictly points to option #1, but I think option #2 may better reflect the original intent of the community poll. Personally, I am happy with any of these options (and logos); I just want to make sure we have a process that everyone feels is/was fair. ryan
Re: Can we use Berkeley DB Java in Solr
Again, I would hope that Solr builds a storage-agnostic solution. As long as we have a simple interface to load/store documents, it should be easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation. ryan

On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള് नोब्ळ् wrote:
Cassandra does not meet our requirements. we do not need that kind of scalability. Moreover its future is uncertain and they are trying to incubate it into Apache.
Re: logo contest
I see two options: 1. Have solr committers vote to accept: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png 2. Have a 'runoff' poll with the top contenders: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg Following the rules strictly points to option #1, but I think option #2 may better reflect the original intent of the community poll. Personally, I am happy with any of these options (and logos); I just want to make sure we have a process that everyone feels is/was fair.

I'll add that I have a slight preference for option #1 since it would get this process done with sooner :) ryan
Re: logo contest
Hoss may lay down the rules on us, but if he doesn't (or if he's in a good mood today), +1 on the runoff vote.
Re: Can we use Berkeley DB Java in Solr
The solution will be an UpdateRequestProcessor (which itself is pluggable). I am implementing a JDBC-based one. I'll test with H2 and MySQL (and maybe Derby). We will ship the H2 (embedded) jar.

On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley [EMAIL PROTECTED] wrote:
Again, I would hope that solr builds a storage agnostic solution. As long as we have a simple interface to load/store documents, it should be easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation. ryan
--Noble Paul
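Ryan's "simple interface to load/store documents" could look something like the following. This is a hypothetical sketch, not an actual Solr API: the interface and class names are invented for illustration, with an in-memory implementation standing in for the JDBC/H2, ehcache, or disk backends that would plug in behind the same contract:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage-agnostic contract for persisting uncommitted documents.
// A JDBC/H2, ehcache, or disk implementation would sit behind this same
// interface, so the UpdateRequestProcessor never depends on one backend.
interface DocStore {
    void store(String id, Map<String, Object> fields);
    Optional<Map<String, Object>> load(String id);
    void delete(String id);
}

// Minimal in-memory implementation, useful as a zero-config default
// and for unit tests.
class MemoryDocStore implements DocStore {
    private final Map<String, Map<String, Object>> docs = new ConcurrentHashMap<>();

    public void store(String id, Map<String, Object> fields) {
        docs.put(id, Map.copyOf(fields)); // defensive copy
    }

    public Optional<Map<String, Object>> load(String id) {
        return Optional.ofNullable(docs.get(id));
    }

    public void delete(String id) {
        docs.remove(id);
    }
}
```

Because callers only see DocStore, swapping the in-memory version for an embedded-DB version is a configuration change rather than a code change, which is the storage-agnostic property being argued for in the thread.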
Re: logo contest
I prefer taking option #1 and not delaying this any further.

On Thu, Dec 4, 2008 at 9:57 PM, Mark Miller [EMAIL PROTECTED] wrote:
Hoss may lay down the rules on us, but if he doesn't (or if he's in a good mood today), +1 on the runoff vote.
--Noble Paul
Re: Can we use Berkeley DB Java in Solr
A database, just to store uncommitted documents in case they might be updated, seems like it will have a pretty major impact on indexing performance. A Lucene-only implementation would seem to be much lighter on resources. -Yonik

On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote:
The solution will be an UpdateRequestProcessor (which itself is pluggable). I am implementing a JDBC-based one. I'll test with H2 and MySQL (and maybe Derby). We will ship the H2 (embedded) jar.
--Noble Paul
Re: Can we use Berkeley DB Java in Solr
I tried that and the solution looked so clumsy. Needing to commit before being able to read anything was making things difficult; the DB gives me 'immediate' reads. I am sure performance will take a hit either way. Is a Lucene write much faster than an embedded DB write? http://www.h2database.com/html/performance.html

On Thu, Dec 4, 2008 at 10:07 PM, Yonik Seeley [EMAIL PROTECTED] wrote: A database, just to store uncommitted documents in case they might be updated, seems like it will have a pretty major impact on indexing performance. A Lucene-only implementation would seem to be much lighter on resources. -Yonik

--
--Noble Paul
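Ryan's "simple interface to load/store documents" in the thread above can be made concrete. The sketch below is illustrative only: the interface and class names are hypothetical, not actual Solr API, and the in-memory implementation stands in for a JDBC/H2, ehcache, or disk-backed one.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage-agnostic interface for holding uncommitted documents.
// A JDBC/H2, ehcache, or Cassandra implementation would plug in the same way.
interface DocumentStore {
    void store(String id, Map<String, Object> fields);   // upsert by unique key
    Optional<Map<String, Object>> load(String id);       // 'immediate' read, no commit needed
    void delete(String id);
}

// In-memory stand-in; an embedded-DB version would issue the same
// operations as INSERT/SELECT/DELETE over JDBC.
class InMemoryDocumentStore implements DocumentStore {
    private final Map<String, Map<String, Object>> docs = new ConcurrentHashMap<>();
    public void store(String id, Map<String, Object> fields) { docs.put(id, fields); }
    public Optional<Map<String, Object>> load(String id) { return Optional.ofNullable(docs.get(id)); }
    public void delete(String id) { docs.remove(id); }
}

public class DocumentStoreSketch {
    public static void main(String[] args) {
        DocumentStore store = new InMemoryDocumentStore();
        store.store("doc1", Map.of("title", "hello"));
        // The stored fields are readable immediately, before any Lucene commit.
        System.out.println(store.load("doc1").isPresent());
        store.delete("doc1");
        System.out.println(store.load("doc1").isPresent());
    }
}
```

With an interface like this, swapping the backing store is a configuration concern rather than a code change, which is the thrust of the storage-agnostic argument.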
Re: logo contest
Ryan McKinley wrote: I see two options: 1. Have solr committers vote to accept: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png 2. Have a 'runoff' poll with the top contenders: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg Following the rules strictly points to option #1, but I think option #2 may better reflect the original intent of the community poll.

I prefer option #2 as well.

--
Best regards, Andrzej Bialecki
Information Retrieval, Semantic Web; Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: logo contest
I'm with Noble. #1 for me as well, for the sake of making a decision and running with a quality logo sooner rather than later. ObBiasTransparency: Steve Stedman, the designer of the sslogo* submissions, is a good friend of mine. Awesome dude. He does really nice work. Erik

On Dec 4, 2008, at 11:36 AM, Noble Paul നോബിള് नोब्ळ् wrote: I prefer taking option #1 and not delaying this any further

On Thu, Dec 4, 2008 at 9:57 PM, Mark Miller [EMAIL PROTECTED] wrote: Hoss may lay down the rules on us, but if he doesn't (or if he's in a good mood today), +1 on the runoff vote.

Ryan McKinley wrote: We have discussed with the Apache PRC (public relations committee), and they agree that the top choice in the logo contest should be disqualified for its similarity to the Solaris logo. Given the rules agreed upon in http://wiki.apache.org/solr/LogoContest , the next step is for Solr committers to use the results of the community poll to decide what the official logo should be. I posted the results here: http://people.apache.org/~ryan/solr-logo-results.html If we count a vote that came in 12 hours late, the results are quite different: http://people.apache.org/~ryan/solr-logo-results-late.html Using the direct scoring method agreed upon, the logo with the most points is: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png However, it is tough to gauge the real intent/preference since the vote totals are so low. I see two options: 1. Have solr committers vote to accept: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png 2. Have a 'runoff' poll with the top contenders: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg Following the rules strictly points to option #1, but I think option #2 may better reflect the original intent of the community poll. Personally, I am happy with any of these options (and logos); I just want to make sure we have a process that everyone feels is/was fair. ryan

--
--Noble Paul
[jira] Created: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml
DataImportHandler does not import multiple documents specified in db-data-config.xml

Key: SOLR-895
URL: https://issues.apache.org/jira/browse/SOLR-895
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.3.1, 1.4
Reporter: Cameron Pope

In our system we have multiple kinds of items that need to be indexed. In the database, they are represented as 'one table per concrete class'. We are using the DataImportHandler to automatically create an index from our database. The db-data-config.xml file that we are using contains two 'Document' elements: one for each class of item that we are indexing.

Expected behavior: the DataImportHandler imports items for each 'Document' tag defined in the configuration file.
Actual behavior: the DataImportHandler stops importing after it completes indexing of the first document.

I am attaching a patch, with a unit test that verifies the correct behavior; it should apply against the trunk without problems. I can also supply a patch against the 1.3 branch if you would like.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml
[ https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cameron Pope updated SOLR-895: Attachment: import-multiple-documents.patch

This is a patch to DataImporter that causes it to import all documents defined in the config file. There is also a unit test to verify correct behavior. It should apply against the svn trunk without any problems.
[jira] Commented: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml
[ https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653366#action_12653366 ] Noble Paul commented on SOLR-895:

Why can't this be achieved using multiple root entities under the same document?
Re: Solr nightly build failure
On Wed, Dec 3, 2008 at 3:29 PM, Ryan McKinley [EMAIL PROTECTED] wrote: [junit] Tests run: 9, Failures: 1, Errors: 0, Time elapsed: 17.101 sec [junit] Test org.apache.solr.client.solrj.embedded.SolrExampleJettyTest FAILED Any thoughts on this? Things are building fine on my local system and on hudson: http://hudson.zones.apache.org/hudson/job/Solr-trunk/ Where is the test output stored so we can look at what is actually happening?

The /tmp directory on the lucene zone -- but it's too late now; the last build succeeded. I'll try to be quicker next time and grab the output, or someone could hack the nightly build script to email the failed test output (only the first 1 or 2, to avoid spamming things, perhaps). -Yonik
Re: logo contest
: 1. Have solr committers vote to accept:
: https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png

The process as outlined on the wiki was that the committers should have a ranked-preference vote, after considering the point totals from the first vote (with the added caveat that a -1 veto needs to be allowed, since it's a vote to commit a change to the project).

Considering the community preferences expressed, I suggest that the committers hold a vote on the high-scoring entries. Picking a score of 10 as the cutoff, that would give us 10 entries to vote on:

https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg
https://issues.apache.org/jira/secure/attachment/12394165/solr-logo.png
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png
https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg
https://issues.apache.org/jira/secure/attachment/12394218/solr-solid.png

(given the distribution of scores, 10 just seems like a natural cutoff) -Hoss
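The cutoff Hoss describes is just a filter-and-sort over the community point totals. A toy illustration (the scores below are made up, not the real poll numbers):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LogoVoteCutoff {
    // Keep only entries whose community score meets the cutoff,
    // ordered from highest score to lowest.
    static List<String> shortlist(Map<String, Integer> scores, int cutoff) {
        return scores.entrySet().stream()
                .filter(e -> e.getValue() >= cutoff)
                .sorted((a, b) -> b.getValue() - a.getValue())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("sslogo-solr-finder2.0.png", 40); // hypothetical totals
        scores.put("apache_solr_b_red.jpg", 25);
        scores.put("solr-solid.png", 10);
        scores.put("some-other-entry.png", 9);
        // The 9-point entry falls below the cutoff of 10 and is dropped.
        System.out.println(shortlist(scores, 10));
    }
}
```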
[jira] Commented: (SOLR-895) DataImportHandler does not import multiple documents specified in db-data-config.xml
[ https://issues.apache.org/jira/browse/SOLR-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653396#action_12653396 ] Cameron Pope commented on SOLR-895:

I tried moving both root entities under the same document element and specifying 'docRoot=true' for both of them, and that appears to work. Thanks. Since I am new to Solr, please forgive me for logging what is probably not a bug at all. Is specifying multiple 'root' entities the envisioned way to solve this problem, or is it a workaround? I am just curious and trying to gain a better understanding of the design (I noticed parts of the DataImporter assume multiple Document elements and other parts assume only one). If it is the envisioned way, I'd be happy to update the wiki to include it -- I imagine I am not the only one who has a database schema like this and wants to create an index with Solr. All in all, I have been hugely impressed with Solr and the DataImportHandler; both are incredible pieces of work. Thanks!
Re: logo contest
The methodology will very likely determine the outcome here, with https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png and https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg being the likely two candidates for winning. My guess is that narrowing to the two most popular options first would make #2 the winner, while voting on the top 10 (w/o any strategy for winning) would make #1 the winner. fun, fun. So people who want one of these options to win should vote only for that option, really. -Yonik

On Thu, Dec 4, 2008 at 1:16 PM, Chris Hostetter [EMAIL PROTECTED] wrote: : 1. Have solr committers vote to accept: : https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png The process as outlined on the wiki was that the committers should have a ranked-preference vote, after considering the point totals from the first vote. Considering the community preferences expressed, I suggest that the committers hold a vote on the high-scoring entries. Picking a score of 10 as the cutoff, that would give us 10 entries to vote on. (given the distribution of scores, 10 just seems like a natural cutoff) -Hoss
Backwards compatibility
Hi, I was wondering what the backwards-compatibility rules in Solr are? Are they the same as in Lucene, i.e. public and protected APIs can only be changed in a major release (X.Y -> (X+1).0)? I'd like to consolidate the function queries in Solr and Lucene, and it's gonna be quite messy if we have to keep all the classes in Solr's search/function package around. -Michael
Re: Can we use Berkeley DB Java in Solr
On Thu, Dec 4, 2008 at 11:47 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote: I tried that and the solution looked so clumsy. Needing to commit before being able to read anything was making things difficult.

In a high update environment, most documents would be exposed to an open reader with no need to commit or reopen the index to retrieve the stored fields. In a way, solving the more realtime update issue removes the necessity for this altogether.

Is a Lucene write much faster than an embedded DB write?

More to the point, we're already doing the Lucene write (for the most part) anyway, and the DB write is overhead to the indexing process. -Yonik
[jira] Assigned: (SOLR-893) Unable to delete documents via SQL and deletedPkQuery with deltaimport
[ https://issues.apache.org/jira/browse/SOLR-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-893: Assignee: Shalin Shekhar Mangar

Unable to delete documents via SQL and deletedPkQuery with deltaimport

Key: SOLR-893
URL: https://issues.apache.org/jira/browse/SOLR-893
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Dan Rosher
Assignee: Shalin Shekhar Mangar
Fix For: 1.3
Attachments: SOLR-893.patch, SOLR-893.patch

DocBuilder calls entityProcessor.nextModifiedRowKey, which sets up rowIterator for the modified rows; but when it comes time to call entityProcessor.nextDeletedRowKey, that step is skipped because, although no rows were returned by nextModifiedRowKey, rowIterator in SqlEntityProcessor.java is still not null.
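The failure mode described, a cached iterator from one phase suppressing the next phase's query, can be shown in miniature. This is a hypothetical simplification for illustration, not the actual SqlEntityProcessor code:

```java
import java.util.Iterator;
import java.util.List;

public class StaleIteratorBug {
    static Iterator<String> rowIterator; // cached across phases, like SqlEntityProcessor's field

    // Buggy phase switch: if rowIterator is non-null (even when exhausted),
    // the deleted-rows query is never run, mirroring the SOLR-893 report.
    static String nextDeletedRowKeyBuggy(List<String> deletedRows) {
        if (rowIterator != null) {
            return rowIterator.hasNext() ? rowIterator.next() : null; // exhausted -> null, query skipped
        }
        rowIterator = deletedRows.iterator();
        return rowIterator.hasNext() ? rowIterator.next() : null;
    }

    // Fixed: clear the exhausted iterator before starting the next phase.
    static String nextDeletedRowKeyFixed(List<String> deletedRows) {
        if (rowIterator != null && !rowIterator.hasNext()) {
            rowIterator = null;
        }
        if (rowIterator == null) {
            rowIterator = deletedRows.iterator();
        }
        return rowIterator.hasNext() ? rowIterator.next() : null;
    }

    public static void main(String[] args) {
        // Simulate: the modified-rows phase ran and exhausted its iterator.
        rowIterator = List.<String>of().iterator();
        System.out.println(nextDeletedRowKeyBuggy(List.of("42")));  // null: the delete is lost

        rowIterator = List.<String>of().iterator();
        System.out.println(nextDeletedRowKeyFixed(List.of("42")));  // "42": the delete is seen
    }
}
```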
[jira] Resolved: (SOLR-887) HTMLStripTransformer for DIH
[ https://issues.apache.org/jira/browse/SOLR-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-887. Resolution: Fixed. Committed revision 723410. Thanks Ahmed! I didn't want to delay committing this fine contribution :) We can add more capabilities through another issue if needed.

HTMLStripTransformer for DIH

Key: SOLR-887
URL: https://issues.apache.org/jira/browse/SOLR-887
Project: Solr
Issue Type: New Feature
Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Ahmed Hammad
Assignee: Shalin Shekhar Mangar
Priority: Minor
Fix For: 1.4
Attachments: patch-887.patch, SOLR-887.patch

A Transformer implementation for DIH which strips off HTML tags using the Solr class org.apache.solr.analysis.HTMLStripReader. This is useful in case you don't need these HTML tags anyway.
[jira] Commented: (SOLR-893) Unable to delete documents via SQL and deletedPkQuery with deltaimport
[ https://issues.apache.org/jira/browse/SOLR-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653452#action_12653452 ] Shalin Shekhar Mangar commented on SOLR-893:

Thanks for the patch, Dan.

{code}
if(modifiedRow.get(entity.pk) == row.get(entity.pk)){
{code}

Wouldn't this need an equals check?
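Shalin's question points at the standard Java pitfall: `==` compares object identity, while logically equal keys coming back from two different ResultSets are usually distinct objects. A minimal demonstration, with plain strings standing in for the row values:

```java
public class EqualsVsIdentity {
    public static void main(String[] args) {
        // Two primary-key values read from two different queries:
        // equal in content, but distinct objects on the heap.
        String pkFromModifiedRow = new String("1001");
        String pkFromDeletedRow  = new String("1001");

        System.out.println(pkFromModifiedRow == pkFromDeletedRow);      // false: identity differs
        System.out.println(pkFromModifiedRow.equals(pkFromDeletedRow)); // true: contents match
    }
}
```

So the comparison in the patch would silently fail whenever the two keys are equal-but-distinct objects, which is exactly why an `equals` check is needed.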
Re: logo contest
On Dec 4, 2008, at 1:16 PM, Chris Hostetter wrote: : 1. Have solr committers vote to accept: : https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png The process as outlined on the wiki was that the committers should have a ranked-preference vote, after considering the point totals from the first vote (with the added caveat that a -1 veto needs to be allowed since it's a vote to commit a change to the project). Considering the community preferences expressed, I suggest that the committers hold a vote on the high-scoring entries. Picking a score of 10 as the cutoff, that would give us 10 entries to vote on.

Right, but what should be the role of committers voting in the second round? Is it: 1. Rank the entries the committers like best, or 2. Rank the entries the committers think best represent the community preferences. My understanding of the purpose of the second round is to interpret the results of the community poll and cast a binding VOTE. I think we should either have committers vote on what the community intent is, or re-run the poll with the full community, since deciphering the #2 choice is unclear. As Yonik said, fun fun fun. ryan
Re: logo contest
On Thu, Dec 4, 2008 at 7:34 PM, Yonik Seeley [EMAIL PROTECTED] wrote: The methodology will very likely determine the outcome here, with https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg Being the likely two candidates for winning. My guess is that narrowing to the two most popular options first would make #2 the winner, while voting on the top 10 (w/o any strategy for winning) would make #1 the winner. +1. All apache_solr_c_red.jpg flavoured logos have a total score of 94. That should be taken into account IMHO and we should reduce the number of choices for these ones. -- Guillaume
[jira] Commented: (SOLR-799) Add support for hash based exact/near duplicate document handling
[ https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653484#action_12653484 ] Yonik Seeley commented on SOLR-799:

Why not plug in an entirely new chain? That is one of the ways it would be done by users of this component, right?

<updateRequestProcessorChain name="hash"> [...]

And then in the test send in update.processor=hash as a parameter.

Add support for hash based exact/near duplicate document handling

Key: SOLR-799
URL: https://issues.apache.org/jira/browse/SOLR-799
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Mark Miller
Priority: Minor
Attachments: SOLR-799.patch, SOLR-799.patch, SOLR-799.patch

Hash based duplicate document detection is efficient and allows for blocking as well as field collapsing. Let's put it into Solr. http://wiki.apache.org/solr/Deduplication
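The hash-based scheme the issue describes can be sketched with nothing but the JDK: derive a signature from the significant fields and block (or overwrite) documents whose signature was already seen. The field choice and the MD5 digest here are illustrative assumptions, not the SOLR-799 implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;

public class HashDedupSketch {
    private final Set<String> seenSignatures = new HashSet<>();

    // Signature over whichever fields define "duplicate" for this index.
    static String signature(String... fields) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        for (String f : fields) {
            md5.update(f.getBytes(StandardCharsets.UTF_8));
            md5.update((byte) 0); // field separator, so ("ab","c") != ("a","bc")
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Returns true if the document should be indexed (first time this
    // signature is seen), false if it is a duplicate to block.
    boolean shouldIndex(String... fields) throws Exception {
        return seenSignatures.add(signature(fields));
    }

    public static void main(String[] args) throws Exception {
        HashDedupSketch dedup = new HashDedupSketch();
        System.out.println(dedup.shouldIndex("Solr rocks", "body text"));  // true
        System.out.println(dedup.shouldIndex("Solr rocks", "body text"));  // false: duplicate
        System.out.println(dedup.shouldIndex("Solr rocks", "other body")); // true
    }
}
```

In a real processor chain the signature would typically be stored on the document itself, so duplicates can also be collapsed at query time rather than only blocked at index time.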
Re: Cleaning up a Few things
So do we want to move forward on this? IIUC, we all agree it should happen, the issues are just what the specific names should be. We have a few components: 1. 'common' code that does not depend on anything (even lucene) 2. 'client' (solrj) code that depends on 'common' 3. 'server' (solr) code that depends on #1 and #2 4. webapp code that depends on everything + javax.servlet 4.a -- embedded solrj code While we could separate this into 4 jar files (in maven that might be a good idea), I think two jar files makes the most sense: solr-{solrj/client}.jar = #1 + #2 solr-{server?}.jar = #3 + #4 In my view the most reasonable jar file names would be: solr-solrj-1.x.jar solr-1.x.jar Alternativly, this could be: solr-client-1.x.jar solr-server-1.x.jar I like the names that avoid using 'client' and 'server' since it gets a bit strange when you say the server depends on the client. Even if we package as two jar files, I think we should have 4 src directories to keep the dependancies clean: Ideally this would be: /src/main/java/common /src/main/java/solrj /src/main/java/solr /src/main/java/web /src/main/webapp/... jsp stuff here However that my be more pain for existing patches then it is worth. With that in mind I suggest: /src/common /src/solrj /src/java (no change) /src/webapp/src (no change) thoughts? ryan On Nov 24, 2008, at 3:16 PM, Grant Ingersoll wrote: I was wondering what people thought of the following things that have been bothering me, off and on, for a while. 1. Let's bring SolrJ into the core and have, as part of the release packaging, a target that builds a standalone SolrJ jar for distribution. Right now, we have circular dependencies between the core and SolrJ such that I think it makes it painful to startup a project in Eclipse or IntelliJ which thus makes it just that little bit more difficult for new people to understand and contribute to Solr. Besides, SolrJ is used by distributed search and is thus core 2. 
Likewise, let's refactor the appropriate servlet dependencies such that it is in the core lib, but excluded from packaging, and then utilized/copied out to the example where needed. I think these are just the servlet apis used by the webapp part of the code. The goal of both 1 and 2 is to have the core only depend on the lib directory for dependencies such that people need only point their IDE at the core/lib directory to get up and compiling/contributing, etc. I also think we could stand to simplify the example directory quite a bit. Not quite sure what to do there just yet. While the original example is still pretty easy to use, I think it's confused by the proliferation (of which I am guilty) of other examples that are thrown into the directory. Thoughts? Cheers, Grant
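In build.xml terms, the two-jar packaging Ryan proposes might look something like the following. Target names, property names, and output paths here are hypothetical, based on the proposed src layout rather than Solr's actual build file:

```xml
<!-- Hypothetical Ant targets; ${build}/* paths follow the /src/common + /src/solrj proposal -->
<target name="jar-solrj" depends="compile">
  <jar destfile="${dist}/solr-solrj-${version}.jar">
    <!-- #1 'common' classes + #2 'client' (solrj) classes -->
    <fileset dir="${build}/common"/>
    <fileset dir="${build}/solrj"/>
  </jar>
</target>

<target name="jar-solr" depends="compile">
  <jar destfile="${dist}/solr-${version}.jar">
    <!-- #3 'server' classes + #4 webapp classes -->
    <fileset dir="${build}/java"/>
    <fileset dir="${build}/webapp"/>
  </jar>
</target>
```

Keeping the four compile outputs separate is what makes the dependency direction (server depends on client, never the reverse) mechanically checkable at build time.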
logging revisited...
While I'm on a roll tossing stuff out there Since SOLR-560, solr depends on SLF4J as the logging interface. However, since we also depend on HttpClient, we *also* depend on commons-logging. This is strange: our maven artifacts now depend on two logging frameworks! However the good folks at SLF4J have a nice solution -- a drop-in replacement for commons-logging that uses slf4j. HttpClient discussed switching to SLF4J for version 4. They decided not to because the slf4j drop-in replacement gives their users even more options. In Droids we had the same discussion, and now use the commons-logging API. So, with that in mind, I think we should consider using the commons-logging API and shipping the .war file with the slf4j drop-in replacement. The behavior will be identical and there will be one fewer library. The loss is the potential to use some of slf4j's more advanced logging features, but I don't see us taking advantage of that anyway. ryan
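In Maven terms, the proposal amounts to a dependency set along these lines. The version numbers are illustrative (1.5.x was the current SLF4J line around this time), and the exclusion shows the standard way to keep the real commons-logging off the classpath:

```xml
<!-- Sketch: route commons-logging calls through slf4j; versions are assumptions -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jcl-over-slf4j</artifactId>
  <version>1.5.6</version>
</dependency>
<!-- default backend, matching Solr 1.3's java.util.logging behavior -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-jdk14</artifactId>
  <version>1.5.6</version>
</dependency>
<!-- keep the real commons-logging jar out, since jcl-over-slf4j replaces it -->
<dependency>
  <groupId>commons-httpclient</groupId>
  <artifactId>commons-httpclient</artifactId>
  <version>3.1</version>
  <exclusions>
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Code compiled against the commons-logging API is untouched; only the jars shipped in the .war change.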
Re: logging revisited...
LOL! On Dec 4, 2008, at 4:43 PM, Ryan McKinley wrote: While I'm on a roll tossing stuff out there Since SOLR-560, solr depends on SLF4J as the logging interface. However, since we also depend on HttpClient, we *also* depend on commons-logging. This is strange: our maven artifacts now depend on two logging frameworks! However the good folks at SLF4J have a nice solution -- a drop-in replacement for commons-logging that uses slf4j. HttpClient discussed switching to SLF4J for version 4. They decided not to because the slf4j drop-in replacement gives their users even more options. In Droids we had the same discussion, and now use the commons-logging API. So, with that in mind, I think we should consider using the commons-logging API and shipping the .war file with the slf4j drop-in replacement. The behavior will be identical and there will be one fewer library. The loss is the potential to use some of slf4j's more advanced logging features, but I don't see us taking advantage of that anyway. ryan
Re: logo contest
: All apache_solr_c_red.jpg flavoured logos have a total score of 94. : That should be taken into account IMHO and we should reduce the number : of choices for these ones. To re-iterate a comment I made in SOLR-84: that wouldn't be fair to the people who have been submitting ideas and then retracting them and resubmitting variations based on feedback. People were told many, MANY, times throughout the process that submitting multiple variant entries would risk diluting the votes. One of the purposes of the long period for submissions was to give people time to post ideas, get feedback, and then tweak submissions, and people who did that shouldn't be excluded from the final vote for following the rules. (sslogo-solr-finder2.0.png is a prime example of this) -Hoss
Re: logo contest
: Being the likely two candidates for winning. My guess is that : narrowing to the two most popular options first would make #2 the : winner, while voting on the top 10 (w/o any strategy for winning) : would make #1 the winner. limiting to only voting for the top 2 seems unrepresentative since more than one apache_solr_c_red.jpg variant tied for 2nd. : fun, fun. So people who want one of these options to win should vote : only for that option, really. Perhaps instead of just ranking the top 5, we should ask committers to rank all of the choices on the final ballot to eliminate the strategy factor you are referring to ... i think we can trust all committers to understand this, but if someone botches it (or refuses?) we'll just shift the number of points each item earns down by the appropriate number (so if you want your 1st rank to earn 10 points, you must list all 10; if you only list 4 then your top ranked item only earns 4 points) that won't violate anything in the rules as originally spelled out, and should help take into account the variant score dilution. (even though i don't think we should be overly accommodating, this seems fair) -Hoss
Re: logo contest
: right, but what should be the role of committers voting in the second round? : Is it: : : 1. Rank the entries the committers like best : or : 2. Rank the entries the committers think best represent the community : preferences. : : My understanding of the purpose of the second round is to interpret the : results of the community poll and cast a binding VOTE. I think committers should cast their votes as they feel appropriate to best serve the interests of the community -- it's not really different than voting on an implementation approach for a feature, or what logging framework to use, or a decision to switch from java 1.5 to 1.6 ... we have to make a subjective decision based on the feedback we've observed from the community as a whole (with solr-logo-results.html serving as our cliff notes) -Hoss
Re: logo contest
: Hoss may lay down the rules on us, but if he doesn't (or if hes in a good mood For the record: i'm (almost) always in a good mood -- it's just hard to tell because i spell like an angry unibomber wanna-be and i have a moral objection to using emoticons, so my email based sarcasm is very, very, dry. -Hoss
[jira] Created: (SOLR-896) Solr Query Parser Plugin for Mark Miller's Qsol Parser
Solr Query Parser Plugin for Mark Miller's Qsol Parser -- Key: SOLR-896 URL: https://issues.apache.org/jira/browse/SOLR-896 Project: Solr Issue Type: New Feature Components: search Reporter: Chris Harris An extremely basic plugin to get the Qsol query parser (http://www.myhardshadow.com/qsol.php) working in Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: logo contest
On 4-Dec-08, at 2:33 PM, Chris Hostetter wrote: : Being the likely two candidates for winning. My guess is that : narrowing to the two most popular options first would make #2 the : winner, while voting on the top 10 (w/o any strategy for winning) : would make #1 the winner. limiting to only voting for the top 2 seems unrepresentative since more than one apache_solr_c_red.jpg variant tied for 2nd. : fun, fun. So people who want one of these options to win should vote : only for that option, really. Perhaps instead of just ranking the top 5, we should ask committers to rank all of the choices on the final ballot to eliminate the strategy factor you are referring to ... i think we can trust all committers to understand this, but if someone botches it (or refuses?) we'll just shift the number of points each item earns down by the appropriate number (so if you want your 1st rank to earn 10 points, you must list all 10; if you only list 4 then your top ranked item only earns 4 points) Eliminating strategic voting merely biases the outcome toward the logo without the vote-splitting problem. That is no solution. It is better to allow strategic voting, as that is the only way for voters to express certain preferences in this system. I would personally prefer more of an elimination-style vote (i.e., STV). Each voter lists the logos they prefer, in order. The logos are ranked by first-place votes. The last in the rank is eliminated from the contest, and anyone who had that logo as their first-place vote has their vote transferred to the next logo on the list, if any. Iterate until two logos remain. There is no danger of vote-splitting, and the outcome maximizes global welfare in terms of binary preferences (well, probably not, due to Arrow's theorem, but it does a good job regardless). -Mike
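Mike's elimination procedure is essentially instant-runoff voting, and it is small enough to sketch directly. The ballots below are made up to show the vote-splitting case from this thread (two "red" variants versus one other logo); they are not actual contest data:

```java
import java.util.*;

// Instant-runoff / STV-style count as described above: repeatedly drop the
// candidate with the fewest first-place votes among the survivors, letting
// those ballots transfer to each voter's next surviving choice.
class RunoffSketch {

    // Each ballot is an ordered list of candidate names, most preferred first.
    static String runoff(List<List<String>> ballots, Collection<String> candidates) {
        Set<String> alive = new HashSet<>(candidates);
        while (alive.size() > 1) {
            // count first-place votes among surviving candidates
            Map<String, Integer> firsts = new HashMap<>();
            for (String c : alive) firsts.put(c, 0);
            for (List<String> ballot : ballots) {
                for (String choice : ballot) {
                    if (alive.contains(choice)) {   // first surviving choice wins the ballot
                        firsts.merge(choice, 1, Integer::sum);
                        break;
                    }
                }
            }
            // eliminate the candidate with the fewest first-place votes
            String loser = Collections.min(alive, Comparator.comparingInt(firsts::get));
            alive.remove(loser);
        }
        return alive.iterator().next();
    }

    public static void main(String[] args) {
        List<List<String>> ballots = new ArrayList<>();
        for (int i = 0; i < 3; i++) ballots.add(Arrays.asList("red-a"));
        for (int i = 0; i < 2; i++) ballots.add(Arrays.asList("red-b", "red-a"));
        for (int i = 0; i < 4; i++) ballots.add(Arrays.asList("blue"));
        // blue leads on first-place votes (4 vs 3 vs 2), but once red-b is
        // eliminated its ballots transfer and red-a wins 5 to 4
        System.out.println(runoff(ballots, Arrays.asList("red-a", "red-b", "blue")));
    }
}
```

This is exactly why the variants stop splitting the vote: eliminating red-b does not waste those ballots, it transfers them.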
Re: logging revisited...
: Subject: logging revisited... I'm starting to think Ryan woke up today and asked himself what's the best way to screw with Hoss on his day off when he's only casually skimming email? : So, with that in mind I think we should consider using the commons-logging API : and shipping the .war file with the slf4j drop-in replacement. The behavior : will be identical and there will be one fewer library. The loss is the : potential to use some of slf4j's more advanced logging features, but I don't : see us taking advantage of that anyway. so if i'm understanding your suggestion correctly: 1) we change all of the logging calls in solr to compile against the commons-logging API. 2) we do *not* ship with the commons-logging api. 3) we ship with an slf4j provided jar that implements the commons-logging api, funnels the log messages through slf4j and uses java.util.logging as its output by default. 4) people who want to configure solr logging via some other favorite logging framework (log4j, etc...) can still add another magic slf4j jar to make slf4j write to their framework of choice instead of java.util.logging. ...do i have that correctly? I feel dirty just thinking about this. I think i may just abstain from any and all current or future discussions or decisions about logging. I'm really not that old, but I feel like I age 5 years every time the topic comes up. -Hoss
[jira] Updated: (SOLR-896) Solr Query Parser Plugin for Mark Miller's Qsol Parser
[ https://issues.apache.org/jira/browse/SOLR-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Harris updated SOLR-896: -- Attachment: SOLR-896.patch I don't know if this first stab will be useful to anyone else or not, but it might be slightly easier to get started with than writing your own. Limitations include:
* No ability to configure qsol (even though qsol is highly configurable) -- you're stuck with the defaults
* This doesn't alter qsol itself at all, so you don't get support for certain Solr goodies, like function queries
Usage:
* This patch creates solrroot/contrib/qsol.
* Download qsol from the qsol home page and put the qsol jar into solrroot/contrib/qsol/lib
* cd solrroot/contrib/qsol
* Run ant (no args needed) to create the qsol Solr plugin (solrroot/contrib/qsol/build/apache-solr-qsol-1.4-dev.jar or some such)
* To deploy, copy both the qsol Solr plugin jar and qsol.jar to your solr lib directory. In the example jetty setup that comes with solr, that should be solrroot/example/solr/lib/. In a multicore setup, you can specify where the lib directory is in solr.xml.
Solr Query Parser Plugin for Mark Miller's Qsol Parser -- Key: SOLR-896 URL: https://issues.apache.org/jira/browse/SOLR-896 Project: Solr Issue Type: New Feature Components: search Reporter: Chris Harris Attachments: SOLR-896.patch An extremely basic plugin to get the Qsol query parser (http://www.myhardshadow.com/qsol.php) working in Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-896) Solr Query Parser Plugin for Mark Miller's Qsol Parser
[ https://issues.apache.org/jira/browse/SOLR-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653535#action_12653535 ] ryguasu edited comment on SOLR-896 at 12/4/08 3:06 PM: I don't know if this first stab will be useful to anyone else or not, but it might be slightly easier to get started with than writing your own. Limitations include:
* No ability to configure qsol (even though qsol is highly configurable) -- you're stuck with the defaults
* This doesn't alter qsol itself at all, so you don't get support for certain Solr goodies, like function queries
Usage:
* This patch creates solrroot/contrib/qsol.
* Download qsol from the qsol home page and put the qsol jar into solrroot/contrib/qsol/lib
* cd solrroot/contrib/qsol
* Run ant (no args needed) to create the qsol Solr plugin (solrroot/contrib/qsol/build/apache-solr-qsol-1.4-dev.jar or some such)
* To deploy, copy both the qsol Solr plugin jar and qsol.jar to your solr lib directory. In the example jetty setup that comes with solr, that should be solrroot/example/solr/lib/. In a multicore setup, you can specify where the lib directory is in solr.xml.
* There are a few different ways to make qsol accessible from Solr now. One is to add <queryParser name="qsol" class="org.apache.solr.search.QsolQParserPlugin"/> to your solrconfig.xml, and then to prepend {!qsol} to your query URLs, e.g. ...?q={!qsol}term1 | term2. See http://wiki.apache.org/solr/SolrPlugins for more info.
was (Author: ryguasu): I don't know if this first stab will be useful to anyone else or not, but it might be slightly easier to get started with than writing your own. Limitations include:
* No ability to configure qsol (even though qsol is highly configurable) -- you're stuck with the defaults
* This doesn't alter qsol itself at all, so you don't get support for certain Solr goodies, like function queries
Usage:
* This patch creates solrroot/contrib/qsol.
* Download qsol from the qsol home page and put the qsol jar into solrroot/contrib/qsol/lib
* cd solrroot/contrib/qsol
* Run ant (no args needed) to create the qsol Solr plugin (solrroot/contrib/qsol/build/apache-solr-qsol-1.4-dev.jar or some such)
* To deploy, copy both the qsol Solr plugin jar and qsol.jar to your solr lib directory. In the example jetty setup that comes with solr, that should be solrroot/example/solr/lib/. In a multicore setup, you can specify where the lib directory is in solr.xml.
Solr Query Parser Plugin for Mark Miller's Qsol Parser -- Key: SOLR-896 URL: https://issues.apache.org/jira/browse/SOLR-896 Project: Solr Issue Type: New Feature Components: search Reporter: Chris Harris Attachments: SOLR-896.patch An extremely basic plugin to get the Qsol query parser (http://www.myhardshadow.com/qsol.php) working in Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: logging revisited...
To a certain extent SLF4j makes this decision a fairly small one, namely what API do you want to code to inside SOLR and what jars do you want to ship as a part of the distribution. It doesn't really matter if you pick commons-logging, log4j or slf4j; all have drop-in replacements via SLF4j. They also have one for java.util.logging, however it requires custom code to activate since you can't replace java.* classes. End users get to do pretty much whatever they want as far as logging goes if you use SLF4j. SLF4j has also updated their 'legacy' page since the last time I looked, which was the ~last time this came up: http://www.slf4j.org/legacy.html We chose to code against slf4j APIs as it seemed like it was where things were going (including solr) and gave us and our customers the ability to switch to something else with minimal effort. We also ship log4j+config jars by default because it had the richest config/appender set at the time; however the logback project seems like it might be catching up. (good thing we can switch with no code changes) - will -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Thursday, December 04, 2008 4:44 PM To: solr-dev@lucene.apache.org Subject: logging revisited... While I'm on a roll tossing stuff out there Since SOLR-560, solr depends on SLF4J as the logging interface. However, since we also depend on HttpClient, we *also* depend on commons-logging. This is strange: our maven artifacts now depend on two logging frameworks! However the good folks at SLF4J have a nice solution -- a drop-in replacement for commons-logging that uses slf4j. HttpClient discussed switching to SLF4J for version 4. They decided not to because the slf4j drop-in replacement gives their users even more options. In Droids we had the same discussion, and now use the commons-logging API. So, with that in mind, I think we should consider using the commons-logging API and shipping the .war file with the slf4j drop-in replacement. The behavior will be identical and there will be one fewer library. The loss is the potential to use some of slf4j's more advanced logging features, but I don't see us taking advantage of that anyway. ryan
Re: logging revisited...
On Dec 4, 2008, at 5:55 PM, Chris Hostetter wrote: : Subject: logging revisited... I'm starting to think Ryan woke up today and asked himself what's the best way to screw with Hoss on his day off when he's only casually skimming email? If I knew you had the day off, I would ask about moving to jdk 1.6! : So, with that in mind I think we should consider using the commons-logging API : and shipping the .war file with the slf4j drop-in replacement. The behavior : will be identical and there will be one fewer library. The loss is the : potential to use some of slf4j's more advanced logging features, but I don't : see us taking advantage of that anyway. so if i'm understanding your suggestion correctly: 1) we change all of the logging calls in solr to compile against the commons-logging API. 2) we do *not* ship with the commons-logging api. 3) we ship with an slf4j provided jar that implements the commons-logging api, funnels the log messages through slf4j and uses java.util.logging as its output by default. 4) people who want to configure solr logging via some other favorite logging framework (log4j, etc...) can still add another magic slf4j jar to make slf4j write to their framework of choice instead of java.util.logging. ...do i have that correctly? I feel dirty just thinking about this. I'm afraid so, but I'll describe it differently so it does not sound as crazy. 1. We compile everything against the commons-logging API (JCL) 2. We ship the .war file with a JCL implementation that behaves identically to solr-1.3. Currently the best option is: jcl-over-slf4j.jar + slf4j-jdk14. 3. Anyone using the solr.jar could use JCL or SLF4j magic I think i may just abstain from any and all current or future discussions or decisions about logging. I'm really not that old, but I feel like I age 5 years every time the topic comes up. I would have left well enough alone, but I am working with maven dependencies now and the duplicate logging frameworks feel a bit odd. 
I am happy with any choice here, but figured I should bring it up before it is 'cooked' into an official release. I am happy to stuff the genie back in the bottle, but i don't think that puts years back in the bank. ryan
Solr 1.4-SNAPSHOT pom pointing to invalid commons-io?
Solr 1.4-SNAPSHOT seems to now be requiring: 3) org.apache.commons:commons-io:jar:1.4 which doesn't appear to exist on public repositories. commons-io:commons-io:1.4 does exist. If you clear your repository and build using it, the build fails. Before entering a bug, is anyone else seeing that? -- Jayson -- View this message in context: http://www.nabble.com/Solr-1.4-SNAPSHOT-pom-pointing-to-invalid-commons-io--tp20845138p20845138.html Sent from the Solr - Dev mailing list archive at Nabble.com.
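The problem is just the groupId: commons-io 1.4 is published on Maven central under the commons-io groupId, not org.apache.commons. A corrected dependency would look like:

```xml
<!-- correct coordinates: groupId is commons-io, not org.apache.commons -->
<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>1.4</version>
</dependency>
```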
Re: Solr 1.4-SNAPSHOT pom pointing to invalid commons-io?
d'oh -- my fault. I had one in my local repos, but it's non-standard. I'll make the fix in just a sec... thanks ryan On Dec 4, 2008, at 6:51 PM, jayson.minard wrote: Solr 1.4-SNAPSHOT seems to now be requiring: 3) org.apache.commons:commons-io:jar:1.4 which doesn't appear to exist on public repositories. commons-io:commons-io:1.4 does exist. If you clear your repository and build using it, the build fails. Before entering a bug, is anyone else seeing that? -- Jayson -- View this message in context: http://www.nabble.com/Solr-1.4-SNAPSHOT-pom-pointing-to-invalid-commons-io--tp20845138p20845138.html Sent from the Solr - Dev mailing list archive at Nabble.com.
Re: Solr 1.4-SNAPSHOT pom pointing to invalid commons-io?
fixed in rev 723554 On Dec 4, 2008, at 6:51 PM, jayson.minard wrote: Solr 1.4-SNAPSHOT seems to now be requiring: 3) org.apache.commons:commons-io:jar:1.4 which doesn't appear to exist on public repositories. commons-io:commons-io:1.4 does exist. If you clear your repository and build using it, the build fails. Before entering a bug, is anyone else seeing that? -- Jayson -- View this message in context: http://www.nabble.com/Solr-1.4-SNAPSHOT-pom-pointing-to-invalid-commons-io--tp20845138p20845138.html Sent from the Solr - Dev mailing list archive at Nabble.com.
Re: Can we use Berkley DB java in Solr
On Fri, Dec 5, 2008 at 12:57 AM, Yonik Seeley [EMAIL PROTECTED] wrote: On Thu, Dec 4, 2008 at 11:47 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote: I tried that and the solution looked so clumsy. Needing to commit before reading anything was making things difficult. In a high update environment, most documents would be exposed to an open reader with no need to commit or reopen the index to retrieve the stored fields. In a way, solving the more realtime update issue removes the necessity for this altogether. Is a Lucene write much faster than a DB (embedded) write? More to the point, we're already doing the Lucene write (for the most part) anyway, and the DB write is overhead to the indexing process. Considering the fact that the extra Lucene write is over and above the normal indexing, I guess we must compare the cost of indexing one document in Lucene vs the cost of writing one row in a DB. A DB gives me the option of writing to a remote machine, thus freeing up my local disk; Lucene has to write to the local disk. In a DB I am writing a byte[] (which is quite compressed). Lucene may end up writing more data, so more disk I/O (I am just giving a theory). Does Lucene allow me to write a byte[]? The Lucene API itself is more complex for this kind of operation. (disclaimer: I do not know a whole lot of it). Moreover this is just an UpdateRequestProcessor (no changes to the core). We can have a Lucene based one also. Most users (the perf-sensitive ones) would not use this feature. The ones who do random updates will not notice it. The only problem is for users who index heavily and still want to enable this. -Yonik On Thu, Dec 4, 2008 at 10:07 PM, Yonik Seeley [EMAIL PROTECTED] wrote: A database, just to store uncommitted documents in case they might be updated, seems like it will have a pretty major impact on indexing performance. A lucene-only implementation would seem to be much lighter on resources. 
-Yonik On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote: The solution will be an UpdateRequestProcessor (which itself is pluggable). I am implementing a JDBC based one. I'll test with H2 and MySQL (and maybe Derby). We will ship the H2 (embedded) jar On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley [EMAIL PROTECTED] wrote: Again, I would hope that solr builds a storage agnostic solution. As long as we have a simple interface to load/store documents, it should be easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation. ryan On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള് नोब्ळ् wrote: Cassandra does not meet our requirements; we do not need that kind of scalability. Moreover its future is uncertain and they are trying to incubate it into Apache. On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren [EMAIL PROTECTED] wrote: Yet another possibility: http://wiki.apache.org/incubator/Cassandra It at least claims to be scalable; no personal experience. -- Sami Siren Noble Paul നോബിള് नोब्ळ् wrote: Another persistence solution is ehcache with diskstore. It even has replication. I have never used ehcache, so I cannot comment on it. Any comments? --Noble On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote: On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള് नोब्ळ् wrote: The code can be written against JDBC. But we need to test the DDL and data types on all the supported DBs. But which one would we like to ship with Solr as a default option? Why do we need a default option? Is this something that is intended to be on by default? Or, do you mean just to have one for unit tests to work? Default does not mean that it is enabled by default. But if it is enabled I can have defaults for stuff like driver, url, DDL etc. 
And the user may not need to provide an extra jar. I don't know if it is still the case, but I often find embedded DBs to be quite annoying since you often can't connect to them from other clients outside the JVM, which makes debugging harder. Of course, maybe I just don't know the tricks to do it. Derby is one DB that you can still connect to even when it is embedded. Embedded is the best bet for us because of performance reasons and zero management. The users can still read the data through Solr itself. Also, whatever is chosen needs to scale to millions of documents, and I wonder about an embedded DB doing that. I also have a hard time believing that both a DB w/ millions of docs and Solr can live on the same machine, which is presumably what an embedded DB must do. Presumably, it also needs to be able to be replicated, right? Millions of docs? Then you must configure a remote DB for storage reasons and must manage the replication separately. H2 looks impressive. The jar (small) is just 667KB and the memory footprint is small too
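Ryan's storage-agnostic suggestion earlier in the thread (a simple interface to load/store documents, with JDBC/ehcache/disk/whatever backends) could be sketched along these lines. All names here are hypothetical; nothing like this exists in Solr:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a storage-agnostic store for uncommitted documents. A JDBC,
// ehcache, or disk-backed implementation would plug in behind the same
// interface; the names are illustrative, not part of Solr.
interface DocumentStore {
    void store(String id, byte[] serializedDoc);
    byte[] load(String id);      // null if the id is not present
    void delete(String id);
}

// Trivial in-memory implementation, useful for unit tests and as the
// zero-configuration default.
class MapDocumentStore implements DocumentStore {
    private final Map<String, byte[]> docs = new ConcurrentHashMap<>();

    public void store(String id, byte[] serializedDoc) { docs.put(id, serializedDoc); }
    public byte[] load(String id) { return docs.get(id); }
    public void delete(String id) { docs.remove(id); }
}
```

Keeping the contract this small is what would let the UpdateRequestProcessor stay independent of whether H2, Derby, or something else is on the classpath.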