Re: Lucene 2.9 in Solr 1.4
Sounds good! I'd be happy to help speed things up, so if there's anything I can do, please let me know! Cheers, Aleksander On Fri, 12 Jun 2009 17:42:04 +0200, Yonik Seeley yo...@lucidimagination.com wrote: So it looks like Lucene 2.9 has all of a sudden accelerated the release, and it seems at this point wiser to release Solr 1.4 with Lucene 2.9 (non-dev), assuming that goes quickly as planned. Feels like we're a bit behind schedule here in Solr-land anyway, so it really doesn't seem like it would slow our release up much. -Yonik http://www.lucidimagination.com -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
[jira] Created: (SOLR-1219) use setproxy ant task when proxy properties are specified
use setproxy ant task when proxy properties are specified - Key: SOLR-1219 URL: https://issues.apache.org/jira/browse/SOLR-1219 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Trivial Currently, ant luke and ant example will fail if you use a proxy: {code} $ ant luke build.xml:881: HTTP Authorization failure {code} To avoid this, use the setproxy ant task when the properties are specified by the user: {code} $ ant luke -Dproxy.host=hostname -Dproxy.port=8080 -Dproxy.user=user -Dproxy.password=passwd {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
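For reference, the fix could be as small as an optional target that runs Ant's built-in setproxy task before anything downloads. A rough sketch (the target name and the if-condition are illustrative, not taken from the attached patch):

{code}
<!-- Hypothetical sketch: apply proxy settings only when the user
     passed -Dproxy.host=... on the command line. -->
<target name="proxy" if="proxy.host">
  <setproxy proxyhost="${proxy.host}" proxyport="${proxy.port}"
            proxyuser="${proxy.user}" proxypassword="${proxy.password}"/>
</target>
{code}

Downloading targets such as luke would then declare depends="proxy", which is a no-op when no proxy properties are set.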
[jira] Updated: (SOLR-1219) use setproxy ant task when proxy properties are specified
[ https://issues.apache.org/jira/browse/SOLR-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1219: - Attachment: SOLR-1219.patch use setproxy ant task when proxy properties are specified - Key: SOLR-1219 URL: https://issues.apache.org/jira/browse/SOLR-1219 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Trivial Attachments: SOLR-1219.patch Currently, ant luke and ant example will fail if you use a proxy: {code} $ ant luke build.xml:881: HTTP Authorization failure {code} To avoid this, use the setproxy ant task when the properties are specified by the user: {code} $ ant luke -Dproxy.host=hostname -Dproxy.port=8080 -Dproxy.user=user -Dproxy.password=passwd {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1203) We should add an example of setting the update.processor for a given RequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719460#action_12719460 ] Noble Paul commented on SOLR-1203: -- update.processor is not per RequestHandler; it is common across all request handlers We should add an example of setting the update.processor for a given RequestHandler --- Key: SOLR-1203 URL: https://issues.apache.org/jira/browse/SOLR-1203 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 1.4 a commented out example that points to the commented out example update chain -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: solr 1.4 lite ?
Is bandwidth or disk space really an issue for people today? You should be focusing on decreasing the size and improving the speed of indexing, not on the code-base. It's not like you guys have unlimited time to spend on this project. PS: if you don't have the examples in there, then people won't know that feature exists. And yes, I regularly copy the example schema to create a new index. I know it's bad practice, and not the most efficient schema, but it usually has the cool features enabled in it ;-) Noble Paul wrote: +1 for solr lite A lot of users are fine without that example stuff (dih, cell) one option is to have two different distributions. solr.zip and a solr-min.zip On Thu, Jun 11, 2009 at 6:59 AM, Matthew Runo mr...@zappos.com wrote: I'd be willing to guess that the vast majority of users start off with the example app and customize from there to meet their needs. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Jun 10, 2009, at 6:04 PM, Eric Pugh wrote: Has anyone really complained about the size of Solr? One of the things I like about Solr is how simple it is to get things up and running, and how accessible the example directory makes everything. When I first played with DIH and Cell, everything was there. I didn't have to chase down .jar's from multiple places. Maybe if it was as simple as ant build-example-cell and ant build-example-dih then there might not be a barrier to entry for new users. I'd be curious to hear what percentage of folks deploy solr based on the example app, and how many just start out with the most stripped down solr.war and build everything up from there? Eric On Jun 10, 2009, at 6:56 PM, Grant Ingersoll wrote: +1 Should be easy enough to conjure up the Ant magic. On Jun 10, 2009, at 5:55 PM, Yonik Seeley wrote: Thanks for bringing this up Patrick... clearly it would be nice to avoid (or mandate) 100MB downloads! 
-Yonik http://www.lucidimagination.com On Wed, Jun 10, 2009 at 5:50 PM, patrick o'leary pj...@pjaol.com wrote: Just using the apache-solr example directory, it seems to have gotten a bit big e.g. $ du -sh * 13M apache-solr-1.3.0 92M apache-solr-1.4.0 The biggest space user being example-DIH apache-solr-1.4.0/example $ du -sh * 4.0K README.txt 5.5M clustering 80K etc *32M example-DIH* 42K exampleAnalysis 168K exampledocs 13M lib 52K logs 118K multicore *31M solr* 20K start.jar 12M webapps 12K work solr/lib is now 30MB apache-solr-1.4.0/example/solr/lib $ ls -lhS total 30M -rwx-- 1 pjaol None 14M Jun 10 17:08 ooxml-schemas-1.0.jar -rwx-- 1 pjaol None 4.3M Jun 10 17:08 icu4j-3.8.jar -rwx-- 1 pjaol None 3.2M Jun 10 17:08 pdfbox-0.7.3.jar -rwx-- 1 pjaol None 2.6M Jun 10 17:08 xmlbeans-2.3.0.jar -rwx-- 1 pjaol None 1.5M Jun 10 17:08 poi-3.5-beta5.jar -rwx-- 1 pjaol None 1.2M Jun 10 17:08 xercesImpl-2.8.1.jar -rwx-- 1 pjaol None 1.1M Jun 10 17:08 bcprov-jdk14-132.jar as opposed to 0 for 1.3.0 This pushes solr to over a 100MB download for features that I'm sure can be packaged up separately as they look seldom used. It would make sense if there's going to be a batteries included version to also have a solr-lite version. P -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: solr 1.4 lite ?
On Mon, Jun 15, 2009 at 2:32 PM, Ian Holsman li...@holsman.net wrote: Is bandwidth or disk space really an issue for people today? you should be focusing on decreasing the size and speed of indexing stuff not the code-base. It's not like you guys have unlimited time to spend on this project. This does not have to be a code change. It just has to be a new build target (even that is code, but...). Downloading 100MB is not very pleasant if you keep downloading nightlies every now and then. ps. if you don't have the examples in there, then people won't know that feature exists. and yes. i regularly copy the example schema to create a new index. I know it's bad practice, and not the most efficient schema, but it usually has the cool features enabled in it ;-) Noble Paul wrote: +1 for solr lite A lot of users are fine without that example stuff (dih, cell) one option is to have two different distributions. solr.zip and a solr-min.zip On Thu, Jun 11, 2009 at 6:59 AM, Matthew Runo mr...@zappos.com wrote: I'd be willing to guess that the vast majority of users start off with the example app and customize from there to meet their needs. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Jun 10, 2009, at 6:04 PM, Eric Pugh wrote: Has anyone really complained about the size of Solr? One of the things I like about Solr is how simple it is to get things up and running, and how accessible the example directory makes everything. When I first played with DIH and Cell, everything was there. I didn't have to chase down .jar's from multiple places. Maybe if it was as simple as ant build-example-cell and ant build-example-dih then there might not be a barrier to entry for new users. I'd be curious to hear what percentage of folks deploy solr based on the example app, and how many just start out with the most stripped down solr.war and build everything up from there? 
Eric On Jun 10, 2009, at 6:56 PM, Grant Ingersoll wrote: +1 Should be easy enough to conjure up the Ant magic. On Jun 10, 2009, at 5:55 PM, Yonik Seeley wrote: Thanks for bringing this up Patrick... clearly it would be nice to avoid (or mandate) 100MB downloads! -Yonik http://www.lucidimagination.com On Wed, Jun 10, 2009 at 5:50 PM, patrick o'leary pj...@pjaol.com wrote: Just using the apache-solr example directory, it seems to have gotten a bit big e.g. $ du -sh * 13M apache-solr-1.3.0 92M apache-solr-1.4.0 The biggest space user being example-DIH apache-solr-1.4.0/example $ du -sh * 4.0K README.txt 5.5M clustering 80K etc *32M example-DIH* 42K exampleAnalysis 168K exampledocs 13M lib 52K logs 118K multicore *31M solr* 20K start.jar 12M webapps 12K work solr/lib is now 30mb apache-solr-1.4.0/example/solr/lib $ ls -lhS total 30M -rwx-- 1 pjaol None 14M Jun 10 17:08 ooxml-schemas-1.0.jar -rwx-- 1 pjaol None 4.3M Jun 10 17:08 icu4j-3.8.jar -rwx-- 1 pjaol None 3.2M Jun 10 17:08 pdfbox-0.7.3.jar -rwx-- 1 pjaol None 2.6M Jun 10 17:08 xmlbeans-2.3.0.jar -rwx-- 1 pjaol None 1.5M Jun 10 17:08 poi-3.5-beta5.jar -rwx-- 1 pjaol None 1.2M Jun 10 17:08 xercesImpl-2.8.1.jar -rwx-- 1 pjaol None 1.1M Jun 10 17:08 bcprov-jdk14-132.jar as opposed to 0 for 1.3.0 This pushes solr to over a 100mb download for features that I'm sure can be packaged up separately as they look seldom used. It would make sense if there's going to be a batteries included version to also have a solr-lite version. P -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- - Noble Paul | Principal Engineer| AOL | http://aol.com
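A new build target along these lines could indeed be quite small. Here is a rough sketch of what a minimal-distribution target might look like in Solr's build.xml (the target name, property names, and file list are invented for illustration, not taken from any actual patch):

{code}
<!-- Hypothetical sketch: package just the war plus licenses,
     skipping example-DIH, Solr Cell and the other large examples. -->
<target name="package-min" depends="dist-war"
        description="Create a minimal apache-solr-min.zip">
  <zip destfile="${dist}/apache-solr-${version}-min.zip">
    <zipfileset dir="${dist}" includes="apache-solr-${version}.war"/>
    <zipfileset dir="." includes="LICENSE.txt,NOTICE.txt,README.txt"/>
  </zip>
</target>
{code}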
[jira] Assigned: (SOLR-1150) OutofMemoryError on enabling highlighting
[ https://issues.apache.org/jira/browse/SOLR-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-1150: - Assignee: Mark Miller OutofMemoryError on enabling highlighting - Key: SOLR-1150 URL: https://issues.apache.org/jira/browse/SOLR-1150 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 1.4 Reporter: Siddharth Gargate Assignee: Mark Miller Fix For: 1.4 Attachments: SOLR-1150.patch Please refer to following mail thread http://markmail.org/message/5nhkm5h3ongqlput I am testing with 2MB document size and just 500 documents. Indexing is working fine even with 128MB heap size. But on searching Solr throws OOM error. This issue is observed only when we enable highlighting. While indexing I am storing 1 MB text. While searching Solr reads all the 500 documents in the memory. It also reads the complete 1 MB stored field in the memory for all 500 documents. Due to this 500 docs * 1 MB * 2 (2 bytes per char) = 1000 MB memory is required for searching. This memory usage can be reduced by reading one document at a time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
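The arithmetic in the report above is easy to verify; a quick back-of-the-envelope check (the factor of 2 reflects Java's UTF-16 in-memory string representation):

```python
# Memory needed if the stored fields of all matching documents are
# read into memory at once, per the numbers in the report above.
docs = 500                 # matching documents
chars_per_doc = 1_000_000  # ~1 MB of stored text each
bytes_per_char = 2         # Java strings are UTF-16 internally

total_bytes = docs * chars_per_doc * bytes_per_char
print(total_bytes)  # 1000000000 bytes, i.e. the ~1000 MB cited
```

Reading one document at a time, as suggested, would instead keep only about chars_per_doc * bytes_per_char (~2 MB) resident at any moment.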
Re: solr 1.4 lite ?
Those examples and features should still exist, but in a batteries included version. The lite version is just for those who want the simple basic search feature. More than 70% of solr these days is indexing strategy or client libraries, rather than search. These are important, but are now beginning to clutter things up for deployment. It would be good to offer a lite version for those that are just upgrading, and a simple target to build a lite-example for those hacking away on the code or distributing an example version. On Mon, Jun 15, 2009 at 5:02 AM, Ian Holsman li...@holsman.net wrote: Is bandwidth or disk space really an issue for people today? you should be focusing on decreasing the size and speed of indexing stuff not the code-base. It's not like you guys have unlimited time to spend on this project. ps. if you don't have the examples in there, then people won't know that feature exists. and yes. i regularly copy the example schema to create a new index. I know it's bad practice, and not the most efficient schema, but it usually has the cool features enabled in it ;-) Noble Paul wrote: +1 for solr lite A lot of users are fine without that example stuff (dih, cell) one option is to have two different distributions. solr.zip and a solr-min.zip On Thu, Jun 11, 2009 at 6:59 AM, Matthew Runo mr...@zappos.com wrote: I'd be willing to guess that the vast majority of users start off with the example app and customize from there to meet their needs. Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Jun 10, 2009, at 6:04 PM, Eric Pugh wrote: Has anyone really complained about the size of Solr? One of the things I like about Solr is how simple it is to get things up and running, and how accessible the example directory makes everything. When I first played with DIH and Cell, everything was there. I didn't have to chase down .jar's from multiple places. 
Maybe if it was as simple as ant build-example-cell and ant build-example-dih then there might not be a barrier to entry for new users. I'd be curious to hear what percentage of folks deploy solr based on the example app, and how many just start out with the most stripped down solr.war and build everything up from there? Eric On Jun 10, 2009, at 6:56 PM, Grant Ingersoll wrote: +1 Should be easy enough to conjure up the Ant magic. On Jun 10, 2009, at 5:55 PM, Yonik Seeley wrote: Thanks for bringing this up Patrick... clearly it would be nice to avoid (or mandate) 100MB downloads! -Yonik http://www.lucidimagination.com On Wed, Jun 10, 2009 at 5:50 PM, patrick o'leary pj...@pjaol.com wrote: Just using the apache-solr example directory, it seems to have gotten a bit big e.g. $ du -sh * 13M apache-solr-1.3.0 92M apache-solr-1.4.0 The biggest space user being example-DIH apache-solr-1.4.0/example $ du -sh * 4.0K README.txt 5.5M clustering 80K etc *32M example-DIH* 42K exampleAnalysis 168K exampledocs 13M lib 52K logs 118K multicore *31M solr* 20K start.jar 12M webapps 12K work solr/lib is now 30MB apache-solr-1.4.0/example/solr/lib $ ls -lhS total 30M -rwx-- 1 pjaol None 14M Jun 10 17:08 ooxml-schemas-1.0.jar -rwx-- 1 pjaol None 4.3M Jun 10 17:08 icu4j-3.8.jar -rwx-- 1 pjaol None 3.2M Jun 10 17:08 pdfbox-0.7.3.jar -rwx-- 1 pjaol None 2.6M Jun 10 17:08 xmlbeans-2.3.0.jar -rwx-- 1 pjaol None 1.5M Jun 10 17:08 poi-3.5-beta5.jar -rwx-- 1 pjaol None 1.2M Jun 10 17:08 xercesImpl-2.8.1.jar -rwx-- 1 pjaol None 1.1M Jun 10 17:08 bcprov-jdk14-132.jar as opposed to 0 for 1.3.0 This pushes solr to over a 100MB download for features that I'm sure can be packaged up separately as they look seldom used. It would make sense if there's going to be a batteries included version to also have a solr-lite version. 
P -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
[jira] Commented: (SOLR-1150) OutofMemoryError on enabling highlighting
[ https://issues.apache.org/jira/browse/SOLR-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719597#action_12719597 ] Mark Miller commented on SOLR-1150: --- Odd - the change looks like it wouldn't affect this, but somehow the highlighter test fails as it attempts to access a deleted doc. Not quite sure what is up yet. OutofMemoryError on enabling highlighting - Key: SOLR-1150 URL: https://issues.apache.org/jira/browse/SOLR-1150 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 1.4 Reporter: Siddharth Gargate Assignee: Mark Miller Fix For: 1.4 Attachments: SOLR-1150.patch Please refer to following mail thread http://markmail.org/message/5nhkm5h3ongqlput I am testing with 2MB document size and just 500 documents. Indexing is working fine even with 128MB heap size. But on searching Solr throws OOM error. This issue is observed only when we enable highlighting. While indexing I am storing 1 MB text. While searching Solr reads all the 500 documents in the memory. It also reads the complete 1 MB stored field in the memory for all 500 documents. Due to this 500 docs * 1 MB * 2 (2 bytes per char) = 1000 MB memory is required for searching. This memory usage can be reduced by reading one document at a time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1150) OutofMemoryError on enabling highlighting
[ https://issues.apache.org/jira/browse/SOLR-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719598#action_12719598 ] Mark Miller commented on SOLR-1150: --- There is a problem with distrib and highlighting as well. OutofMemoryError on enabling highlighting - Key: SOLR-1150 URL: https://issues.apache.org/jira/browse/SOLR-1150 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 1.4 Reporter: Siddharth Gargate Assignee: Mark Miller Fix For: 1.4 Attachments: SOLR-1150.patch Please refer to following mail thread http://markmail.org/message/5nhkm5h3ongqlput I am testing with 2MB document size and just 500 documents. Indexing is working fine even with 128MB heap size. But on searching Solr throws OOM error. This issue is observed only when we enable highlighting. While indexing I am storing 1 MB text. While searching Solr reads all the 500 documents in the memory. It also reads the complete 1 MB stored field in the memory for all 500 documents. Due to this 500 docs * 1 MB * 2 (2 bytes per char) = 1000 MB memory is required for searching. This memory usage can be reduced by reading one document at a time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1150) OutofMemoryError on enabling highlighting
[ https://issues.apache.org/jira/browse/SOLR-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719599#action_12719599 ] Yonik Seeley commented on SOLR-1150: It's trying to read the loop iterator (i.e. 0-9) ;-) OutofMemoryError on enabling highlighting - Key: SOLR-1150 URL: https://issues.apache.org/jira/browse/SOLR-1150 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 1.4 Reporter: Siddharth Gargate Assignee: Mark Miller Fix For: 1.4 Attachments: SOLR-1150.patch Please refer to following mail thread http://markmail.org/message/5nhkm5h3ongqlput I am testing with 2MB document size and just 500 documents. Indexing is working fine even with 128MB heap size. But on searching Solr throws OOM error. This issue is observed only when we enable highlighting. While indexing I am storing 1 MB text. While searching Solr reads all the 500 documents in the memory. It also reads the complete 1 MB stored field in the memory for all 500 documents. Due to this 500 docs * 1 MB * 2 (2 bytes per char) = 1000 MB memory is required for searching. This memory usage can be reduced by reading one document at a time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1150) OutofMemoryError on enabling highlighting
[ https://issues.apache.org/jira/browse/SOLR-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719601#action_12719601 ] Mark Miller commented on SOLR-1150: --- Ah, thanks for the spot, Yonik. I'll switch it to use doc ids instead and see how things go :) OutofMemoryError on enabling highlighting - Key: SOLR-1150 URL: https://issues.apache.org/jira/browse/SOLR-1150 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 1.4 Reporter: Siddharth Gargate Assignee: Mark Miller Fix For: 1.4 Attachments: SOLR-1150.patch Please refer to following mail thread http://markmail.org/message/5nhkm5h3ongqlput I am testing with 2MB document size and just 500 documents. Indexing is working fine even with 128MB heap size. But on searching Solr throws OOM error. This issue is observed only when we enable highlighting. While indexing I am storing 1 MB text. While searching Solr reads all the 500 documents in the memory. It also reads the complete 1 MB stored field in the memory for all 500 documents. Due to this 500 docs * 1 MB * 2 (2 bytes per char) = 1000 MB memory is required for searching. This memory usage can be reduced by reading one document at a time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1216) disambiguate the replication command names
[ https://issues.apache.org/jira/browse/SOLR-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719609#action_12719609 ] Walter Underwood commented on SOLR-1216: sync is a weak name, because it doesn't say whether it is a push or pull synchronization. disambiguate the replication command names -- Key: SOLR-1216 URL: https://issues.apache.org/jira/browse/SOLR-1216 Project: Solr Issue Type: Improvement Components: replication (java) Reporter: Noble Paul Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1216.patch There is a lot of confusion in the naming of various commands such as snappull, snapshot etc. This is a vestige of the script based replication we currently have. The commands can be renamed to make more sense * 'snappull' to be renamed to 'sync' * 'snapshot' to be renamed to 'backup' thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1216) disambiguate the replication command names
[ https://issues.apache.org/jira/browse/SOLR-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719620#action_12719620 ] Mark Miller commented on SOLR-1216: --- That's why I was torn between it and syncFromMaster. Not in love with that either, though. Any suggestions? disambiguate the replication command names -- Key: SOLR-1216 URL: https://issues.apache.org/jira/browse/SOLR-1216 Project: Solr Issue Type: Improvement Components: replication (java) Reporter: Noble Paul Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1216.patch There is a lot of confusion in the naming of various commands such as snappull, snapshot etc. This is a vestige of the script based replication we currently have. The commands can be renamed to make more sense * 'snappull' to be renamed to 'sync' * 'snapshot' to be renamed to 'backup' thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1216) disambiguate the replication command names
[ https://issues.apache.org/jira/browse/SOLR-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719625#action_12719625 ] Walter Underwood commented on SOLR-1216: If we choose a name for the thing we are pulling, like image, then we can use makeimage, pullimage, etc. disambiguate the replication command names -- Key: SOLR-1216 URL: https://issues.apache.org/jira/browse/SOLR-1216 Project: Solr Issue Type: Improvement Components: replication (java) Reporter: Noble Paul Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1216.patch There is a lot of confusion in the naming of various commands such as snappull, snapshot etc. This is a vestige of the script based replication we currently have. The commands can be renamed to make more sense * 'snappull' to be renamed to 'sync' * 'snapshot' to be renamed to 'backup' thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1150) OutofMemoryError on enabling highlighting
[ https://issues.apache.org/jira/browse/SOLR-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-1150: -- Attachment: SOLR-1150.patch OutofMemoryError on enabling highlighting - Key: SOLR-1150 URL: https://issues.apache.org/jira/browse/SOLR-1150 Project: Solr Issue Type: Improvement Components: highlighter Affects Versions: 1.4 Reporter: Siddharth Gargate Assignee: Mark Miller Fix For: 1.4 Attachments: SOLR-1150.patch, SOLR-1150.patch Please refer to following mail thread http://markmail.org/message/5nhkm5h3ongqlput I am testing with 2MB document size and just 500 documents. Indexing is working fine even with 128MB heap size. But on searching Solr throws OOM error. This issue is observed only when we enable highlighting. While indexing I am storing 1 MB text. While searching Solr reads all the 500 documents in the memory. It also reads the complete 1 MB stored field in the memory for all 500 documents. Due to this 500 docs * 1 MB * 2 (2 bytes per char) = 1000 MB memory is required for searching. This memory usage can be reduced by reading one document at a time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719677#action_12719677 ] Shekhar commented on SOLR-236: -- Here is the solrconfig file:
{code}
<requestHandler name="geo" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="components">
    <str>localsolr</str>
    <str>collapse</str>
  </arr>
</requestHandler>
{code}
Following are the results I am getting:
{code}
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">146</int>
    <lst name="params">
      <str name="lat">41.883784</str>
      <str name="radius">50</str>
      <str name="collapse.field">resource_id</str>
      <str name="rows">2</str>
      <str name="indent">on</str>
      <str name="fl">resource_id,geo_distance</str>
      <str name="q">TV</str>
      <str name="qt">geo</str>
      <str name="long">-87.637668</str>
    </lst>
  </lst>
  <result name="response" numFound="4294" start="0">
    <doc>
      <int name="resource_id">10018</int>
      <double name="geo_distance">26.16691883965225</double>
    </doc>
    <doc>
      <int name="resource_id">10102</int>
      <double name="geo_distance">39.90588996589528</double>
    </doc>
  </result>
  <lst name="collapse_counts">
    <str name="field">resource_id</str>
    <lst name="doc">
      <int name="10022">116</int>
      <int name="11701">4</int>
    </lst>
    <lst name="count">
      <int name="10015">116</int>
      <int name="10018">4</int>
    </lst>
    <lst name="debug">
      <str name="Docset type">BitDocSet(5201)</str>
      <long name="Total collapsing time(ms)">46</long>
      <long name="Create uncollapsed docset(ms)">22</long>
      <long name="Collapsing normal time(ms)">24</long>
      <long name="Creating collapseinfo time(ms)">0</long>
      <long name="Convert to bitset time(ms)">0</long>
      <long name="Create collapsed docset time(ms)">0</long>
    </lst>
  </lst>
  <result name="response" numFound="5201" start="0">
    <doc>
      <int name="resource_id">10015</int>
    </doc>
    <doc>
      <int name="resource_id">10018</int>
    </doc>
  </result>
</response>
{code}
Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, 
collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch includes a new feature called Field collapsing. Used in order to collapse a group of results with a similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results collapse.type normal (default value) or adjacent collapse.max to select how many continuous results are allowed before collapsing TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
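To make the three parameters concrete, a hypothetical request against a local Solr instance might look like the following (the host, port, and the site field are invented for illustration, not taken from the patch):

{code}
http://localhost:8983/solr/select?q=ipod&collapse.field=site&collapse.type=adjacent&collapse.max=2
{code}

With adjacent collapsing, a run of results sharing the same site value would, under this reading of the parameters, be reduced once more than 2 continuous results from that site appear.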
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719677#action_12719677 ] Shekhar edited comment on SOLR-236 at 6/15/09 3:34 PM: --- Here is the solfconfig file. requestHandler name=geo class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str /lst arr name=components strlocalsolr/str strcollapse/str /arr /requestHandler You can get more details from http://www.gissearch.com/localsolr === Following are the results I am getting : response − lst name=responseHeader int name=status0/int int name=QTime146/int − lst name=params str name=lat41.883784/str str name=radius50/str str name=collapse.fieldresource_id/str str name=rows2/str str name=indenton/str str name=flresource_id,geo_distance/str str name=qTV/str str name=qtgeo/str str name=long-87.637668/str /lst /lst − result name=response numFound=4294 start=0 − doc int name=resource_id10018/int double name=geo_distance26.16691883965225/double /doc − doc int name=resource_id10102/int double name=geo_distance39.90588996589528/double /doc /result − lst name=collapse_counts str name=fieldresource_id/str − lst name=doc int name=10022116/int int name=117014/int /lst − lst name=count int name=10015116/int int name=100184/int /lst − lst name=debug str name=Docset typeBitDocSet(5201)/str long name=Total collapsing time(ms)46/long long name=Create uncollapsed docset(ms)22/long long name=Collapsing normal time(ms)24/long long name=Creating collapseinfo time(ms)0/long long name=Convert to bitset time(ms)0/long long name=Create collapsed docset time(ms)0/long /lst /lst − result name=response numFound=5201 start=0 − doc int name=resource_id10015/int /doc − doc int name=resource_id10018/int /doc /result /response was (Author: csnirkhe): Here is the solfconfig file. 
requestHandler name=geo class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str /lst arr name=components strlocalsolr/str strcollapse/str /arr /requestHandler Following are the results I am getting : response − lst name=responseHeader int name=status0/int int name=QTime146/int − lst name=params str name=lat41.883784/str str name=radius50/str str name=collapse.fieldresource_id/str str name=rows2/str str name=indenton/str str name=flresource_id,geo_distance/str str name=qTV/str str name=qtgeo/str str name=long-87.637668/str /lst /lst − result name=response numFound=4294 start=0 − doc int name=resource_id10018/int double name=geo_distance26.16691883965225/double /doc − doc int name=resource_id10102/int double name=geo_distance39.90588996589528/double /doc /result − lst name=collapse_counts str name=fieldresource_id/str − lst name=doc int name=10022116/int int name=117014/int /lst − lst name=count int name=10015116/int int name=100184/int /lst − lst name=debug str name=Docset typeBitDocSet(5201)/str long name=Total collapsing time(ms)46/long long name=Create uncollapsed docset(ms)22/long long name=Collapsing normal time(ms)24/long long name=Creating collapseinfo time(ms)0/long long name=Convert to bitset time(ms)0/long long name=Create collapsed docset time(ms)0/long /lst /lst − result name=response numFound=5201 start=0 − doc int name=resource_id10015/int /doc − doc int name=resource_id10018/int /doc /result /response Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch includes a new feature called Field collapsing. Used in order to collapse a group of results with a similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48&amid=299 The implementation adds 3 new
[jira] Assigned: (SOLR-1219) use setproxy ant task when proxy properties are specified
[ https://issues.apache.org/jira/browse/SOLR-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-1219: Assignee: Koji Sekiguchi I'll commit soon. use setproxy ant task when proxy properties are specified - Key: SOLR-1219 URL: https://issues.apache.org/jira/browse/SOLR-1219 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Trivial Fix For: 1.4 Attachments: SOLR-1219.patch Currently, ant luke and ant example will fail if you use a proxy: {code} $ ant luke build.xml:881: HTTP Authorization failure {code} To avoid this, use the setproxy ant task when the properties are specified by the user: {code} $ ant luke -Dproxy.host=hostname -Dproxy.port=8080 -Dproxy.user=user -Dproxy.password=passwd {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
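For context, Ant's setproxy task essentially sets the standard JVM proxy system properties and, for authenticated proxies, installs a default java.net.Authenticator. A minimal Java sketch of that behavior (not the patch itself; the class and method names here are invented for illustration):

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

public class SetProxySketch {
    // Mirror of what Ant's <setproxy> does: publish the proxy host/port as
    // system properties so java.net.URL connections route through the proxy,
    // and register credentials for proxies that demand authentication.
    static void setProxy(String host, String port, String user, String password) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", port);
        if (user != null && password != null) {
            Authenticator.setDefault(new Authenticator() {
                @Override
                protected PasswordAuthentication getPasswordAuthentication() {
                    return new PasswordAuthentication(user, password.toCharArray());
                }
            });
        }
    }

    public static void main(String[] args) {
        setProxy("hostname", "8080", "user", "passwd");
        System.out.println(System.getProperty("http.proxyHost"));
    }
}
```

With this in place, tasks that fetch over HTTP (like the luke download in build.xml) see the proxy settings without any per-task configuration.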
[jira] Updated: (SOLR-1219) use setproxy ant task when proxy properties are specified
[ https://issues.apache.org/jira/browse/SOLR-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1219: - Fix Version/s: 1.4 use setproxy ant task when proxy properties are specified - Key: SOLR-1219 URL: https://issues.apache.org/jira/browse/SOLR-1219 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Trivial Fix For: 1.4 Attachments: SOLR-1219.patch Currently, ant luke and ant example will fail if you use a proxy: {code} $ ant luke build.xml:881: HTTP Authorization failure {code} To avoid this, use the setproxy ant task when the properties are specified by the user: {code} $ ant luke -Dproxy.host=hostname -Dproxy.port=8080 -Dproxy.user=user -Dproxy.password=passwd {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1219) use setproxy ant task when proxy properties are specified
[ https://issues.apache.org/jira/browse/SOLR-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1219. -- Resolution: Fixed Committed revision 785029. use setproxy ant task when proxy properties are specified - Key: SOLR-1219 URL: https://issues.apache.org/jira/browse/SOLR-1219 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Trivial Fix For: 1.4 Attachments: SOLR-1219.patch Currently, ant luke and ant example will fail if you use a proxy: {code} $ ant luke build.xml:881: HTTP Authorization failure {code} To avoid this, use the setproxy ant task when the properties are specified by the user: {code} $ ant luke -Dproxy.host=hostname -Dproxy.port=8080 -Dproxy.user=user -Dproxy.password=passwd {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1220) UnInvertedField performance improvement on fields with an extremely large number of values
UnInvertedField performance improvement on fields with an extremely large number of values -- Key: SOLR-1220 URL: https://issues.apache.org/jira/browse/SOLR-1220 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Kent Fitch Priority: Minor

Our setup is: - about 34M lucene documents of bibliographic and full text content - index currently 115GB, will at least double over the next 6 months - moving to support real-time-ish updates (maybe 5 min delay)

We facet on 8 fields, 6 of which are normal with small numbers of distinct values. But 2 faceted fields, creator and subject, are huge, with 18M and 9M terms respectively. On a server with 2x quad-core AMD 2382 processors and 64GB memory, java 1.6.0_13-b03, 64 bit, run with -Xmx15192M -Xms6000M -verbose:gc, with the index on an Intel X25M SSD, on start-up the elapsed time to create the 8 facets is 306 seconds (best time). Following an index reopen, the time to recreate them is 318 seconds (best time).

[We have made an independent experimental change to create the facets with 3 async threads, that is, in parallel, and also to decouple them from the underlying index, so our facets lag the index changes by the time to recreate the facets. With our setup, the 3 threads reduced facet creation elapsed time from about 450 secs to around 320 secs, but this will depend a lot on the IO capabilities of the device containing the index, amount of file system caching, load, etc.]

Anyway, we noticed that huge amounts of garbage were being collected during facet generation of the creator and subject fields, and tracked it down to this decision in UnInvertedField uninvert():

{code}
if (termNum >= maxTermCounts.length) {
  // resize, but conserve memory by not doubling
  // resize at end???  we waste a maximum of 16K (average of 8K)
  int[] newMaxTermCounts = new int[maxTermCounts.length+4096];
  System.arraycopy(maxTermCounts, 0, newMaxTermCounts, 0, termNum);
  maxTermCounts = newMaxTermCounts;
}
{code}

So, we tried the obvious thing: - allocate 10K terms initially, rather than 1K - extend by doubling the current size, rather than adding a fixed 4K - free unused space at the end (but only if the unused space is significant) by reallocating the array to the exact required size

And also: - created a static HashMap lookup keyed on field name which remembers the previously allocated size of maxTermCounts for that field, and initially allocates that size + 1000 entries

The second change is a minor optimisation, but the first change, by eliminating thousands of array reallocations and copies, greatly improved load times, down from 306 to 124 seconds on the initial load and from 318 to 134 seconds on reloads after index updates. About 60-70 secs is still spent in GC, but it is a significant improvement. Unless you have very large numbers of facet values, this change won't have any positive benefit. The core part of our change is reflected by this diff against revision 785058:

{code}
***************
*** 222,232 ****
      int termNum = te.getTermNumber();
      if (termNum >= maxTermCounts.length) {
!       // resize, but conserve memory by not doubling
!       // resize at end???  we waste a maximum of 16K (average of 8K)
!       int[] newMaxTermCounts = new int[maxTermCounts.length+4096];
        System.arraycopy(maxTermCounts, 0, newMaxTermCounts, 0, termNum);
        maxTermCounts = newMaxTermCounts;
      }
--- 222,232 ----
      int termNum = te.getTermNumber();
      if (termNum >= maxTermCounts.length) {
!       // resize by doubling - for very large number of unique terms, expanding
!       // by 4K and resultant GC will dominate uninvert times.  Resize at end if material
!       int[] newMaxTermCounts = new int[maxTermCounts.length*2];
        System.arraycopy(maxTermCounts, 0, newMaxTermCounts, 0, termNum);
        maxTermCounts = newMaxTermCounts;
      }
***************
*** 331,338 ****
--- 331,346 ----
      numTermsInField = te.getTermNumber();
      te.close();

+     // free space if outrageously wasteful (tradeoff memory/cpu)
+     if ((maxTermCounts.length - numTermsInField) > 1024) {  // too much waste!
+       int[] newMaxTermCounts = new int[numTermsInField];
+       System.arraycopy(maxTermCounts, 0, newMaxTermCounts, 0, numTermsInField);
+       maxTermCounts = newMaxTermCounts;
+     }
+
      long midPoint = System.currentTimeMillis();
      if (termInstances == 0) {
        // we didn't invert anything
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
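The growth policy described above is easy to demonstrate in isolation. Below is a small, self-contained sketch (not the actual Solr code; the class and method names are invented for illustration) of doubling growth plus the trim-at-end step the patch adopts:

```java
public class GrowthSketch {
    // Geometric growth: doubling keeps total copy work proportional to the
    // final size, instead of quadratic as with fixed +4096 increments.
    static int[] grow(int[] a, int needed) {
        int len = a.length;
        while (len <= needed) {
            len *= 2;
        }
        int[] bigger = new int[len];
        System.arraycopy(a, 0, bigger, 0, a.length);
        return bigger;
    }

    // Trim the over-allocated tail once the true count is known, but only
    // when the waste is significant (the 1024-slot threshold from the patch).
    static int[] trim(int[] a, int used) {
        if (a.length - used > 1024) {
            int[] exact = new int[used];
            System.arraycopy(a, 0, exact, 0, used);
            return exact;
        }
        return a;
    }

    public static void main(String[] args) {
        int[] counts = new int[1024];
        counts = grow(counts, 5000);          // doubles 1024 -> 2048 -> 4096 -> 8192
        System.out.println(counts.length);    // 8192
        counts = trim(counts, 5000);
        System.out.println(counts.length);    // 5000
    }
}
```

For 18M terms, the fixed +4096 scheme performs thousands of reallocate-and-copy cycles, while doubling performs about two dozen; that is the source of the reported GC savings.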
[jira] Commented: (SOLR-1216) disambiguate the replication command names
[ https://issues.apache.org/jira/browse/SOLR-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719920#action_12719920 ] Shalin Shekhar Mangar commented on SOLR-1216: - bq. If we choose a name for the thing we are pulling, like image, then we can use makeimage, pullimage, etc. How about pullIndex? disambiguate the replication command names -- Key: SOLR-1216 URL: https://issues.apache.org/jira/browse/SOLR-1216 Project: Solr Issue Type: Improvement Components: replication (java) Reporter: Noble Paul Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1216.patch There is a lot of confusion in the naming of various commands such as snappull, snapshot etc. This is a vestige of the script based replication we currently have. The commands can be renamed to make more sense * 'snappull' to be renamed to 'sync' * 'snapshot' to be renamed to 'backup' thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1216) disambiguate the replication command names
[ https://issues.apache.org/jira/browse/SOLR-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719922#action_12719922 ] Noble Paul commented on SOLR-1216: -- 'image' gives the same idea as 'snapshot': it suggests that an image of the index exists. How about 'fetchIndex' and 'abortfetch'? disambiguate the replication command names -- Key: SOLR-1216 URL: https://issues.apache.org/jira/browse/SOLR-1216 Project: Solr Issue Type: Improvement Components: replication (java) Reporter: Noble Paul Assignee: Noble Paul Fix For: 1.4 Attachments: SOLR-1216.patch There is a lot of confusion in the naming of various commands such as snappull, snapshot etc. This is a vestige of the script based replication we currently have. The commands can be renamed to make more sense * 'snappull' to be renamed to 'sync' * 'snapshot' to be renamed to 'backup' thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1220) UnInvertedField performance improvement on fields with an extremely large number of values
[ https://issues.apache.org/jira/browse/SOLR-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719924#action_12719924 ] Yonik Seeley commented on SOLR-1220: Thanks Kent, but the patch was mangled by JIRA. The normal procedure is to do an svn diff > SOLR-NNN.patch and attach that file to the issue via attachFile. That also allows you to click the "grant license to ASF" button to help us with our intellectual property tracking.

UnInvertedField performance improvement on fields with an extremely large number of values -- Key: SOLR-1220 URL: https://issues.apache.org/jira/browse/SOLR-1220 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Kent Fitch Priority: Minor

Our setup is: - about 34M lucene documents of bibliographic and full text content - index currently 115GB, will at least double over the next 6 months - moving to support real-time-ish updates (maybe 5 min delay)

We facet on 8 fields, 6 of which are normal with small numbers of distinct values. But 2 faceted fields, creator and subject, are huge, with 18M and 9M terms respectively. On a server with 2x quad-core AMD 2382 processors and 64GB memory, java 1.6.0_13-b03, 64 bit, run with -Xmx15192M -Xms6000M -verbose:gc, with the index on an Intel X25M SSD, on start-up the elapsed time to create the 8 facets is 306 seconds (best time). Following an index reopen, the time to recreate them is 318 seconds (best time).

[We have made an independent experimental change to create the facets with 3 async threads, that is, in parallel, and also to decouple them from the underlying index, so our facets lag the index changes by the time to recreate the facets. With our setup, the 3 threads reduced facet creation elapsed time from about 450 secs to around 320 secs, but this will depend a lot on the IO capabilities of the device containing the index, amount of file system caching, load, etc.]

Anyway, we noticed that huge amounts of garbage were being collected during facet generation of the creator and subject fields, and tracked it down to this decision in UnInvertedField uninvert():

if (termNum >= maxTermCounts.length) {
  // resize, but conserve memory by not doubling
  // resize at end???  we waste a maximum of 16K (average of 8K)
  int[] newMaxTermCounts = new int[maxTermCounts.length+4096];
  System.arraycopy(maxTermCounts, 0, newMaxTermCounts, 0, termNum);
  maxTermCounts = newMaxTermCounts;
}

So, we tried the obvious thing: - allocate 10K terms initially, rather than 1K - extend by doubling the current size, rather than adding a fixed 4K - free unused space at the end (but only if the unused space is significant) by reallocating the array to the exact required size

And also: - created a static HashMap lookup keyed on field name which remembers the previously allocated size of maxTermCounts for that field, and initially allocates that size + 1000 entries

The second change is a minor optimisation, but the first change, by eliminating thousands of array reallocations and copies, greatly improved load times, down from 306 to 124 seconds on the initial load and from 318 to 134 seconds on reloads after index updates. About 60-70 secs is still spent in GC, but it is a significant improvement. Unless you have very large numbers of facet values, this change won't have any positive benefit. The core part of our change is reflected by this diff against revision 785058:

***************
*** 222,232 ****
      int termNum = te.getTermNumber();
      if (termNum >= maxTermCounts.length) {
!       // resize, but conserve memory by not doubling
!       // resize at end???  we waste a maximum of 16K (average of 8K)
!       int[] newMaxTermCounts = new int[maxTermCounts.length+4096];
        System.arraycopy(maxTermCounts, 0, newMaxTermCounts, 0, termNum);
        maxTermCounts = newMaxTermCounts;
      }
--- 222,232 ----
      int termNum = te.getTermNumber();
      if (termNum >= maxTermCounts.length) {
!       // resize by doubling - for very large number of unique terms, expanding
!       // by 4K and resultant GC will dominate uninvert times.  Resize at end if material
!       int[] newMaxTermCounts = new int[maxTermCounts.length*2];
        System.arraycopy(maxTermCounts, 0, newMaxTermCounts, 0, termNum);
        maxTermCounts = newMaxTermCounts;
      }
***************
*** 331,338 ****
--- 331,346 ----
      numTermsInField = te.getTermNumber();
      te.close();

+     // free space if outrageously wasteful (tradeoff memory/cpu)
+     if ((maxTermCounts.length - numTermsInField) > 1024) {  // too much waste!
+       int[]