Re: SOLR 3.3 DIH and Java 1.6
Hello,

Have you tried JDK 6 from Oracle?

On Tue, Mar 20, 2012 at 8:41 AM, randolf.julian <randolf.jul...@dominionenterprises.com> wrote:

> I am trying to use the data import handler to update the SOLR index with
> Oracle data. In the SOLR schema, a dynamic field called PHOTO_* has been
> defined. I created a script transformer (<script>) and called it in a query:
>
>   <entity name="photo" transformer="script:pivotPhotos"
>     query="select p.path||','||p.photo_barcode||','||p.display_order REC_PHOTO,
>            lpad(p.display_order,3,'0') SEQUENCE_NUMBER
>            from traderadm.photo p
>            where p.realm_id = '${ad.REALM_ID}' and p.ad_id = '${ad.AD_ID}'
>            order by p.display_order"/>
>
> However, whenever I run a full import, it fails with this error in the
> solr0.log file:
>
>   Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException:
>   <script> can be used only in java 6 or above
>
> Here's the output of my java version:
>
>   $ java -version
>   java version "1.6.0_0"
>   OpenJDK Runtime Environment (IcedTea6 1.6) (rhel-1.13.b16.el5-x86_64)
>   OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
>
> I believe we are using java 6. I am lost with this error and need help on
> why this is happening. Thanks.
>
> - Randolf
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/SOLR-3-3-DIH-and-Java-1-6-tp3841355p3841355.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Why does my email always get rejected?
I send email to solr-user@lucene.apache.org, but I always receive a rejection message; it can't be sent successfully.
RE: SOLR 3.3 DIH and Java 1.6
Some versions of OpenJDK don't include the Rhino engine needed to run the JavaScript used by the data import handler. You have to use the Oracle JDK.

Juampa.

________________________________
From: randolf.julian [randolf.jul...@dominionenterprises.com]
Sent: Tuesday, March 20, 2012 5:41
To: solr-user@lucene.apache.org
Subject: SOLR 3.3 DIH and Java 1.6
Re: is the SolrJ call to add collection of documents a blocking function call ?
Hi Ramdev,

add() is a blocking call. Otherwise it would have to start its own background thread, which is not something a library like SolrJ should do (how many threads at most? At which priority? Which thread group? How long should they stay pooled?). And, additionally, you might want to know whether the transmission was successful, or whether your guinea pig has eaten the network cable right in the middle of the transmission.

But it's easy to write your own background task that adds your documents to the Solr server. Using Java's ExecutorService class, this is done within two minutes.

Greetings,
Kuli

On 19.03.2012 16:48, ramdev.wud...@thomsonreuters.com wrote:

> Hi:
> I am trying to index a collection of SolrInputDocs to a Solr server. I was
> wondering if the call I make to add the documents (the
> add(Collection<SolrInputDocument>) call) is a blocking function call.
> I would also like to know if the add call would take longer for a larger
> collection of documents.
>
> Thanks
> Ramdev
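Kuli's ExecutorService suggestion can be sketched like this. It is a minimal, hypothetical illustration of the background-batching pattern only: fakeAdd() stands in for the blocking SolrServer.add(Collection<SolrInputDocument>) call, and plain strings stand in for documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundAdd {
    // Stand-in for the blocking SolrServer.add(batch) call.
    static int fakeAdd(List<String> batch) {
        return batch.size(); // pretend every doc in the batch was indexed
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> docs = Arrays.asList("doc1", "doc2", "doc3", "doc4");

        // Submit fixed-size batches; each add() runs on a pool thread, so the
        // caller is not blocked while a batch is in flight.
        int batchSize = 2;
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            List<String> batch = docs.subList(i, Math.min(i + batchSize, docs.size()));
            results.add(pool.submit(() -> fakeAdd(batch)));
        }

        // Future.get() is where failures would surface (e.g. a failed transmission).
        int total = 0;
        for (Future<Integer> f : results) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("indexed " + total);
    }
}
```

In real code each Future would carry the response (or exception) from its add call, which preserves the "did the transmission succeed" property Kuli mentions.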
Why does parameter useCompoundFile not work?
Dear all,

I want to generate a compound-format index instead of the usual set of files (fdt, fdx, etc.). I followed the suggestion to change the useCompoundFile parameter to true (in both indexDefaults and mainIndex) in solrconfig.xml, but when I use post.jar to post the example XML files, the index looks the same as before: not just 3 files (1 .cfs file plus the segments files). Could anyone tell me how to generate .cfs files for the index, and why this situation happens to me?

Best regards,
Thanks,
Moss
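For reference, the setting the poster describes would look like this in a 3.x-era solrconfig.xml (a sketch, not a complete config). Note that, as far as I understand it, useCompoundFile affects only segments written after the change; previously written segments keep their old format until they are merged or the index is rebuilt, which may explain why the on-disk files look unchanged right after posting.

```
<indexDefaults>
  <!-- write new segments as a single .cfs compound file -->
  <useCompoundFile>true</useCompoundFile>
</indexDefaults>

<mainIndex>
  <useCompoundFile>true</useCompoundFile>
</mainIndex>
```

Deleting the data/index directory and reindexing from scratch is a simple way to check whether the setting is being honored for freshly written segments.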
Re: PorterStemmer using example schema and data
I tried that, and it seems like "recharging" and "rechargeable", for example, actually do stem to the same root (recharg). So why is it not working when I'm searching my indexed sample docs? The stemming works when I search for "videos" and it's actually "video" in the document, etc., but not for rechargeable/recharging or capability/capable, even though they stem to the same root when I check them on the Admin/analysis page. What am I overlooking?

On March 16, 2012 at 2:17 PM Erick Erickson <erickerick...@gmail.com> wrote:

> What you think the results of stemming should be and what they actually are
> sometimes differ <G>... Look at the admin/analysis page, check the "verbose"
> boxes and try "recharging" and "rechargeable" and you'll see, step by step,
> the results of each element of the analysis chain. Since the Porter stemmer
> is algorithmic, I'm betting that these don't stem to the same root.
>
> Best
> Erick
>
> On Thu, Mar 15, 2012 at 7:05 AM, Birkmann, Magdalena
> <magdalena.birkm...@open-xchange.com> wrote:
>
>> Hey there,
>> I've been working through the Solr Tutorial
>> (http://lucene.apache.org/solr/tutorial.html), using the example schema and
>> documents, just working through step by step trying everything out.
>> Everything worked out the way it should (just using the example queries and
>> such), except for the stemming: a search for features:recharging
>> (http://localhost:8983/solr/select/?indent=on&q=features:recharging&fl=name,features)
>> should match "Rechargeable" due to stemming with the EnglishPorterFilter,
>> but doesn't. I've been using the example directory exactly the way it was
>> when I downloaded it, without changing anything. Since I'm fairly new to all
>> of this and don't quite understand yet how all of it works or should work, I
>> don't really know where the problem lies or how to configure anything to
>> make it work, so I just thought I'd ask here, since you all seem so nice :)
>> Thanks a lot in advance,
>> Magda
Staggering Replication start times
I am playing with an index that is sharded many times, between 64 and 128 shards. One thing I noticed is that with replication set to happen every 5 minutes, every slave hits the master at the same moment asking for updates: :00:00, :05:00, :10:00, :15:00, etc. Replication itself takes very little time, so it seems I am flooding the network with a burst of requests that then goes away.

I tweaked the replication start-time code to instead just start 5 minutes after a shard starts up, which means that instead of all of the slaves hitting at the same moment, they are a bit staggered: :00:00, :00:01, :00:02, :00:04, etcetera. Presumably this uses my network pipe more efficiently.

Any thoughts on this? I know it means the slaves are more likely to be slightly out of sync, but over a 5-minute range they will get back in sync.

Eric

-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Co-Author: Apache Solr 3 Enterprise Search Server available from http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
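The staggering Eric describes can be sketched as a simple computation. This is illustrative only, not the actual Solr replication code, and the shard count and interval are made-up values: each slave gets an initial delay proportional to its shard index, so first polls are spread evenly across the 5-minute interval.

```java
import java.util.concurrent.TimeUnit;

public class StaggeredPoll {
    public static void main(String[] args) {
        int shards = 8; // hypothetical number of slaves
        long intervalMs = TimeUnit.MINUTES.toMillis(5);

        // Spread initial delays evenly across the polling interval so the
        // slaves don't all hit the master at the same wall-clock instant.
        for (int shard = 0; shard < shards; shard++) {
            long initialDelayMs = (intervalMs * shard) / shards;
            System.out.println("shard " + shard + " first poll at +" + initialDelayMs + " ms");
        }
    }
}
```

With a ScheduledExecutorService one would pass initialDelayMs as the initial delay to scheduleAtFixedRate while keeping the same 5-minute period for every slave, so the polls stay staggered indefinitely.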
Re: To truncate or not to truncate (group.truncate vs. facet)
Faceting is orthogonal to grouping, so be careful what you ask for. Adding faceting would be easy; the only reason I suggested grouping is your requirement that your brands be just a count of the number of distinct ones found, not the number of matching docs.

So a really simple solution would be to forget about grouping and just facet, then have your application change the counts for all the brand entries to 1.

Best
Erick

On Mon, Mar 19, 2012 at 5:23 PM, rasser <r...@vertica.dk> wrote:

> I see your point. If I understand it correctly, it will however mean that I
> need to return 10 (brands) x 100 (resultsToShow) = 1000 docs to ensure that
> all 100 results to show are of the same brand. Correct?
>
> And tomorrow (or later) the customer will also want a facet on 5 new fields,
> e.g. production year. How could this be handled with the above approach?
>
> Thanks
Re: is the SolrJ call to add collection of documents a blocking function call ?
Also consider StreamingUpdateSolrServer if you want multiple threads to operate from your client.

Best
Erick

On Tue, Mar 20, 2012 at 4:12 AM, Michael Kuhlmann <k...@solarier.de> wrote:

> Hi Ramdev,
> add() is a blocking call.
Re: is the SolrJ call to add collection of documents a blocking function call ?
Hmm, nice feature, Erick.
Re: Why does my email always get rejected?
I received it... sometimes it just needs some time.

2012/3/20 怪侠 <87863...@qq.com>

> I send email to solr-user@lucene.apache.org, but I always receive a
> rejection message; it can't be sent successfully.

--
Travis Low, Director of Development
t...@4centurion.com
Centurion Research Solutions, LLC
14048 ParkEast Circle • Suite 100 • Chantilly, VA 20151
703-956-6276 • 703-378-4474 (fax)
http://www.centurionresearch.com

The information contained in this email message is confidential and protected from disclosure. If you are not the intended recipient, any use or dissemination of this communication, including attachments, is strictly prohibited. If you received this email message in error, please delete it and immediately notify the sender. This email message and any attachments have been scanned and are believed to be free of malicious software and defects that might affect any computer system in which they are received and opened. No responsibility is accepted by Centurion Research Solutions, LLC for any loss or damage arising from the content of this email.
SolrCloud replica and leader out of Sync somehow
I'm trying to figure out how it's possible for 2 Solr instances (1 of which is the leader, 1 a replica) to be out of sync. I've done commits to the Solr instances and forced replication, but still the instances have different info. The relevant snippet from my clusterstate.json is listed below.

  "shard3":{
    "host2:7577_solr_shard3-core2":{
      "shard":"shard3",
      "leader":"true",
      "state":"active",
      "core":"shard3-core2",
      "collection":"collection1",
      "node_name":"host2:7577_solr",
      "base_url":"http://host2:7577/solr"},
    "host1:7575_solr_shard3-core1":{
      "shard":"shard3",
      "state":"active",
      "core":"shard3-core1",
      "collection":"collection1",
      "node_name":"host1:7575_solr",
      "base_url":"http://host1:7575/solr"}},

Where can I look to see why this is happening?
RE: SOLR 3.3 DIH and Java 1.6
Thanks Mikhail and Juampa. How can I prove to our systems guys that the Rhino engine is not installed? That is what I need to show them: that it's not installed, and that we have to have it for the SOLR DataImportHandler script to run. Thanks again.

- Randolf
Re: querying on shards
On 3/19/2012 11:55 PM, Ankita Patil wrote:

> Hi,
> I wanted to know whether it is feasible to query all the shards even if the
> query yields data from only a few shards and not all. Or is it better to
> mention explicitly those shards from which we get the data and query only
> them? For example: I have 4 shards, and a query which yields data from only
> 2 of them. Should I select those 2 shards and query only them, or is it OK
> to query all the shards? Will that affect performance in any way?

I use a sharded index, but I am not a seasoned Java/Solr/Lucene developer. My clients do not use the shards parameter themselves - they talk to a load balancer, which in turn talks to a special core that has the shards in its request handler config and has no index of its own. I call it a broker, because that is what our previous search product (EasyAsk) called it.

As I understand things, the performance of your slowest shard, whether that is because of index size on that shard or the underlying hardware, will be a large factor in the performance of the entire index. A distributed query sends an identical query to all the shards it is configured for, gathers all those results in parallel, and builds a final result to send to the client.

You MIGHT get better performance by not including the other shards. If the "no results" shard query returns super-fast, it probably won't really make any difference. If it takes a long time to get the answer that there are no results, then removing those shards would make things go faster. That requires intelligence in the client to know where the data is. If the client does not know where the data is, it is safer to simply include all the shards.

Thanks,
Shawn
RE: SOLR 3.3 DIH and Java 1.6
Taking a quick look at the code, it seems this exception could have been thrown for four reasons (see org.apache.solr.handler.dataimport.ScriptTransformer#initEngine):

1. Your JRE doesn't have the class javax.script.ScriptEngineManager (pre-1.6; it is loaded here via reflection).

2. Your JRE doesn't have any installed scripting engines. This little program outputs 1 engine on my JRE, with 6 aliases: [js, rhino, JavaScript, javascript, ECMAScript, ecmascript]

-------------------------------------------------
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

public class TestScripting {
    public static void main(String[] args) {
        ScriptEngineManager sem = new ScriptEngineManager();
        for (ScriptEngineFactory sef : sem.getEngineFactories()) {
            System.out.println(sef.getNames());
        }
    }
}
-------------------------------------------------

3. You specified an unsupported scripting engine name in the "language" parameter (see http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer).

4. The script you wrote in the script tag has errors.

Unfortunately, it looks like all 4 of these things are being checked in the same try/catch block, so you could have any of these problems and still get a potentially misleading error message. One way to eliminate #1 and #2 is to run the test org.apache.solr.handler.dataimport.TestScriptTransformer on your JRE and see if it passes (see here for how: http://wiki.apache.org/solr/HowToContribute#Unit_Tests).

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: randolf.julian [mailto:randolf.jul...@dominionenterprises.com]
Sent: Tuesday, March 20, 2012 9:24 AM
To: solr-user@lucene.apache.org
Subject: RE: SOLR 3.3 DIH and Java 1.6

Thanks Mikhail and Juampa. How can I prove to our Systems guys that the Rhino Engine is not installed? This is the only way that I can prove that it's not installed and we have to have it for the SOLR DataImportHandler script to run. Thanks again.

- Randolf
Re: SolrCloud replica and leader out of Sync somehow
Do you have the logs for this? Either around startup or when you are forcing replication; logs around both would be helpful. Also the doc counts for each shard?

On Mar 20, 2012, at 10:16 AM, Jamie Johnson wrote:

> I'm trying to figure out how it's possible for 2 Solr instances (1 of which
> is the leader, 1 a replica) to be out of sync. I've done commits to the Solr
> instances and forced replication, but still the instances have different
> info.

- Mark Miller
lucidimagination.com
Re: SolrCloud replica and leader out of Sync somehow
Doc counts are the same. I am going to disable my custom component to see if that is mucking with something, but it seems to be working properly.

After looking at the results a little closer (expanding the number of results coming back), it seems that the same information is in both, but the order in which the items are returned is not the same. I'm sorting by score when they appear in different orders; if I sort by key then the results look the same.

On Tue, Mar 20, 2012 at 10:52 AM, Mark Miller <markrmil...@gmail.com> wrote:

> Do you have the logs for this? Either around startup or when you are forcing
> replication. Logs around both would be helpful. Also the doc counts for each
> shard?
Re: SolrCloud replica and leader out of Sync somehow
OK, with my custom component out of the picture I still have the same issue. Specifically, when sorting by score on a leader and replica I am getting different doc orderings. Is this something anyone has seen?

On Tue, Mar 20, 2012 at 11:09 AM, Jamie Johnson <jej2...@gmail.com> wrote:

> Doc counts are the same. I am going to disable my custom component to see if
> that is mucking with something, but it seems to be working properly. After
> looking at the results a little closer, it seems that the same information
> is in both, but the order in which the items are returned is not the same.
Re: SolrCloud replica and leader out of Sync somehow
On Tue, Mar 20, 2012 at 11:17 AM, Jamie Johnson <jej2...@gmail.com> wrote:

> OK, with my custom component out of the picture I still have the same
> issue. Specifically, when sorting by score on a leader and replica I am
> getting different doc orderings. Is this something anyone has seen?

This is certainly possible and expected - sorting tiebreakers are by internal Lucene docid, which can change (even on a single node!). If you need lists that don't shift around due to unrelated changes, make sure you don't have any ties!

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10
Re: SolrCloud replica and leader out of Sync somehow
Hmmm... OK. I don't see how it's possible for me to ensure that there are no ties. If a query were for *:*, everything has a constant score; if the user requested one page and then requested the next, the results on the second page could be duplicates of what was on the first page. I don't remember ever seeing this issue on older versions of SolrCloud, although from what you're saying I should have. What could explain why I never saw this before?

As another possible fix to ensure proper ordering, couldn't we always specify a sort order which contains the key? So for instance if the user asks for "score asc", we'd make this "score asc, key asc" so that results would be ordered by score and then by key, and the results across pages would be consistent.

On Tue, Mar 20, 2012 at 11:30 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> This is certainly possible and expected - sorting tiebreakers are by
> internal Lucene docid, which can change (even on a single node!). If you
> need lists that don't shift around due to unrelated changes, make sure you
> don't have any ties!
Re: SolrCloud replica and leader out of Sync somehow
On Tue, Mar 20, 2012 at 11:39 AM, Jamie Johnson <jej2...@gmail.com> wrote:

> Hmmm... OK. I don't see how it's possible for me to ensure that there are
> no ties. If a query were for *:*, everything has a constant score; if the
> user requested one page and then requested the next, the results on the
> second page could be duplicates of what was on the first page. I don't
> remember ever seeing this issue on older versions of SolrCloud, although
> from what you're saying I should have. What could explain why I never saw
> this before?

If you use replication only to duplicate an index (and avoid any merges), then you will have identical docids.

> As another possible fix to ensure proper ordering, couldn't we always
> specify a sort order which contains the key? So for instance if the user
> asks for "score asc", we'd make this "score asc, key asc" so that results
> would be ordered by score and then by key, and the results across pages
> would be consistent.

Yep. And like I said, this is also an issue even on a single node. Docid A can be before docid B, then a segment merge can cause these to be shuffled.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10
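The unique-key tiebreaker Jamie proposes can be illustrated with a small, self-contained sketch. Plain Java collections stand in for Solr result lists here, and the doc ids and scores are made up: both "replicas" hold the same tied-score docs in different internal orders, and adding the key as a secondary sort makes them agree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TieBreakSort {
    public static void main(String[] args) {
        // Same three docs, all tied at score 1.0, in different internal orders.
        List<String[]> replicaA = new ArrayList<>(Arrays.asList(
            new String[]{"doc2", "1.0"}, new String[]{"doc1", "1.0"}, new String[]{"doc3", "1.0"}));
        List<String[]> replicaB = new ArrayList<>(Arrays.asList(
            new String[]{"doc3", "1.0"}, new String[]{"doc2", "1.0"}, new String[]{"doc1", "1.0"}));

        // "score desc, id asc": the unique key breaks ties deterministically.
        Comparator<String[]> byScoreThenId =
            Comparator.<String[], Double>comparing(d -> Double.parseDouble(d[1]))
                      .reversed()
                      .thenComparing(d -> d[0]);
        replicaA.sort(byScoreThenId);
        replicaB.sort(byScoreThenId);

        // Both replicas now return the identical ordering for tied scores.
        System.out.println(replicaA.get(0)[0] + " " + replicaB.get(0)[0]);
    }
}
```

Without the thenComparing clause, the two lists would keep their original (and different) relative orders for the tied docs, which is exactly the paging inconsistency described in the thread.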
org.apache.solr.common.SolrException: Internal Server Error
I use SolrJ to index a PDF file:

    File file = new File("1.pdf");
    String urlString = constant.getUrl();
    StreamingUpdateSolrServer solr = new StreamingUpdateSolrServer(urlString, 1, 1);
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.addFile(file);
    up.setParam("uprefix", "attr_");
    up.setParam("fmap.content", "attr_content");
    up.setParam("literal.id", file.getPath());
    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, false);
    solr.request(up);
    solr.blockUntilFinished();

When I execute the code, I always get the error org.apache.solr.common.SolrException: Internal Server Error. What's wrong? Could anyone help me? Thanks very much.
Re: SolrCloud replica and leader out of Sync somehow
I believe we're using replication only to duplicate the index (standard SolrCloud, nothing special on our end), so I don't see why the docids wouldn't be the same. Am I missing something that is happening there that I am unaware of?

On Tue, Mar 20, 2012 at 11:50 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> If you use replication only to duplicate an index (and avoid any merges),
> then you will have identical docids.
>
> Yep. And like I said, this is also an issue even on a single node. Docid A
> can be before docid B, then a segment merge can cause these to be shuffled.
Re: Replication with different schema
Thanks. I need to index data from one Solr into another Solr with a different analyzer. I am now able to do this by querying the first Solr and indexing the results into the other.

NOTE: Since the field I need to reindex is stored, this is easy, but as my index has 31 lakh (3.1 million) records it is taking a lot of time. (Please suggest something for better performance.)

Thanks and Regards,
S SYED ABDUL KATHER

On Tue, Mar 13, 2012 at 10:05 PM, Erick Erickson [via Lucene] <ml-node+s472066n3822752...@n3.nabble.com> wrote:

> Why would you want to? This seems like an XY problem, see:
> http://people.apache.org/~hossman/#xyproblem
>
> See the confFiles section here:
> http://wiki.apache.org/solr/SolrReplication
> Although it mentions solrconfig.xml, it might work with schema.xml. BUT:
> this strikes me as really, really dangerous. I'm having a hard time
> thinking of a use-case that this makes sense for, so be very cautious.
> Having an index created with one schema and searched on with another is a
> recipe for disaster IMO unless you're very careful.
>
> Best
> Erick
>
> On Tue, Mar 13, 2012 at 3:40 AM, syed kather wrote:
>
>> Team,
>> Is it possible to do replication with a different schema in Solr? If not,
>> how can I achieve this? Can anyone give me an idea of how to do this?
>> Thanks in advance.
>>
>> Thanks and Regards,
>> S SYED ABDUL KATHER
Re: SolrCloud replica and leader out of Sync somehow
Thanks Yonik, I really appreciate the explanation. It sounds like the best solution for me is to add the additional sort parameter. That being said, is there a significant memory increase in doing this when sorting by score? I don't see how with SolrCloud I can avoid doing this, or how others wouldn't need to do the same thing.

On Tue, Mar 20, 2012 at 1:38 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Tue, Mar 20, 2012 at 1:07 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
>> I believe we're using replication only to duplicate the index (standard
>> SolrCloud, nothing special on our end), so I don't see why the docids
>> wouldn't be the same. Am I missing something that is happening there that
>> I am unaware of?
>
> Each document is pushed to the replicas (i.e. standard whole-index
> replication is only used in recovery scenarios). If you're using multiple
> threads to index, then docA can be indexed before docB on one replica and
> vice-versa on a different replica (or on the leader). Although even if
> this were not the case, I don't believe Lucene is deterministic in this
> respect anyway (i.e. indexing identically on two different boxes is not
> guaranteed to result in the exact same internal document order).
>
> -Yonik
RE: SOLR 3.3 DIH and Java 1.6
I also applied a fix to both Trunk/4.x and the 3.x branch (will be in 3.6 when it is released). This should give you better error messages when something goes wrong when ScriptTransformer is invoked. It will tell you that you need 1.6 only if the functionality is absent (case #1 in my last message). In case #2 or #3 it will tell you the language you specified isn't supported. In case #4, it will tell you the script itself is invalid. See https://issues.apache.org/jira/browse/SOLR-3260 . James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Dyer, James [mailto:james.d...@ingrambook.com] Sent: Tuesday, March 20, 2012 9:46 AM To: solr-user@lucene.apache.org Subject: RE: SOLR 3.3 DIH and Java 1.6 Taking a quick look at the code, it seems this exception could have been thrown for four reasons: (see org.apache.solr.handler.dataimport.ScriptTransformer#initEngine) 1. Your JRE doesn't have class javax.script.ScriptEngineManager (pre 1.6, loaded here via reflection) 2. Your JRE doesn't have any installed scripting engines. This little program outputs 1 engine on my JRE with 6 aliases: [js, rhino, JavaScript, javascript, ECMAScript, ecmascript] - import javax.script.ScriptEngineFactory; import javax.script.ScriptEngineManager; public class TestScripting { public static void main(String args[]) { ScriptEngineManager sem = new ScriptEngineManager(); for(ScriptEngineFactory sef : sem.getEngineFactories()) { System.out.println(sef.getNames()); } } } - 3. You specified an unsupported scripting engine name in the language parameter (see http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer) 4. The script you wrote in the script tag has errors. Unfortunately, it looks like all 4 of these things are being checked in the same try/catch block. So you could have any of these problems and are getting a potentially misleading error message. 
One way to eliminate #1 and #2 is to run the test org.apache.solr.handler.dataimport.TestScriptTransformer on your JRE and see if it passes. (see here for how: http://wiki.apache.org/solr/HowToContribute#Unit_Tests) James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: randolf.julian [mailto:randolf.jul...@dominionenterprises.com] Sent: Tuesday, March 20, 2012 9:24 AM To: solr-user@lucene.apache.org Subject: RE: SOLR 3.3 DIH and Java 1.6 Thanks Mikhail and Juampa. How can I prove to our Systems guys that the Rhino engine is not installed? This is the only way I can prove that it's not installed and that we have to have it for the SOLR DataImportHandler script to run. Thanks again. - Randolf -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-3-3-DIH-and-Java-1-6-tp3841355p3842520.html Sent from the Solr - User mailing list archive at Nabble.com.
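For proving the engine question to the Systems folks, a tiny program against javax.script (the same API the ScriptTransformer loads) is probably the most direct evidence. This is a sketch, not Solr code, and engine availability is JRE-dependent, so the printed value will vary by installation:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class CheckRhino {
    // Returns true if this JRE can hand back a script engine for the given name.
    static boolean hasEngine(String name) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(name);
        return engine != null;
    }

    public static void main(String[] args) {
        // On an OpenJDK built without the Rhino engine this prints false --
        // evidence that the JRE, not Solr, is what's missing the capability.
        System.out.println("javascript engine present: " + hasEngine("javascript"));
    }
}
```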
Multi-valued polyfields - Do they exist in the wild ?
Hi: We have been keen on using polyfields for a while, but we have been unable to because they do not seem to support multiple values (yet). I am wondering if there are any custom implementations, or if there is any ETA on a Solr release that will include multi-valued polyfields. Thanks for the support Ramde
SV: To truncate or not to truncate (group.truncate vs. facet)
Thanks for taking the time to help me Erick! Just to clarify my desired behavior from the facets. This is the index; notice color is multivalued to represent a model of car that has more than one color:

<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <field name="variant_id">A4_black</field>
  <field name="color">black</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <field name="variant_id">A4_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Volvo V50</field>
  <field name="brand">volvo</field>
  <field name="variant_id">Volvo_V50</field>
  <field name="color">black</field>
</doc>
<doc>
  <field name="sku">Audi A5</field>
  <field name="brand">audi</field>
  <field name="variant_id">A5_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Audi S8</field>
  <field name="brand">audi</field>
  <field name="variant_id">S8_yellow</field>
  <field name="color">yellow</field>
</doc>
<doc>
  <field name="sku">Audi S8</field>
  <field name="brand">audi</field>
  <field name="variant_id">S8_black</field>
  <field name="color">black</field>
  <field name="color">white</field>
</doc>

My goal is to get this facet:

brand
- audi (3) - since there are 3 Audi models (A4, A5 and S8)
- volvo (1) - since there is only one Volvo model (V50)

color
- black (3) - since all models except the A5 are available in black
- white (3) - since the A4, A5 and S8 are available in white
- yellow (1) - since only the S8 is available in yellow

Thanks From: Erick Erickson [via Lucene] [ml-node+s472066n3842071...@n3.nabble.com] Sent: 20 March 2012 12:42 To: Rasmus Østergård Subject: Re: To truncate or not to truncate (group.truncate vs. facet) Faceting is orthogonal to grouping, so be careful what you ask for. So adding faceting would be easy; the only reason I suggested grouping is your requirement that your brands be just a count of the number of distinct ones found, not the number of matching docs. So a really simple solution would be to forget about grouping and just facet. Then have your application change the counts for all the brand entries to 1.
Best Erick On Mon, Mar 19, 2012 at 5:23 PM, rasser [hidden email] wrote: I see your point. If I understand it correctly, it will however mean that I need to return 10 (brands) x 100 (resultsToShow) = 1000 docs to ensure that all 100 results shown are of the same brand. Correct? And tomorrow (or later) the customer will also want a facet on 5 new fields, e.g. production year. How could this be handled with the above approach? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3840406.html Sent from the Solr - User mailing list archive at Nabble.com. If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3842071.html -- View this message in context: http://lucene.472066.n3.nabble.com/SV-To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3843321p3843321.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multi-valued polyfields - Do they exist in the wild ?
On Tue, Mar 20, 2012 at 2:17 PM, ramdev.wud...@thomsonreuters.com wrote: Hi: We have been keen on using polyfields for a while. But we have been restricted from using it because they do not seem to support Multi-values (yet). Poly-fields should support multiple values; it's more that the field types which use them may not. For example, LatLon isn't multiValued because it doesn't have a mechanism to correlate multiple values per document. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Thanks All
Here is the core of the SolrJ client that ended up accomplishing what I wanted:

String fileName2 = "C:\\work\\SolrClient\\data\\worldwartwo.txt";
SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/solr/", 20, 8);
UpdateRequest req = new UpdateRequest("/update/extract");
ModifiableSolrParams params = new ModifiableSolrParams();
params.add("stream.file", new String[]{fileName2});
params.set("literal.id", fileName2);
params.set("captureAttr", false);
req.setParams(params);
server.request(req);
server.commit();

To get this to work correctly, the following server-side config was needed (I started from a barebones Solr config): 1. Add apache-solr-cell-3.5.0.jar to the solrhost/lib directory (or wherever Solr can access jars), as this contains the class ExtractingRequestHandler. 2. Add the appropriate handler for /update/extract in solrconfig.xml (this uses the ExtractingRequestHandler class). I'll blog about this later on for the benefit of the community at large. I'm still puzzled that there are no readily available alternatives to the Tika-based ExtractingRequestHandler in the situation where the input data is plain UTF-8 text files that SOLR needs to ingest and index. I may need to look into defining a custom request handler if that's the right way to go. Thanks again -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-for-SOLR-SOLRJ-to-index-files-directly-bypassing-HTTP-streaming-tp3833419p3843593.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: To truncate or not to truncate (group.truncate vs. facet)
Thanks for taking the time to help me Erick! Just to clarify my desired behavior from the facets. This is the index; notice color is multivalued to represent a model of car that has more than one color:

<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <field name="variant_id">A4_black</field>
  <field name="color">black</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <field name="variant_id">A4_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Volvo V50</field>
  <field name="brand">volvo</field>
  <field name="variant_id">Volvo_V50</field>
  <field name="color">black</field>
</doc>
<doc>
  <field name="sku">Audi A5</field>
  <field name="brand">audi</field>
  <field name="variant_id">A5_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Audi S8</field>
  <field name="brand">audi</field>
  <field name="variant_id">S8_yellow</field>
  <field name="color">yellow</field>
</doc>
<doc>
  <field name="sku">Audi S8</field>
  <field name="brand">audi</field>
  <field name="variant_id">S8_black</field>
  <field name="color">black</field>
  <field name="color">white</field>
</doc>

My goal is to get this facet:

brand
- audi (3) - since there are 3 Audi models (A4, A5 and S8)
- volvo (1) - since there is only one Volvo model (V50)

color
- black (3) - since all models except the A5 are available in black
- white (3) - since the A4, A5 and S8 are available in white
- yellow (1) - since only the S8 is available in yellow

And these 4 results (when the query is *:*):
- Audi A4
- Audi A5
- Audi S8
- Volvo V50

Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3843596.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Any way to get reference to original request object from within Solr component?
Hi Hoss, Thanks for the pointers, and sorry, it was a bug in my code (some dead code which was alphabetizing the facet link text, and also the parameters themselves indirectly by reference). I actually ended up building a servlet and a component to print out the multi-valued parameters using HttpServletRequest.getParameterValues(myparam) and ResponseBuilder.req.getParams().getParams(myparam) respectively to isolate the problem. Both of them returned the parameters in the correct order. So I went trolling through the code with a debugger to observe exactly at what point the order got messed up, and found the bug. FWIW, I am using Tomcat 5.5. Thanks to everybody for their help, and sorry for the noise; guess I should have done the debugger thing before I threw up my hands :-). -sujit On Mar 19, 2012, at 6:55 PM, Chris Hostetter wrote: : I have a custom component which depends on the ordering of a : multi-valued parameter. Unfortunately it looks like the values do not : come back in the same order as they were put in the URL. Here is some : code to explain the behavior: ... : and I notice that the values are ordered differently than the [foo, bar, : baz] that I would have expected. I am guessing it's because the : SolrParams is a MultiMap structure, so order is destroyed on its way in. a) MultiMapSolrParams does not destroy order on the way in b) when dealing with HTTP requests, the request params actually use an instance of ServletSolrParams which is backed directly by the ServletRequest.getParameterMap() -- you should get the values returned in the exact order as ServletRequest.getParameterMap().get(myparam) : 1) is there a setting in Solr I can use to enforce ordering of : multi-valued parameters? I suppose I could use a single parameter with : comma-separated values, but it's a bit late to do that now... Should already be enforced in MultiMapSolrParams and ServletSolrParams : 2) is it possible to use a specific SolrParams object that preserves order? If so how?
see above. : 3) is it possible to get a reference to the HTTP request object from within a component? If so how? not out of the box, because there is no guarantee that solr is even running in a servlet container. you can subclass SolrDispatchFilter to do this if you wish (note the comment in the execute() method). My questions to you... 1) what servlet container are you using? 2) have you tested your servlet container with a simple servlet (ie: eliminate solr from the equation) to verify that the ServletRequest.getParameterMap() contains your request values in order? if you debug this and find evidence that something in solr is re-ordering the values in a MultiMapSolrParams or ServletSolrParams *PLEASE* open a jira with a reproducible example .. that would definitely be an annoying bug we should get to the bottom of. -Hoss
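Hoss's claim that MultiMapSolrParams keeps values in insertion order can be mimicked in isolation; this is a stand-in sketch using a plain insertion-ordered multimap (the class and method names here are mine, not Solr's), showing the behavior a correct params container should have:

```java
import java.util.*;

public class OrderedParams {
    private final Map<String, List<String>> map = new LinkedHashMap<>();

    // Append a value for the named parameter, preserving insertion order.
    void add(String name, String value) {
        map.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Values come back in the order they were added, which is how
    // ServletRequest.getParameterMap() is supposed to behave.
    String[] getParams(String name) {
        List<String> values = map.get(name);
        return values == null ? null : values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        OrderedParams p = new OrderedParams();
        p.add("myparam", "foo");
        p.add("myparam", "bar");
        p.add("myparam", "baz");
        System.out.println(Arrays.toString(p.getParams("myparam"))); // [foo, bar, baz]
    }
}
```

If a container (or intervening code, as in Sujit's bug) reorders values somewhere between these two points, this is the invariant it is breaking.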
Re: Replication with different schema
OK, I was thrown off by your use of schema; I thought you were talking about schema.xml. Anyway, assuming you have some kind of loop that pages through the documents via Solr, gets the results and then sends them to another Solr server... yeah, that'll be slow. You have the deep paging problem here. I'd consider dropping into Lucene to spin through the documents, fetch them and then assemble what you need into a new SolrInputDocument that you then send to your new server. You really aren't moving any interesting data. By that I mean by the time things go through your intermediate code, they're pretty much primitive types, so the fact that the various Solr indexes have different schemas really isn't relevant. Best Erick On Tue, Mar 20, 2012 at 1:17 PM, in.abdul in.ab...@gmail.com wrote: Thanks. I need to index data from one Solr to another Solr with a different analyzer. Right now I am able to do this by querying one Solr and indexing the results into the other. NOTE: the field I need to reindex is stored, so this is easy, but as my index has 31 lakh (3.1 million) records it is taking a lot of time (suggest me a better-performing approach). Thanks and Regards, S SYED ABDUL KATHER On Tue, Mar 13, 2012 at 10:05 PM, Erick Erickson [via Lucene] ml-node+s472066n3822752...@n3.nabble.com wrote: Why would you want to? This seems like an XY problem, see: http://people.apache.org/~hossman/#xyproblem See the confFiles section here: http://wiki.apache.org/solr/SolrReplication although it mentions solrconfig.xml, it might work with schema.xml. BUT: This strikes me as really, really dangerous. I'm having a hard time thinking of a use-case that this makes sense for, so be very cautious. Having an index created with one schema and searched on with another is a recipe for disaster IMO unless you're very careful. Best Erick On Tue, Mar 13, 2012 at 3:40 AM, syed kather [hidden email] wrote: Team, Is it possible to do replication with different schemas in Solr?
If not, how can I achieve this? Can anyone give me an idea how to do this? Advance thanks. Thanks and Regards, S SYED ABDUL KATHER -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/Replication-with-different-schema-tp3821672p3822752.html - THANKS AND REGARDS, SYED ABDUL KATHER -- View this message in context: http://lucene.472066.n3.nabble.com/Replication-with-different-schema-tp3821672p3843068.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: To truncate or not to truncate (group.truncate vs. facet)
Ok, assuming sku is an un-tokenized field (and if it isn't, use a copyField), then just facet on that field. Then, at the app layer, combine the entries to get your aggregate counts. So your raw return would have:

Audi A4 (2)
Audi A5 (1)
Audi S8 (2)
Volvo V50 (1)

The app would have to be smart enough to spin through the sku facet and just know that the three Audi SKUs need to be rolled up into one Audi entry. This could be simple if the rule were that the SKU always started with the brand name, and similarly for the other SKUs. Crude, but it'd work. Best Erick On Tue, Mar 20, 2012 at 4:01 PM, rasser r...@vertica.dk wrote: Thanks for taking the time to help me Erick! Just to clarify my desired behavior from the facets. This is the index; notice color is multivalued to represent a model of car that has more than one color:

<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <field name="variant_id">A4_black</field>
  <field name="color">black</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Audi A4</field>
  <field name="brand">audi</field>
  <field name="variant_id">A4_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Volvo V50</field>
  <field name="brand">volvo</field>
  <field name="variant_id">Volvo_V50</field>
  <field name="color">black</field>
</doc>
<doc>
  <field name="sku">Audi A5</field>
  <field name="brand">audi</field>
  <field name="variant_id">A5_white</field>
  <field name="color">white</field>
</doc>
<doc>
  <field name="sku">Audi S8</field>
  <field name="brand">audi</field>
  <field name="variant_id">S8_yellow</field>
  <field name="color">yellow</field>
</doc>
<doc>
  <field name="sku">Audi S8</field>
  <field name="brand">audi</field>
  <field name="variant_id">S8_black</field>
  <field name="color">black</field>
  <field name="color">white</field>
</doc>

My goal is to get this facet:

brand
- audi (3) - since there are 3 Audi models (A4, A5 and S8)
- volvo (1) - since there is only one Volvo model (V50)

color
- black (3) - since all models except the A5 are available in black
- white (3) - since the A4, A5 and S8 are available in white
- yellow (1) - since only the S8 is available in yellow

And these 4 results (when the query is *:*):
- Audi A4
- Audi A5
- Audi S8
- Volvo V50

Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/To-truncate-or-not-to-truncate-group-truncate-vs-facet-tp3838797p3843596.html Sent from the Solr - User mailing list archive at Nabble.com.
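Erick's app-layer rollup could be sketched roughly like this, assuming (as he notes) that the SKU always starts with the brand name; that prefix rule is the fragile part he flags:

```java
import java.util.*;

public class FacetRollup {
    // Collapse a SKU facet ("Audi A4" -> 2, ...) into distinct-model counts
    // per brand, assuming the first token of the SKU is the brand name.
    // Each distinct SKU counts as one model, regardless of its doc count.
    static Map<String, Integer> rollup(Map<String, Integer> skuFacet) {
        Map<String, Integer> brands = new TreeMap<>();
        for (String sku : skuFacet.keySet()) {
            String brand = sku.split(" ")[0];
            brands.merge(brand, 1, Integer::sum);
        }
        return brands;
    }

    public static void main(String[] args) {
        // The raw sku facet from Erick's example.
        Map<String, Integer> skuFacet = new LinkedHashMap<>();
        skuFacet.put("Audi A4", 2);
        skuFacet.put("Audi A5", 1);
        skuFacet.put("Audi S8", 2);
        skuFacet.put("Volvo V50", 1);
        System.out.println(rollup(skuFacet)); // {Audi=3, Volvo=1}
    }
}
```

This reproduces the desired brand counts (audi 3, volvo 1) from the raw per-SKU facet; the color facet would come straight from a normal facet on the color field.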
Re: Thanks All
: To get this to work correctly, the following server side config was needed : (I started from a barebones solr config) : 1. Add apache-solr-cell-3.5.0.jar to the solrhost/lib directory (or : wherever solr can access jars) as this contains the class : ExtractingRequestHandler : 2. Add the appropriate handler for /update/extract in the solrconfig.xml : (this uses the ExtractingRequestHandler class). what barebones solr config did you start with? the example configs that ship with solr have included /update/extract since 1.4.0 -Hoss
Re: StreamingUpdateSolrServer - thread exit timeout?
: Is there any way to get the threads within SUSS objects to immediately : exit without creating other issues? Alternatively, if immediate isn't : possible, the exit could take 1-2 seconds. I could not find any kind of : method in the API that closes down the object. you should take a look at this thread... http://www.lucidimagination.com/search/document/53dc7e3d2102bb51 -Hoss
Re: Thanks All
If you build it, they will come! On Tue, Mar 20, 2012 at 12:59 PM, vybe3142 vybe3...@gmail.com wrote: I'm still puzzled that there are no readily available alternatives to using the Tika-based ExtractingRequestHandler in the situation where the input data is plain UTF-8 text files that SOLR needs to ingest and index. I may need to look into defining a custom request handler if that's the right way to go. -- Lance Norskog goks...@gmail.com
Re: Staggering Replication start times
For our use case this is a no-no. When the index is updated, we need all indexes to be updated at the same time. We put all indexes (slaves) behind a load balancer, and the user would expect the same results from page to page. On Tue, Mar 20, 2012 at 5:36 AM, Eric Pugh ep...@opensourceconnections.com wrote: I am playing with an index that is sharded many times, between 64 and 128. One thing I noticed is that with replication set to happen every 5 minutes, each slave hits the master at the same moment asking for updates: :00:00, :05:00, :10:00, :15:00 etc. Replication takes very little time, so it seems like I may be flooding the network with a burst of requests that then goes away. I tweaked the replication start time code to instead just start 5 minutes after a shard starts up, which means that instead of all of the slaves hitting at the same moment, they are a bit staggered: :00:00, :00:01, :00:02, :00:04 etcetera. Which presumably will use my network pipe more efficiently. Any thoughts on this? I know it means the slaves are more likely to be slightly out of sync, but over a 5 minute range they will get back in sync. Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Apache Solr 3 Enterprise Search Server available from http://www.packtpub.com/apache-solr-3-enterprise-search-server/book This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such. -- Bill Bell billnb...@gmail.com cell 720-256-8076
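For reference, the polling cadence Eric is tweaking lives in the slave section of the ReplicationHandler config in solrconfig.xml. One low-tech way to stagger without patching code would be to give each slave a slightly different pollInterval; the host name and intervals below are illustrative, not taken from Eric's setup:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/core0/replication</str>
    <!-- e.g. 00:05:00 on one slave, 00:05:07 on the next, 00:05:13 on
         the next, so the polls drift apart instead of arriving together -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```

The trade-off is the same one Eric accepts: slaves drift slightly out of sync within a poll window, which is exactly what Bill's load-balanced setup can't tolerate.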
Re: StreamingUpdateSolrServer - thread exit timeout?
On 3/20/2012 8:11 PM, Chris Hostetter wrote: : Is there any way to get the threads within SUSS objects to immediately : exit without creating other issues? Alternatively, if immediate isn't : possible, the exit could take 1-2 seconds. I could not find any kind of : method in the API that closes down the object. you should take a look at this thread... http://www.lucidimagination.com/search/document/53dc7e3d2102bb51 I've got this in a standalone application with a main(), started from the command line. When I close it and it calls the shutdown hook, there is nothing from SolrJ logged to my log4j destination, stdout, or stderr. I'm using SolrJ 3.5.0. Is the memory leak you mentioned still something I need to worry about? Thanks, Shawn