Re: Dataimport handler showing idle status with multiple shards
From: Shawn Heisey <elyog...@elyograg.org>
Reply-To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date: Tuesday, December 5, 2017 at 1:31 PM
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Subject: Re: Dataimport handler showing idle status with multiple shards

On 12/5/2017 10:47 AM, Sarah Weissman wrote: I’ve recently been using the dataimport handler to import records from a database into a Solr cloud collection with multiple shards. I have 6 dataimport handlers configured on 6 different paths, all running simultaneously against the same DB. I’ve noticed that when I do this I often get “idle” status from the DIH even when the import is still running. The percentage of the time I get an “idle” response seems proportional to the number of shards. I.e., with 1 shard it always shows me non-idle status; with 2 shards I see idle about half the time I check the status; with 96 shards it seems to show idle almost all the time. I can see the size of each shard increasing, so I’m sure the import is still going. I recently switched from 6.1 to 7.1, and I don’t remember this happening in 6.1. Does anyone know why the DIH would report idle when it’s running? e.g.: curl http://myserver:8983/solr/collection/dataimport6

To use DIH with SolrCloud, you should be sending your request directly to a shard replica core, not the collection, so that you can be absolutely certain that the import command and the status command are going to the same place. You MIGHT need to also have a distrib=false parameter on the request, but I do not know whether that is required to prevent the load balancing on the dataimport handler.

Thanks for the information, Shawn. I am relatively new to Solr cloud, and I am used to running the dataimport from the admin dashboard, where it happens at the collection level, so I find it surprising that the right way to do this is at the core level.
So, if I want to be able to check the status of my data import for N cores I would need to create N different data import configs that manually partition the collection and start each different config on a different core? That seems like it could get confusing. And then if I wanted to grow or shrink my shards I’d have to rejigger my data import configs every time. I kind of expect a distributed index to hide these details from me. I only have one node at the moment, and I don’t understand how Solr cloud works internally well enough to understand what it means for the data import to be running on a shard vs. a node. It would be nice if doing a status query would at least tell you something, like the number of documents last indexed on that core, even if nothing is currently running. That way at least I could extrapolate how much longer the operation will take.
Re: Dataimport handler showing idle status with multiple shards
On 12/5/2017 10:47 AM, Sarah Weissman wrote: [...] Does anyone know why the DIH would report idle when it’s running? e.g.: curl http://myserver:8983/solr/collection/dataimport6

When you send a DIH request to the collection name, SolrCloud is going to load balance that request across the cloud, just like it would with any other request. Solr will look at the list of all responding nodes that host part of the collection and send multiple such requests to different cores (shards/replicas) across the cloud. If there are four cores in the collection and the nodes hosting them are all working, then each of those cores would only see requests to /dataimport about one fourth of the time.

DIH imports happen at the core level, NOT the collection level, so when you start an import on a collection with four cores in the cloud, only one of those four cores is actually going to be doing the import; the rest of them are idle. This behavior should happen with any version, so I would expect it in 6.1 as well as 7.1.
To use DIH with SolrCloud, you should be sending your request directly to a shard replica core, not the collection, so that you can be absolutely certain that the import command and the status command are going to the same place. You MIGHT need to also have a distrib=false parameter on the request, but I do not know whether that is required to prevent the load balancing on the dataimport handler.

A similar question came to this list two days ago, and I replied to that one yesterday. http://lucene.472066.n3.nabble.com/Dataimporter-status-tp4365602p4365879.html

Somebody did open an issue a LONG time ago about this problem: https://issues.apache.org/jira/browse/SOLR-3666 I just commented on the issue.

Thanks,
Shawn
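Concretely, Shawn's advice amounts to addressing one specific replica core rather than the collection, and adding distrib=false as the hedge against load balancing. A small sketch of the URLs involved (the host and the replica core name here are hypothetical placeholders; recent SolrCloud core names look like "collection_shard1_replica_n1"):

```shell
# Hypothetical host and replica core name -- substitute your own.
SOLR_HOST="myserver:8983"
CORE="collection_shard1_replica_n1"

# Start the import and poll status against the SAME core, adding
# distrib=false so the request is not load-balanced to another core.
IMPORT_URL="http://${SOLR_HOST}/solr/${CORE}/dataimport6?command=full-import&distrib=false"
STATUS_URL="http://${SOLR_HOST}/solr/${CORE}/dataimport6?command=status&distrib=false"

# In practice: curl "$IMPORT_URL", then poll with curl "$STATUS_URL".
echo "$STATUS_URL"
```

Because both URLs name the same core, the status response always describes the import you actually started, instead of whichever core the load balancer happened to pick.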
RE: DataImport Handler Out of Memory
https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F

-Original Message- From: Deeksha Sharma [mailto:dsha...@flexera.com] Sent: Wednesday, September 27, 2017 1:40 PM To: solr-user@lucene.apache.org Subject: DataImport Handler Out of Memory

I am trying to create indexes using the dataimport handler (Solr 5.2.1). The data is in a MySQL DB and there are more than 3.5 million records. My Solr server stops due to an OOM (out of memory) error. I tried starting Solr with 12GB of RAM, but still no luck. Also, I see that Solr fetches all the documents in one request. Is there a way to configure Solr to stream the data from the DB, or any other solution someone may have tried?

Note: When my records number nearly 2 million, I am able to create indexes by giving Solr 10GB of RAM.

Your help is appreciated.

Thanks
Deeksha
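The FAQ entry linked above boils down to making the MySQL JDBC driver stream rows instead of buffering the entire result set. In DIH this is exposed as batchSize="-1" on the JdbcDataSource, which DIH translates to a fetchSize of Integer.MIN_VALUE, the MySQL driver's streaming mode. A hedged sketch (the connection details and table name are placeholders):

```xml
<dataConfig>
  <!-- batchSize="-1" makes DIH set fetchSize to Integer.MIN_VALUE, which
       MySQL Connector/J interprets as "stream rows one at a time" instead
       of materializing all 3.5M rows in the JVM heap. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://dbhost/dbname"
              user="user" password="pass"
              batchSize="-1"/>
  <document>
    <entity name="item" query="select * from big_table"/>
  </document>
</dataConfig>
```

With streaming enabled, heap usage is bounded by a single row plus Solr's own indexing buffers, rather than by the table size.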
Re: Dataimport handler Date
On 7 March 2014 08:50, Pritesh Patel priteshpate...@gmail.com wrote: I'm using the dataimporthandler to index data from a mysql DB. Been running it just fine. I've been using full-imports. I'm now trying to implement the delta import functionality. To implement the delta query, you need to be reading the last_index_time from a properties file to know what is new to index. So I'm using the parameter ${dataimporter.last_index_time} within my query. The problem is when I use this, the date is always Thu Jan 01 00:00:00 UTC 1970. It's never actually reading the correct date stored in the dataimport.properties file. [...]

I take it that you have verified that the dataimport.properties file exists. What are its contents? Please share the exact DIH configuration file that you use, obfuscating the DB password/username.

Your cut-and-paste seems to have a syntax error in the deltaQuery (notice the 'jgkg' string): deltaQuery=SELECT node.nid from node where node.type = 'news' and node.status = 1 and (node.changed > UNIX_TIMESTAMP('${dataimporter.last_index_time}'jgkg) or node.created > UNIX_TIMESTAMP('${dataimporter.last_index_time}'))

What response do you get from the delta-import URL? Are there any error messages in your Solr log?

Regards, Gora
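For reference, a deltaQuery of the shape being attempted, with the stray 'jgkg' removed, would look something like the sketch below. The column names are taken from the quoted snippet; the companion deltaImportQuery (which DIH uses to fetch each changed row by its delta key) is a hypothetical reconstruction, not Pritesh's actual config:

```xml
<entity name="news" pk="nid"
        query="select * from node where node.type = 'news' and node.status = 1"
        deltaQuery="select node.nid from node where node.type = 'news' and node.status = 1
                    and (node.changed > UNIX_TIMESTAMP('${dataimporter.last_index_time}')
                         or node.created > UNIX_TIMESTAMP('${dataimporter.last_index_time}'))"
        deltaImportQuery="select * from node where node.nid = '${dataimporter.delta.nid}'"/>
```

Note that ${dataimporter.last_index_time} sits inside single quotes so the substituted timestamp reaches MySQL as a string literal; an unquoted or malformed substitution is a common cause of the epoch-1970 fallback.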
Re: dataimport handler
I'm guessing that id in your schema.xml is also a unique key field. If so, each document must have an id field or Solr will refuse to index them. DataImportHandler will map the id field in your table to the Solr schema's id field only if you have not specified a mapping.

On Thu, Jan 23, 2014 at 3:01 AM, tom praveen...@yahoo.com wrote: Hi, I am trying to use dataimporthandler (Solr 4.6) with an Oracle database, but I have some issues in mapping the data. I have 3 columns in test_table: column1, column2, id.

dataconfig.xml:

<entity name="test_table" query="select * from test_table">
  <field column="column1" name="id" />
  <field column="column2" name="name" />
</entity>

The issues are:
- if I remove the id column from the table, indexing fails; Solr looks for the id column even though it is not mapped in dataconfig.xml.
- if I add it, Solr directly maps the id column from the db to the Solr id and ignores column1, even though it is mapped.

My problem is that I don't have an ID in every table; I should be able to map the column I choose from the table to the Solr id. Any solution will be greatly appreciated. `Tom

-- View this message in context: http://lucene.472066.n3.nabble.com/dataimport-handler-tp4112830.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Regards, Shalin Shekhar Mangar.
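One common workaround (a sketch, not tested against Tom's schema) is to do the renaming in SQL itself, so the result set never exposes a physical column named id and no <field> mappings are needed at all:

```xml
<entity name="test_table"
        query="select column1 as id, column2 as name from test_table"/>
```

Because the result set columns are now literally named id and name, DIH's implicit column-to-field mapping picks the column you chose, and the table's own id column (if any) never enters the picture.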
Re: Dataimport handler exception when migrating from 4.4 to 4.6. Help needed
On 12/22/2013 9:51 AM, William Pierce wrote: My configuration works nicely with solr 4.4. I am encountering a configuration error when I try to upgrade from 4.4 to 4.6. All I did was the following: a) Replaced the 4.4 solr.war file with the 4.6 solr.war in the tomcat/lib folder. I am using version 6.0.36 of tomcat. b) Replaced the solr-dataimporthandler-4.4.0.jar and solr-dataimporthandler-extras-4.4.0.jar with the corresponding 4.6 counterparts in the collection/lib folder. I restarted tomcat. I get the following stack trace (full log is also given below) – there are no other warnings/errors in my log. I have gone through the 4.5 changes to see if I needed to add/modify my DIH configuration – but I am stymied. Any help will be greatly appreciated.

ERROR - 2013-12-22 08:05:09.824; org.apache.solr.handler.dataimport.DataImportHandler; Exception while loading DataImporter java.lang.NoSuchMethodError: org.apache.solr.core.SolrCore.getLatestSchema()Lorg/apache/solr/schema/IndexSchema;

The method it's complaining about not being there is org.apache.solr.core.SolrCore.getLatestSchema() ... which is in Solr itself, not the dataimport handler. I did some checking. This method did not exist before 4.4.0, so my best guess is that your classloader is loading a SolrCore class from 4.3.1 or earlier, which probably means one of two things: 1) the Solr war you're extracting is not actually version 4.6.0, or 2) you've got jars in your system from one or more older versions.

It's a good idea to delete the extracted war data whenever you upgrade Solr -- stop the container, delete the extracted data and all old jars, then replace the .war file and start it back up.

Thanks, Shawn
Re: Dataimport handler exception when migrating from 4.4 to 4.6. Help needed
The best practice for upgrading is to take the distribution and expand it. Then take your cores and replace them. Then you are guaranteed to get the right jars and not have other WARs/JARs hanging around.

On Sun, Dec 22, 2013 at 7:24 PM, Shawn Heisey s...@elyograg.org wrote: [...]

-- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: DataImport Handler, writing a new EntityProcessor
Hi! Thanks for all the advice! I finally did it. The most annoying error, which took me the best part of a day to figure out, was that the state variable here had to be reset: https://bitbucket.org/dermotte/liresolr/src/d27878a71c63842cb72b84162b599d99c4408965/src/main/java/net/semanticmetadata/lire/solr/LireEntityProcessor.java?at=master#cl-56 The EntityProcessor is part of this image search plugin if anyone is interested: https://bitbucket.org/dermotte/liresolr/ :) It's always the small things that are hard to find.

cheers and thanks, Mathias

On Wed, Dec 18, 2013 at 7:26 PM, P Williams williams.tricia.l...@gmail.com wrote: Hi Mathias, I'd recommend testing one thing at a time. See if you can get it to work for one image before you try a directory of images. Also try testing using the solr-testframework in your IDE (I use Eclipse) to debug, rather than your browser/print statements. Hopefully that will give you some more specific knowledge of what's happening around your plugin. I also wrote an EntityProcessor plugin to read from a properties file: https://issues.apache.org/jira/browse/SOLR-3928. Hopefully that'll give you some insight about this kind of Solr plugin and testing them. Cheers, Tricia

On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux m...@itec.uni-klu.ac.at wrote: Hi all! I've got a question regarding writing a new EntityProcessor, in the same sense as the Tika one. My EntityProcessor should analyze jpg images and create document fields to be used with the LIRE Solr plugin (https://bitbucket.org/dermotte/liresolr). Basically I've taken the same approach as the TikaEntityProcessor, but my setup just indexes the first of 1000 images. I'm using a FileListEntityProcessor to get all JPEGs from a directory and then I'm handing them over (see [2]). My code for the EntityProcessor is at [1]. I've tried to use the DataSource as well as the filePath attribute, but it ends up all the same.
However, the FileListEntityProcessor is able to read all the files according to the debug output, but I'm missing the link from the FileListEntityProcessor to the LireEntityProcessor. I'd appreciate any pointer or help :) cheers, Mathias [1] LireEntityProcessor http://pastebin.com/JFajkNtf [2] dataConfig http://pastebin.com/vSHucatJ -- Dr. Mathias Lux Klagenfurt University, Austria http://tinyurl.com/mlux-itec -- PD Dr. Mathias Lux Klagenfurt University, Austria http://tinyurl.com/mlux-itec
RE: DataImport Handler, writing a new EntityProcessor
The first thing I would suggest is to try and run it not in debug mode. DIH's debug mode limits the number of documents it will take in, so that might be all that is wrong here. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of Mathias Lux Sent: Wednesday, December 18, 2013 4:04 AM To: solr-user@lucene.apache.org Subject: DataImport Handler, writing a new EntityProcessor Hi all! I've got a question regarding writing a new EntityProcessor, in the same sense as the Tika one. My EntityProcessor should analyze jpg images and create document fields to be used with the LIRE Solr plugin (https://bitbucket.org/dermotte/liresolr). Basically I've taken the same approach as the TikaEntityProcessor, but my setup just indexes the first of 1000 images. I'm using a FileListEntityProcessor to get all JPEGs from a directory and then I'm handing them over (see [2]). My code for the EntityProcessor is at [1]. I've tried to use the DataSource as well as the filePath attribute, but it ends up all the same. However, the FileListEntityProcessor is able to read all the files according to the debug output, but I'm missing the link from the FileListEntityProcessor to the LireEntityProcessor. I'd appreciate any pointer or help :) cheers, Mathias [1] LireEntityProcessor http://pastebin.com/JFajkNtf [2] dataConfig http://pastebin.com/vSHucatJ -- Dr. Mathias Lux Klagenfurt University, Austria http://tinyurl.com/mlux-itec
Re: DataImport Handler, writing a new EntityProcessor
Unfortunately it is the same in non-debug mode: just the first document. I also output the params to sout, but it seems only the first one ever arrives at my custom class. I have the feeling that I'm doing something seriously wrong here, based on a complete misunderstanding :) I basically assume that the nested entity processor will be called for each of the rows that come out of its parent. I've read somewhere that the data has to be taken from the data source, and I've implemented that, but it doesn't seem to change anything.

cheers, Mathias

On Wed, Dec 18, 2013 at 3:05 PM, Dyer, James james.d...@ingramcontent.com wrote: [...]

-- PD Dr. Mathias Lux Klagenfurt University, Austria http://tinyurl.com/mlux-itec
Re: DataImport Handler, writing a new EntityProcessor
Hi Mathias, I'd recommend testing one thing at a time. See if you can get it to work for one image before you try a directory of images. Also try testing using the solr-testframework in your IDE (I use Eclipse) to debug, rather than your browser/print statements. Hopefully that will give you some more specific knowledge of what's happening around your plugin. I also wrote an EntityProcessor plugin to read from a properties file: https://issues.apache.org/jira/browse/SOLR-3928. Hopefully that'll give you some insight about this kind of Solr plugin and testing them.

Cheers, Tricia

On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux m...@itec.uni-klu.ac.at wrote: [...]
Re: dataimport handler
Hmm, I will fix. https://issues.apache.org/jira/browse/SOLR-4788 On Thu, May 9, 2013 at 8:35 PM, William Bell billnb...@gmail.com wrote: It does not work anymore in 4.x. ${dih.last_index_time} does work, but the entity version does not. Bill On Tue, May 7, 2013 at 4:19 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Using ${dih.entity_name.last_index_time} should work. Make sure you put it in quotes in your query. On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com wrote: In the data import handler I have multiple entities. Each one generates a date in the dataimport.properties i.e. entityname.last_index_time. How do I reference the specific entity time in my delta queries? Thanks Eric -- Regards, Shalin Shekhar Mangar. -- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Regards, Shalin Shekhar Mangar.
Re: dataimport handler
It does not work anymore in 4.x. ${dih.last_index_time} does work, but the entity version does not. Bill On Tue, May 7, 2013 at 4:19 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Using ${dih.entity_name.last_index_time} should work. Make sure you put it in quotes in your query. On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com wrote: In the data import handler I have multiple entities. Each one generates a date in the dataimport.properties i.e. entityname.last_index_time. How do I reference the specific entity time in my delta queries? Thanks Eric -- Regards, Shalin Shekhar Mangar. -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: dataimport handler
Using ${dih.entity_name.last_index_time} should work. Make sure you put it in quotes in your query. On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com wrote: In the data import handler I have multiple entities. Each one generates a date in the dataimport.properties i.e. entityname.last_index_time. How do I reference the specific entity time in my delta queries? Thanks Eric -- Regards, Shalin Shekhar Mangar.
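Putting this thread's advice together, a minimal sketch of the entity-scoped property in use (the entity and table names here are hypothetical, and the deltaImportQuery is illustrative): the property name embeds the entity's own name, and it sits inside single quotes so the timestamp reaches the database as a string literal.

```xml
<entity name="orders"
        query="select * from orders"
        deltaQuery="select id from orders
                    where last_modified > '${dih.orders.last_index_time}'"
        deltaImportQuery="select * from orders where id = '${dih.delta.id}'"/>
```

Per Bill's follow-up above, the entity-scoped form was broken in early 4.x (tracked as SOLR-4788), so on affected versions only the global ${dih.last_index_time} resolves.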
Re: Dataimport handler
I also get this. 4.2+ On Fri, Apr 19, 2013 at 10:43 PM, Eric Myers badllam...@gmail.com wrote: I have multiple parallel entities in my document and when I run an import there are times like xxx.last_index_time where xxx is the name of the entity. I tried accessing these using dih.xxx.last_index_time but receive a null value. Is there a way to reference these in my queries. Thanks -- Bill Bell billnb...@gmail.com cell 720-256-8076
RE: DataImport Handler : Transformer Function Eval Failed Error
Looks like it will be helpful. I'm going to give it a shot. Thanks, Otis.

Shikhar

From: Otis Gospodnetic [otis.gospodne...@gmail.com] Sent: Friday, November 02, 2012 4:36 PM To: solr-user@lucene.apache.org Subject: Re: DataImport Handler : Transformer Function Eval Failed Error

Would http://wiki.apache.org/solr/Join do anything for you?

Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html

On Fri, Nov 2, 2012 at 10:06 AM, Mishra, Shikhar shikhar.mis...@telcobuy.com wrote: We have a scenario where the same products are available from multiple vendors at different prices. We want to store these prices along with the products in the index (a product has many prices), so that we can apply dynamic filtering on the prices at the time of search. Thanks, Shikhar

-Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Thursday, November 01, 2012 8:13 PM To: solr-user@lucene.apache.org Subject: Re: DataImport Handler : Transformer Function Eval Failed Error

Hi, That looks a little painful... what are you trying to achieve by storing JSON in there? Maybe there's a simpler way to get there... Otis -- Performance Monitoring - http://sematext.com/spm

On Nov 1, 2012 6:16 PM, Mishra, Shikhar shikhar.mis...@telcobuy.com wrote: Hi, I'm trying to store a list of JSON objects as the stored value for the field prices (see below). I'm getting the following error from the custom transformer function (see the data-config file at the end) of the data import handler. [...]
RE: DataImport Handler : Transformer Function Eval Failed Error
We have a scenario where the same products are available from multiple vendors at different prices. We want to store these prices along with the products in the index (a product has many prices), so that we can apply dynamic filtering on the prices at the time of search.

Thanks, Shikhar

-Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Thursday, November 01, 2012 8:13 PM To: solr-user@lucene.apache.org Subject: Re: DataImport Handler : Transformer Function Eval Failed Error

Hi, That looks a little painful... what are you trying to achieve by storing JSON in there? Maybe there's a simpler way to get there... Otis -- Performance Monitoring - http://sematext.com/spm

On Nov 1, 2012 6:16 PM, Mishra, Shikhar shikhar.mis...@telcobuy.com wrote: Hi, I'm trying to store a list of JSON objects as the stored value for the field prices (see below). I'm getting the following error from the custom transformer function (see the data-config file at the end) of the data import handler.
Error Message
-------------
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 'eval' failed with language: JavaScript and script:
function vendorPrices(row){
  var wwtCost = row.get('WWT_COST');
  var listPrice = row.get('LIST_PRICE');
  var vendorName = row.get('VENDOR_NAME');
  //Below approach fails
  var prices = [];
  prices.push({'vendor':vendorName});
  prices.push({'wwtCost':wwtCost});
  prices.push({'listPrice':listPrice});
  row.put('prices':prices);
  //Below approach works
  //row.put('prices', '{' + 'vendor:' + vendorName + ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
  return row;
} Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)

Data Import Handler Configuration File
--------------------------------------
<dataConfig>
  <script><![CDATA[
    function vendorPrices(row){
      var wwtCost = row.get('WWT_COST');
      var listPrice = row.get('LIST_PRICE');
      var vendorName = row.get('VENDOR_NAME');
      //Below approach fails
      var prices = [];
      prices.push({'vendor':vendorName});
      prices.push({'wwtCost':wwtCost});
      prices.push({'listPrice':listPrice});
      row.put('prices':prices);
      //Below approach works
      //row.put('prices', '{' + 'vendor:' + vendorName + ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
      return row;
    }
  ]]></script>
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac-scan.somr.com)(PORT=3465))(CONNECT_DATA=(SERVICE_NAME=ERP_GENERAL.SOMR.ORG)))"
              user="dummy" password="xx"/>
  <document>
    <entity name="item" query="select * from wwt_catalog.wwt_product prod, wwt_catalog.wwt_manufacturer mfg where prod.mfg_id = mfg.mfg_id and prod.mfg_product_number = 'CON-CBO2-B22HPF'">
      <field column="PRODUCT_ID" name="id" />
      <field column="MFG_PRODUCT_NUMBER" name="name" />
      <field column="MFG_PRODUCT_NUMBER" name="nameSort" />
      <field column="MFG_NAME" name="manu" />
      <field column="MFG_ITEM_NUMBER" name="alphaNameSort" />
      <field column="DESCRIPTION" name="features" />
      <field column="DESCRIPTION" name="description" />
      <entity name="vendor_sources" transformer="script:vendorPrices"
              query="SELECT PRICE.WWT_COST, PRICE.LIST_PRICE, VEND.VENDOR_NAME, AVAIL.LEAD_TIME, AVAIL.QTY_AVAILABLE FROM wwt_catalog.wwt_product prod, wwt_catalog.wwt_product_pricing price, wwt_catalog.wwt_vendor vend, wwt_catalog.wwt_product_availability avail WHERE PROD.PRODUCT_ID = price.product_id(+) AND price.vendor_id = vend.vendor_id(+) AND PRICE.PRODUCT_ID = avail.product_id(+) AND PRICE.VENDOR_ID = AVAIL.VENDOR_ID(+) AND prod.PRODUCT_ID = '${item.PRODUCT_ID}'"/>
    </entity>
  </document>
</dataConfig>

Are there any syntactic errors in the JavaScript code above? Thanks. Shikhar
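To answer the closing question directly: yes. row.put('prices':prices) uses a colon inside a function call, which is invalid JavaScript; it has to be row.put('prices', prices). A second, hedged caveat: DIH's script transformer runs on the JVM's JavaScript engine and the row is a Java Map, so pushing raw JS objects into it may not serialize as intended; turning the list into a JSON string first is one way around that. A standalone sketch of the corrected call shape (the Map-based row here is a stand-in for DIH's real row, purely for illustration):

```javascript
// Stand-in for the DIH row (a Java Map in the real transformer).
var row = new Map();
row.put = function (k, v) { this.set(k, v); };
row.set('VENDOR_NAME', 'Acme');
row.set('WWT_COST', 10);
row.set('LIST_PRICE', 12);

function vendorPrices(row) {
  var prices = [];
  prices.push({ vendor: row.get('VENDOR_NAME') });
  prices.push({ wwtCost: row.get('WWT_COST') });
  prices.push({ listPrice: row.get('LIST_PRICE') });
  // Original bug: row.put('prices':prices) -- ':' is not valid in a call.
  // Serialize to a string so a Java-backed row stores plain text.
  row.put('prices', JSON.stringify(prices));
  return row;
}

vendorPrices(row);
console.log(row.get('prices'));
// -> [{"vendor":"Acme"},{"wwtCost":10},{"listPrice":12}]
```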
Re: DataImport Handler : Transformer Function Eval Failed Error
Hi, That looks a little painful... what are you trying to achieve by storing JSON in there? Maybe there's a simpler way to get there... Otis -- Performance Monitoring - http://sematext.com/spm

On Nov 1, 2012 6:16 PM, Mishra, Shikhar shikhar.mis...@telcobuy.com wrote:

Hi, I'm trying to store a list of JSON objects as the stored value for the field prices (see below). I'm getting the following error from the custom transformer function (see the data-config file at the end) of the data import handler.

Error Message
---
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 'eval' failed with language: JavaScript and script:
function vendorPrices(row){
    var wwtCost = row.get('WWT_COST');
    var listPrice = row.get('LIST_PRICE');
    var vendorName = row.get('VENDOR_NAME');
    //Below approach fails
    var prices = [];
    prices.push({'vendor':vendorName});
    prices.push({'wwtCost':wwtCost});
    prices.push({'listPrice':listPrice});
    row.put('prices':prices);
    //Below approach works
    //row.put('prices', '{' + 'vendor:' + vendorName + ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
    return row;
} Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)

Data Import Handler Configuration File

<dataConfig>
  <script><![CDATA[
    function vendorPrices(row){
        var wwtCost = row.get('WWT_COST');
        var listPrice = row.get('LIST_PRICE');
        var vendorName = row.get('VENDOR_NAME');
        //Below approach fails
        var prices = [];
        prices.push({'vendor':vendorName});
        prices.push({'wwtCost':wwtCost});
        prices.push({'listPrice':listPrice});
        row.put('prices':prices);
        //Below approach works
        //row.put('prices', '{' + 'vendor:' + vendorName + ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
        return row;
    }
  ]]></script>
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac-scan.somr.com)(PORT=3465))(CONNECT_DATA=(SERVICE_NAME=ERP_GENERAL.SOMR.ORG)))"
              user="dummy" password="xx"/>
  <document>
    <entity name="item" query="select * from wwt_catalog.wwt_product prod, wwt_catalog.wwt_manufacturer mfg where prod.mfg_id = mfg.mfg_id and prod.mfg_product_number = 'CON-CBO2-B22HPF'">
      <field column="PRODUCT_ID" name="id"/>
      <field column="MFG_PRODUCT_NUMBER" name="name"/>
      <field column="MFG_PRODUCT_NUMBER" name="nameSort"/>
      <field column="MFG_NAME" name="manu"/>
      <field column="MFG_ITEM_NUMBER" name="alphaNameSort"/>
      <field column="DESCRIPTION" name="features"/>
      <field column="DESCRIPTION" name="description"/>
      <entity name="vendor_sources" transformer="script:vendorPrices"
              query="SELECT PRICE.WWT_COST, PRICE.LIST_PRICE, VEND.VENDOR_NAME, AVAIL.LEAD_TIME, AVAIL.QTY_AVAILABLE FROM wwt_catalog.wwt_product prod, wwt_catalog.wwt_product_pricing price, wwt_catalog.wwt_vendor vend, wwt_catalog.wwt_product_availability avail WHERE PROD.PRODUCT_ID = price.product_id(+) AND price.vendor_id = vend.vendor_id(+) AND PRICE.PRODUCT_ID = avail.product_id(+) AND PRICE.VENDOR_ID = AVAIL.VENDOR_ID(+) AND prod.PRODUCT_ID = '${item.PRODUCT_ID}'">
      </entity>
    </entity>
  </document>
</dataConfig>

Are there any syntactic errors in the JavaScript code above? Thanks. Shikhar
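To answer the closing question: yes, there is a syntax error. The line row.put('prices':prices) uses a colon, but an argument list must be comma-separated (Java's Map.put takes two arguments), so the script fails to parse and 'eval' reports the error before the transformer ever runs. A corrected sketch, following the author's own working string-building fallback; the quoted JSON key names are illustrative:

```javascript
// Sketch of a corrected ScriptTransformer function. The original
// row.put('prices':prices) is a JavaScript syntax error (colon instead of
// a comma in the argument list). Also, pushing native JS objects into the
// Java-backed row may not serialize usefully, so this builds a JSON string
// instead, as the author's commented-out fallback already did.
function vendorPrices(row) {
  var wwtCost = row.get('WWT_COST');
  var listPrice = row.get('LIST_PRICE');
  var vendorName = row.get('VENDOR_NAME');
  var prices = '{' +
    '"vendor":"' + vendorName + '",' +
    '"wwtCost":' + wwtCost + ',' +
    '"listPrice":' + listPrice + '}';
  row.put('prices', prices); // comma, not colon
  return row;
}
```

If a real list of objects is needed, the simplest route is to push one such string per vendor row into a multiValued field and parse the JSON on the client side.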
RE: Dataimport Handler in solr 3.6.1
There were 2 major changes to DIH Cache functionality in Solr 3.6, only 1 of which was carried to Solr 4.0:

- Solr 3.6 had 2 MAJOR changes:

1. We support pluggable caches so that you can write your own cache implementations and cache however you want. The goal here is to allow you to cache to disk when you have to do large, complex joins and an in-memory cache could result in an OOM. Also, you can specify cacheImpl with any EntityProcessor, not just SqlEntityProcessor, so you can join child entities that come from XML, flat files, etc. CachedSqlEntityProcessor is technically deprecated, as using it is the same as using SqlEntityProcessor with cacheImpl=SortedMapBackedCache specified. This does a simple in-memory cache very similar to Solr 3.5 and prior. (see https://issues.apache.org/jira/browse/SOLR-2382)

2. Extensive work was done to try and make the threads parameter work in more situations. This involved some rather invasive changes to the DIH Cache functionality. (see https://issues.apache.org/jira/browse/SOLR-3011)

- Solr 4.0 has #1 above, BUT NOT #2. Rather, the threads functionality was entirely removed. Consequently, if the problem is due to #2 (SOLR-3011), this isn't as big a problem, because 3.x users can simply use the 3.5 DIH jar (though some use-cases involving threads work with the 3.6(.1) jar and not at all with 3.5, so users will have to pick and choose the best version for their instance). My concern is that there are issues with #1 (SOLR-2382). That's why I'm asking if at all possible you can try this with Solr 4.0. I have tested Solr 4.0 extensively here and it seems caching works exactly as it ought. However, DIH is flexible in how it can be configured, and there could be something that was broken that I have not uncovered myself. Any issues that may exist with SOLR-2382 need to be identified and fixed in the 4.x branch as soon as possible. I apologize for the late response. I was away the past week.
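To make point #1 concrete, a child entity using the 3.6 pluggable cache might be configured like this (the entity, table, and column names are hypothetical placeholders, not taken from this thread):

```xml
<entity name="child"
        processor="SqlEntityProcessor"
        cacheImpl="SortedMapBackedCache"
        cacheKey="parent_id"
        cacheLookup="parent.id"
        query="select parent_id, child_value from child_table">
  <field column="child_value" name="child_value"/>
</entity>
```

This gives the same behavior CachedSqlEntityProcessor used to give, but because cacheImpl is an independent parameter, it can in principle be attached to non-SQL entity processors as well.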
James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: mechravi25 [mailto:mechrav...@yahoo.co.in] Sent: Tuesday, August 21, 2012 7:47 AM To: solr-user@lucene.apache.org Subject: RE: Dataimport Handler in solr 3.6.1

Hi James, Thanks for the suggestions. Actually it is cacheLookup=ent1.id; I had misspelt it. Also, I will be needing the transformers mentioned, as there are other columns as well. I actually tried using the 3.5 DIH jars in 3.6.1 and the indexing was successful, but I wanted this to work with the 3.6.1 DIH. I just came across the SOLR-2382 patch and tried giving the following in my DIH.xml file: processor=CachedSqlEntityProcessor cacheImpl=SortedMapBackedCache. In the case of static fields in child entities, the indexing happened fine, but in the case of dynamic fields, only one of the dynamic fields was indexed and the rest were skipped, even though the total rows fetched from the datasource was correct. Following are my questions: 1.) Is there a big difference between the Solr 3.5 and 3.6.1 DIH handler files? That is, is any new feature added in the 3.6 DIH that is not present in 3.5? 2.) Am I missing something when giving cacheImpl=SortedMapBackedCache in my DIH.xml, because of which dynamic fields are not indexed properly? There is no change to my DIH file from my previous post apart from this cacheImpl addition, and the dynamic fields are indexed properly if I do not give this cacheImpl. Am I missing something here? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Dataimport-Handler-in-solr-3-6-1-tp4001149p4002421.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Dataimport Handler in solr 3.6.1
One thing I notice in your configuration: the child entity has cacheLookup=ent1.uid, but your parent entity doesn't have a uid field. Also, you have these 3 transformers: RegexTransformer,DateFormatTransformer,TemplateTransformer, but none of your columns seem to make use of them. Are you sure you need them? In any case, I am suspicious there may still be bugs in 3.6.1 related to CachedSqlEntityProcessor, so if you are able to create a failing unit test and post it to JIRA, that would be helpful. If you need to, you can use the 3.5 DIH jar with Solr 3.6.1. Also, I do not think SOLR-3360 should affect you unless you're using the threads parameter. Both SOLR-3360 and SOLR-3430 fixed bugs related to CachedSqlEntityProcessor that were introduced in 3.6.0 (from SOLR-3411 and SOLR-2382 respectively). Finally, if you are at all able to test this on 4.0-beta, I would greatly appreciate it! SOLR-3411/SOLR-3360 were never applied to version 4.0 because threads support was removed entirely. However, SOLR-2382/SOLR-3430 were applied to 4.0 also. If we have any more SOLR-2382 bugs lingering in 4.0, these really need to be fixed, so any testing help would be much appreciated.

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: mechravi25 [mailto:mechrav...@yahoo.co.in] Sent: Tuesday, August 14, 2012 8:04 AM To: solr-user@lucene.apache.org Subject: Dataimport Handler in solr 3.6.1

I am indexing some data using dataimport handler files in Solr 3.6.1. I am using a nested entity in my handler file. I noticed a scenario wherein, instead of only the records which are to be fetched for a document, all the records present in the table are indexed. Following is the ideal scenario of how the data has to be indexed. For a document A, I am trying to index the 2 values B,C as a multivalued field:

<id>A</id>
<related_id>
  <str>B</str>
  <str>C</str>
</related_id>

This is how the output should be. I have used the same DIH file for Solr versions 1.4 and 3.5, and the data was indexed fine, like the one mentioned above, in both versions. But in Solr 3.6.1, the data was indexed differently. In my table, there are 4 values (B,C,D,E) in the related_id field. This is how the data is indexed in 3.6.1:

<id>A</id>
<related_id>
  <str>B</str>
  <str>C</str>
  <str>D</str>
  <str>E</str>
</related_id>

Ideally, the values D and E should not get indexed under id A. This is the same for the other id records. Following is the content of the DIH file:

<entity name="ent1" query="select sid as id from Table1 a"
        transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
  <field column="id" name="id" boost="0.5"/>
  <entity name="ent2" query="select id1,rid from Table2"
          processor="CachedSqlEntityProcessor" cacheKey="id1" cacheLookup="ent1.uid"
          transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
    <field column="rid" name="related_id"/>
  </entity>
</entity>

I tried changing the CachedSqlEntityProcessor to SqlEntityProcessor and then indexed the same, but I still faced the same issue. When I googled a bit, I found this url: https://issues.apache.org/jira/browse/SOLR-3360 I am not sure if issue 3360 is the same as the scenario I have mentioned above. Please guide me. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Dataimport-Handler-in-solr-3-6-1-tp4001149.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dataimport handler (DIH) - notify when it has finished?
On 1 May 2012 23:12, geeky2 gee...@hotmail.com wrote: Hello all, is there a notification / trigger / callback mechanism people use that allows them to know when a dataimport process has finished? we will be doing daily delta-imports and i need some way for an operations group to know when the DIH has finished. Never tried it myself, but this should meet your needs: http://wiki.apache.org/solr/DataImportHandler#EventListeners Regards, Gora
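The EventListeners hook Gora points to is configured on the <document> element of the DIH data-config. A minimal sketch, assuming hypothetical listener classes that you would implement against the org.apache.solr.handler.dataimport.EventListener interface:

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb"/>
  <!-- onImportStart/onImportEnd name classes implementing EventListener;
       com.example.* below are placeholders for your own code -->
  <document onImportStart="com.example.ImportStartListener"
            onImportEnd="com.example.ImportEndListener">
    <entity name="item" query="select * from item"/>
  </document>
</dataConfig>
```

Each listener implements a single onEvent(Context) method; the onImportEnd one could, for example, write a flag file or call an operations webhook so a delta-import's completion is visible outside Solr. Alternatively, polling /dataimport?command=status and waiting for the response to report an idle status with a completion message achieves the same thing without custom code.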
Re: dataimport handler with mysql: wrong field mapping
Have you tried using the <dynamicField name="*" type="string" indexed="true"/> option in the schema.xml? After the indexing, take a look at the fields DIH has generated. Bye, L.M.

2008/12/15 jokkmokk jokkm...@gmx.at:

HI, I'm desperately trying to get the dataimport handler to work, however it seems that it just ignores the field name mapping. I have the fields body and subject in the database, and those are called title and content in the solr schema, so I use the following import config:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="root" password=""/>
  <document>
    <entity name="phorum_messages" query="select * from phorum_messages">
      <field column="body" name="content"/>
      <field column="subject" name="title"/>
    </entity>
  </document>
</dataConfig>

however I always get the following exception:

org.apache.solr.common.SolrException: ERROR:unknown field 'body'
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)

but according to the documentation it should add a document with title and content, not body and subject?! I'd appreciate any help, as I can't see anything wrong with my configuration...
TIA, Stefan -- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dataimport handler with mysql: wrong field mapping
sorry, I'm using the 1.3.0 release. I've now worked around that issue by using aliases in the sql statement so that no mapping is needed. This way it works perfectly. best regards Stefan Shalin Shekhar Mangar wrote: Which solr version are you using? -- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013639.html Sent from the Solr - User mailing list archive at Nabble.com.
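For anyone hitting the same problem, the alias workaround Stefan describes might look roughly like this, based on the table and schema names earlier in the thread: the SQL renames the columns so they already match the Solr field names, and no <field> mapping is needed at all.

```xml
<entity name="phorum_messages"
        query="select subject as title, body as content from phorum_messages"/>
```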
Re: dataimport handler with mysql: wrong field mapping
Which solr version are you using?

On Mon, Dec 15, 2008 at 6:04 PM, jokkmokk jokkm...@gmx.at wrote:

HI, I'm desperately trying to get the dataimport handler to work, however it seems that it just ignores the field name mapping. I have the fields body and subject in the database, and those are called title and content in the solr schema, so I use the following import config:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="root" password=""/>
  <document>
    <entity name="phorum_messages" query="select * from phorum_messages">
      <field column="body" name="content"/>
      <field column="subject" name="title"/>
    </entity>
  </document>
</dataConfig>

however I always get the following exception:

org.apache.solr.common.SolrException: ERROR:unknown field 'body'
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)

but according to the documentation it should add a document with title and content, not body and subject?! I'd appreciate any help, as I can't see anything wrong with my configuration...

TIA, Stefan -- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
Re: dataimport handler multiple databases
Each entity has an optional attribute called dataSource. If you have multiple dataSources, give each one a name and use that name in the entity's dataSource attribute. So your solrconfig must look like:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/username/data-config.xml</str>
    <lst name="datasource">
      <str name="name">datasource-1</str>
      <str name="driver">com.mysql.jdbc.Driver</str>
    </lst>
    <lst name="datasource">
      <str name="name">datasource-2</str>
      <str name="driver">com.mysql.jdbc.Driver</str>
    </lst>
  </lst>
</requestHandler>

and each entity can have its dataSource attribute refer to one of them, e.g.:

<entity name="one" dataSource="datasource-1" ...></entity>
<entity name="two" dataSource="datasource-2" ...></entity>

But as I see it, you have a use case where prod and QA use different DBs, so between prod and QA you can change the solrconfig.xml. --Noble

On undefined, Ismail Siddiqui [EMAIL PROTECTED] wrote:

Hi, I have a situation where I am using the dataimport handler with a development DB, and I am going to use it with a production database in the production environment. I have an entry in solrconfig.xml like this:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/username/data-config.xml</str>
    <lst name="datasource">
      <str name="driver">com.mysql.jdbc.Driver</str>
      <str name="url">jdbc:mysql://localhost/dbname</str>
      <str name="user">db_username</str>
      <str name="password">db_password</str>
    </lst>
  </lst>
</requestHandler>

I understand I can add another datasource called datasource-2, but how can I use this datasource to index data? Currently I am calling something like /dataimport?command=full-import or /dataimport?command=delta-import. How can I define a particular DB to be called, so that it indexes the dev DB on the development machine and the prod DB in the production environment? thanks -- --Noble Paul
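The named dataSources can also be declared directly in data-config.xml rather than in solrconfig.xml. A sketch of that variant, with hypothetical hosts, databases, and credentials standing in for real values:

```xml
<dataConfig>
  <!-- Two named JDBC sources; entities pick one via dataSource="..." -->
  <dataSource name="datasource-1" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://devhost/devdb" user="dev_user" password="dev_pw"/>
  <dataSource name="datasource-2" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://prodhost/proddb" user="prod_user" password="prod_pw"/>
  <document>
    <entity name="one" dataSource="datasource-1" query="select * from table_one"/>
    <entity name="two" dataSource="datasource-2" query="select * from table_two"/>
  </document>
</dataConfig>
```

For the dev-vs-prod case, an alternative to switching configs by hand is to keep one data-config per environment and point the /dataimport handler's config parameter at the right file during deployment.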