Re: Dataimport handler showing idle status with multiple shards

2017-12-05 Thread Sarah Weissman


From: Shawn Heisey <elyog...@elyograg.org>
Reply-To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date: Tuesday, December 5, 2017 at 1:31 PM
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Subject: Re: Dataimport handler showing idle status with multiple shards

On 12/5/2017 10:47 AM, Sarah Weissman wrote:
I’ve recently been using the dataimport handler to import records from a 
database into a Solr cloud collection with multiple shards. I have 6 dataimport 
handlers configured on 6 different paths all running simultaneously against the 
same DB. I’ve noticed that when I do this I often get “idle” status from the 
DIH even when the import is still running. The percentage of the time I get an 
“idle” response seems proportional to the number of shards. I.e., with 1 shard 
it always shows me non-idle status, with 2 shards I see idle about half the 
time I check the status, with 96 shards it seems to be showing idle almost all 
the time. I can see the size of each shard increasing, so I’m sure the import 
is still going.

I recently switched from 6.1 to 7.1 and I don’t remember this happening in 6.1. 
Does anyone know why the DIH would report idle when it’s running?

e.g.:
curl http://myserver:8983/solr/collection/dataimport6



To use DIH with SolrCloud, you should be sending your request directly
to a shard replica core, not the collection, so that you can be
absolutely certain that the import command and the status command are
going to the same place.  You MIGHT need to also have a distrib=false
parameter on the request, but I do not know whether that is required to
prevent the load balancing on the dataimport handler.



Thanks for the information, Shawn. I am relatively new to Solr cloud and I am 
used to running the dataimport from the admin dashboard, where it happens at 
the collection level, so I find it surprising that the right way to do this is 
at the core level. So, if I want to be able to check the status of my data 
import for N cores I would need to create N different data import configs that 
manually partition the collection and start each different config on a 
different core? That seems like it could get confusing. And then if I wanted to 
grow or shrink my shards I’d have to rejigger my data import configs every 
time. I kind of expect a distributed index to hide these details from me.

I only have one node at the moment, and I don’t understand how Solr cloud works 
internally well enough to understand what it means for the data import to be 
running on a shard vs. a node. It would be nice if doing a status query would 
at least tell you something, like the number of documents last indexed on that 
core, even if nothing is currently running. That way at least I could 
extrapolate how much longer the operation will take.



Re: Dataimport handler showing idle status with multiple shards

2017-12-05 Thread Shawn Heisey

On 12/5/2017 10:47 AM, Sarah Weissman wrote:

I’ve recently been using the dataimport handler to import records from a 
database into a Solr cloud collection with multiple shards. I have 6 dataimport 
handlers configured on 6 different paths all running simultaneously against the 
same DB. I’ve noticed that when I do this I often get “idle” status from the 
DIH even when the import is still running. The percentage of the time I get an 
“idle” response seems proportional to the number of shards. I.e., with 1 shard 
it always shows me non-idle status, with 2 shards I see idle about half the 
time I check the status, with 96 shards it seems to be showing idle almost all 
the time. I can see the size of each shard increasing, so I’m sure the import 
is still going.

I recently switched from 6.1 to 7.1 and I don’t remember this happening in 6.1. 
Does anyone know why the DIH would report idle when it’s running?

e.g.:
curl http://myserver:8983/solr/collection/dataimport6


When you send a DIH request to the collection name, SolrCloud is going 
to load balance that request across the cloud, just like it would with 
any other request.  Solr will look at the list of all responding nodes 
that host part of the collection and send multiple such requests to 
different cores (shards/replicas) across the cloud.  If there are four 
cores in the collection and the nodes hosting them are all working, then 
each of those cores would only see requests to /dataimport about one 
fourth of the time.


DIH imports happen at the core level, NOT the collection level, so when 
you start an import on a collection with four cores in the cloud, only 
one of those four cores is actually going to be doing the import; the 
rest of them are idle.


This behavior should happen with any version, so I would expect it in 
6.1 as well as 7.1.


To use DIH with SolrCloud, you should be sending your request directly 
to a shard replica core, not the collection, so that you can be 
absolutely certain that the import command and the status command are 
going to the same place.  You MIGHT need to also have a distrib=false 
parameter on the request, but I do not know whether that is required to 
prevent the load balancing on the dataimport handler.
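
For example, with a hypothetical core named collection_shard1_replica1 (actual
core names vary by deployment; check the admin UI's core list), the status
check would look like this, with distrib=false included as a precaution:

curl "http://myserver:8983/solr/collection_shard1_replica1/dataimport6?command=status&distrib=false"

That pins the status request to the same core where the import was started
instead of letting it be load balanced across the collection.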


A similar question came to this list two days ago, and I replied to that 
one yesterday.


http://lucene.472066.n3.nabble.com/Dataimporter-status-tp4365602p4365879.html

Somebody did open an issue a LONG time ago about this problem:

https://issues.apache.org/jira/browse/SOLR-3666

I just commented on the issue.

Thanks,
Shawn



RE: DataImport Handler Out of Memory

2017-09-27 Thread Allison, Timothy B.
https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F
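
The short version of that FAQ entry: the MySQL JDBC driver reads the entire
result set into memory unless it is put into streaming mode.  A sketch of the
usual fix, setting batchSize="-1" on the data source (the connection details
below are placeholders):

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            batchSize="-1"
            user="..." password="..."/>

With batchSize="-1", DIH sets the JDBC fetch size to Integer.MIN_VALUE, which
is the MySQL driver's signal to stream rows one at a time instead of buffering
all of them.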


-Original Message-
From: Deeksha Sharma [mailto:dsha...@flexera.com] 
Sent: Wednesday, September 27, 2017 1:40 PM
To: solr-user@lucene.apache.org
Subject: DataImport Handler Out of Memory

I am trying to create indexes using the dataimport handler (Solr 5.2.1). The 
data is in a MySQL DB and there are more than 3.5 million records. My Solr 
server stops due to an OOM (out of memory) error. I tried starting Solr with 
12GB of RAM but still no luck.


Also, I see that Solr fetches all the documents in one request. Is there a way 
to configure Solr to stream the data from the DB, or any other solution 
someone may have tried?


Note: When my records are nearly 2 Million, I am able to create indexes by 
giving Solr 10GB of RAM.


Your help is appreciated.



Thanks

Deeksha




Re: Dataimport handler Date

2014-03-06 Thread Gora Mohanty
On 7 March 2014 08:50, Pritesh Patel priteshpate...@gmail.com wrote:
 I'm using the dataimporthandler to index data from a mysql DB.  Been
 running it just fine. I've been using full-imports. I'm now trying to
 implement the delta import functionality.

 To implement the delta query, you need to be reading the last_index_time
 from a properties file to know what is new to index.  So I'm using the
 parameter ${dataimporter.last_index_time} within my query.

 The problem is that when I use this, the date is always Thu Jan 01 00:00:00
 UTC 1970.  It's never actually reading the correct date stored in the
 dataimport.properties file.
[...]

I take it that you have verified that the dataimport.properties file exists.
What are its contents?
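
For reference, a healthy dataimport.properties written by DIH looks roughly
like this (java.util.Properties format with escaped colons; the timestamps
here are illustrative):

#Thu Mar 06 20:33:01 UTC 2014
last_index_time=2014-03-06 20\:32\:58
entityname.last_index_time=2014-03-06 20\:32\:58

A value of Thu Jan 01 00:00:00 UTC 1970 usually means the property was never
read at all, i.e. the file is missing, empty, or not where DIH expects it.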

Please share the exact DIH configuration file that you use, obfuscating
DB password/username. Your cut-and-paste seems to have a syntax
error in the deltaQuery (notice the 'jgkg' string):
deltaQuery="SELECT node.nid from node where node.type = 'news' and
node.status = 1 and (node.changed >
UNIX_TIMESTAMP('${dataimporter.last_index_time}'jgkg) or node.created >
UNIX_TIMESTAMP('${dataimporter.last_index_time}'))"

What response do you get from the delta-import URL?
Are there any error messages in your Solr log?

Regards,
Gora


Re: dataimport handler

2014-01-22 Thread Shalin Shekhar Mangar
I'm guessing that id in your schema.xml is also a unique key field.
If so, each document must have an id field or Solr will refuse to
index them.

DataImportHandler will map the id field in your table to Solr schema's
id field only if you have not specified a mapping.
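
One way around this, assuming the schema's uniqueKey is id, is to alias the
columns in the SQL itself so that no field mapping is needed at all, e.g.:

<entity name="test_table"
        query="select column1 as id, column2 as name from test_table"/>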

On Thu, Jan 23, 2014 at 3:01 AM, tom praveen...@yahoo.com wrote:
 Hi,
 I am trying to use dataimporthandler(Solr 4.6) from oracle database, but I
 have some issues in mapping the data.
 I have 3 columns in the test_table,
  column1,
  column2,
  id

 dataconfig.xml

   <entity name="test_table"
           query="select * from test_table">
     <field column="column1" name="id" />
     <field column="column2" name="name" />
   </entity>

 The issue is:
 - if I remove the id column from the table, indexing fails; Solr looks for an
 id column even though it is not mapped in dataconfig.xml.
 - if I add it, Solr directly maps the id column from the db to the Solr id and
 ignores column1, even though it is mapped.

 My problem is that I don't have an ID in every table; I should be able to map
 the column I choose from the table to the Solr id. Any solution will be
 greatly appreciated.

 `Tom







-- 
Regards,
Shalin Shekhar Mangar.


Re: Dataimport handler exception when migrating from 4.4 to 4.6. Help needed

2013-12-22 Thread Shawn Heisey
On 12/22/2013 9:51 AM, William Pierce wrote:
 My configuration works nicely with Solr 4.4. I am encountering a 
 configuration error when I try to upgrade from 4.4 to 4.6.  All I did was the 
 following:
 
 a) Replace the 4.4 solr.war file with the 4.6 solr.war in the tomcat/lib 
 folder. I am using version 6.0.36 of tomcat.
 b) I replaced the solr-dataimporthandler-4.4.0.jar and 
 solr-dataimporthandler-extras-4.4.0.jar with the corresponding 4.6 
 counterparts in the collection/lib folder.
 
 I restarted tomcat.   I get the following stack trace (full log is also given 
 below) – there are no other warnings/errors in my log.  I have gone through 
 the 4.5 changes to see if I needed to add/modify my DIH configuration – but I 
 am stymied.  Any help will be greatly appreciated.
 
 ERROR - 2013-12-22 08:05:09.824; 
 org.apache.solr.handler.dataimport.DataImportHandler; Exception while loading 
 DataImporter
 java.lang.NoSuchMethodError: 
 org.apache.solr.core.SolrCore.getLatestSchema()Lorg/apache/solr/schema/IndexSchema;

The method it's complaining about not being there is
org.apache.solr.core.SolrCore.getLatestSchema() ... which is in Solr
itself, not the dataimport handler.  I did some checking.  This method
did not exist before 4.4.0, so my best guess is that your classloader is
loading a SolrCore class from 4.3.1 or earlier, which probably means one
of two things: 1) The Solr war you're extracting is not actually version
4.6.0, or 2) you've got jars in your system from one or more older versions.

It's a good idea to delete the extracted war data whenever you upgrade
Solr -- stop the container, delete the extracted data and all old jars,
then replace the .war file and start it back up.
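
A sketch of that sequence, assuming a stock Tomcat layout (the paths and file
names below are illustrative; adjust to wherever your container extracts the
war and keeps its jars):

$TOMCAT_HOME/bin/shutdown.sh
rm -rf $TOMCAT_HOME/webapps/solr        # the extracted war data
rm $TOMCAT_HOME/lib/solr-*.jar          # any stale Solr jars
cp solr-4.6.0.war $TOMCAT_HOME/webapps/solr.war
$TOMCAT_HOME/bin/startup.sh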

Thanks,
Shawn



Re: Dataimport handler exception when migrating from 4.4 to 4.6. Help needed

2013-12-22 Thread William Bell
The best practice for upgrading is to take the distribution and expand it.
Then take your cores and replace them.

Then you are guaranteed to get the jars and not have other WARs/JARs
hanging around.



On Sun, Dec 22, 2013 at 7:24 PM, Shawn Heisey s...@elyograg.org wrote:

 On 12/22/2013 9:51 AM, William Pierce wrote:
  My configuration works nicely with Solr 4.4. I am encountering a
 configuration error when I try to upgrade from 4.4 to 4.6.  All I did was
 the following:
 
  a) Replace the 4.4 solr.war file with the 4.6 solr.war in the tomcat/lib
 folder. I am using version 6.0.36 of tomcat.
  b) I replaced the solr-dataimporthandler-4.4.0.jar and
 solr-dataimporthandler-extras-4.4.0.jar with the corresponding 4.6
 counterparts in the collection/lib folder.
 
  I restarted tomcat.   I get the following stack trace (full log is also
 given below) – there are no other warnings/errors in my log.  I have gone
 through the 4.5 changes to see if I needed to add/modify my DIH
 configuration – but I am stymied.  Any help will be greatly appreciated.
 
  ERROR - 2013-12-22 08:05:09.824;
 org.apache.solr.handler.dataimport.DataImportHandler; Exception while
 loading DataImporter
  java.lang.NoSuchMethodError:
 org.apache.solr.core.SolrCore.getLatestSchema()Lorg/apache/solr/schema/IndexSchema;

 The method it's complaining about not being there is
 org.apache.solr.core.SolrCore.getLatestSchema() ... which is in Solr
 itself, not the dataimport handler.  I did some checking.  This method
 did not exist before 4.4.0, so my best guess is that your classloader is
 loading a SolrCore class from 4.3.1 or earlier, which probably means one
 of two things: 1) The Solr war you're extracting is not actually version
 4.6.0, or 2) you've got jars in your system from one or more older
 versions.

 It's a good idea to delete the extracted war data whenever you upgrade
 Solr -- stop the container, delete the extracted data and all old jars,
 then replace the .war file and start it back up.

 Thanks,
 Shawn




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: DataImport Handler, writing a new EntityProcessor

2013-12-19 Thread Mathias Lux
Hi!

Thanks for all the advice! I finally did it; the most annoying error,
which took me the better part of a day to figure out, was that the state
variable here had to be reset:
https://bitbucket.org/dermotte/liresolr/src/d27878a71c63842cb72b84162b599d99c4408965/src/main/java/net/semanticmetadata/lire/solr/LireEntityProcessor.java?at=master#cl-56
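
The underlying rule: DIH calls a nested entity processor's init(Context) once
per parent row, so any per-run state must be reset there or the processor goes
quiet after the first document. A minimal sketch of the pattern (class and
field names here are illustrative, not LireSolr's actual code):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class MyImageEntityProcessor extends EntityProcessorBase {
    private boolean done;  // goes stale unless reset per parent row

    @Override
    public void init(Context context) {
        super.init(context);
        done = false;  // the reset that is easy to forget
    }

    @Override
    public Map<String, Object> nextRow() {
        if (done) return null;   // signal: no more rows for this parent
        done = true;
        Map<String, Object> row = new HashMap<String, Object>();
        // ... analyze the file handed down from the parent entity and
        // put the resulting fields into the row ...
        return row;
    }
}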

The EntityProcessor is part of this image search plugin if anyone is
interested: https://bitbucket.org/dermotte/liresolr/

:) It's always the small things that are hard to find

cheers and thanks, Mathias

On Wed, Dec 18, 2013 at 7:26 PM, P Williams
williams.tricia.l...@gmail.com wrote:
 Hi Mathias,

 I'd recommend testing one thing at a time.  See if you can get it to work
 for one image before you try a directory of images.  Also try testing using
 the solr-testframework using your ide (I use Eclipse) to debug rather than
 your browser/print statements.  Hopefully that will give you some more
 specific knowledge of what's happening around your plugin.

  I also wrote an EntityProcessor plugin to read from a properties file
  (https://issues.apache.org/jira/browse/SOLR-3928).
  Hopefully that'll give you some insight about this kind of Solr plugin and
 testing them.

 Cheers,
 Tricia




 On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux m...@itec.uni-klu.ac.at wrote:

 Hi all!

 I've got a question regarding writing a new EntityProcessor, in the
 same sense as the Tika one. My EntityProcessor should analyze jpg
 images and create document fields to be used with the LIRE Solr plugin
 (https://bitbucket.org/dermotte/liresolr). Basically I've taken the
 same approach as the TikaEntityProcessor, but my setup just indexes
 the first of 1000 images. I'm using a FileListEntityProcessor to get
 all JPEGs from a directory and then I'm handing them over (see [2]).
 My code for the EntityProcessor is at [1]. I've tried to use the
 DataSource as well as the filePath attribute, but it ends up all the
 same. However, the FileListEntityProcessor is able to read all the
 files according to the debug output, but I'm missing the link from the
 FileListEntityProcessor to the LireEntityProcessor.

 I'd appreciate any pointer or help :)

 cheers,
   Mathias

 [1] LireEntityProcessor http://pastebin.com/JFajkNtf
 [2] dataConfig http://pastebin.com/vSHucatJ

 --
 Dr. Mathias Lux
 Klagenfurt University, Austria
 http://tinyurl.com/mlux-itec




-- 
PD Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


RE: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Dyer, James
The first thing I would suggest is to try and run it not in debug mode.  DIH's 
debug mode limits the number of documents it will take in, so that might be all 
that is wrong here.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of Mathias 
Lux
Sent: Wednesday, December 18, 2013 4:04 AM
To: solr-user@lucene.apache.org
Subject: DataImport Handler, writing a new EntityProcessor

Hi all!

I've got a question regarding writing a new EntityProcessor, in the
same sense as the Tika one. My EntityProcessor should analyze jpg
images and create document fields to be used with the LIRE Solr plugin
(https://bitbucket.org/dermotte/liresolr). Basically I've taken the
same approach as the TikaEntityProcessor, but my setup just indexes
the first of 1000 images. I'm using a FileListEntityProcessor to get
all JPEGs from a directory and then I'm handing them over (see [2]).
My code for the EntityProcessor is at [1]. I've tried to use the
DataSource as well as the filePath attribute, but it ends up all the
same. However, the FileListEntityProcessor is able to read all the
files according to the debug output, but I'm missing the link from the
FileListEntityProcessor to the LireEntityProcessor.

I'd appreciate any pointer or help :)

cheers,
  Mathias

[1] LireEntityProcessor http://pastebin.com/JFajkNtf
[2] dataConfig http://pastebin.com/vSHucatJ

-- 
Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec



Re: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Mathias Lux
Unfortunately it is the same in non-debug mode: just the first document. I
also print the params to stdout, but it seems only the first one ever
arrives at my custom class. I have the feeling that I'm doing
something seriously wrong here, based on a complete misunderstanding
:) I basically assume that the nested entity processor will be called
for each of the rows that come out of its parent. I've read
somewhere that the data has to be taken from the data source, and
I've implemented that, but it doesn't seem to change anything.

cheers,
Mathias

On Wed, Dec 18, 2013 at 3:05 PM, Dyer, James
james.d...@ingramcontent.com wrote:
 The first thing I would suggest is to try and run it not in debug mode.  
 DIH's debug mode limits the number of documents it will take in, so that 
 might be all that is wrong here.

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of 
 Mathias Lux
 Sent: Wednesday, December 18, 2013 4:04 AM
 To: solr-user@lucene.apache.org
 Subject: DataImport Handler, writing a new EntityProcessor

 Hi all!

 I've got a question regarding writing a new EntityProcessor, in the
 same sense as the Tika one. My EntityProcessor should analyze jpg
 images and create document fields to be used with the LIRE Solr plugin
 (https://bitbucket.org/dermotte/liresolr). Basically I've taken the
 same approach as the TikaEntityProcessor, but my setup just indexes
 the first of 1000 images. I'm using a FileListEntityProcessor to get
 all JPEGs from a directory and then I'm handing them over (see [2]).
 My code for the EntityProcessor is at [1]. I've tried to use the
 DataSource as well as the filePath attribute, but it ends up all the
 same. However, the FileListEntityProcessor is able to read all the
 files according to the debug output, but I'm missing the link from the
 FileListEntityProcessor to the LireEntityProcessor.

 I'd appreciate any pointer or help :)

 cheers,
   Mathias

 [1] LireEntityProcessor http://pastebin.com/JFajkNtf
 [2] dataConfig http://pastebin.com/vSHucatJ

 --
 Dr. Mathias Lux
 Klagenfurt University, Austria
 http://tinyurl.com/mlux-itec




-- 
PD Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


Re: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread P Williams
Hi Mathias,

I'd recommend testing one thing at a time.  See if you can get it to work
for one image before you try a directory of images.  Also try testing using
the solr-testframework using your ide (I use Eclipse) to debug rather than
your browser/print statements.  Hopefully that will give you some more
specific knowledge of what's happening around your plugin.

I also wrote an EntityProcessor plugin to read from a properties file
(https://issues.apache.org/jira/browse/SOLR-3928).
 Hopefully that'll give you some insight about this kind of Solr plugin and
testing them.

Cheers,
Tricia




On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux m...@itec.uni-klu.ac.at wrote:

 Hi all!

 I've got a question regarding writing a new EntityProcessor, in the
 same sense as the Tika one. My EntityProcessor should analyze jpg
 images and create document fields to be used with the LIRE Solr plugin
 (https://bitbucket.org/dermotte/liresolr). Basically I've taken the
 same approach as the TikaEntityProcessor, but my setup just indexes
 the first of 1000 images. I'm using a FileListEntityProcessor to get
 all JPEGs from a directory and then I'm handing them over (see [2]).
 My code for the EntityProcessor is at [1]. I've tried to use the
 DataSource as well as the filePath attribute, but it ends up all the
 same. However, the FileListEntityProcessor is able to read all the
 files according to the debug output, but I'm missing the link from the
 FileListEntityProcessor to the LireEntityProcessor.

 I'd appreciate any pointer or help :)

 cheers,
   Mathias

 [1] LireEntityProcessor http://pastebin.com/JFajkNtf
 [2] dataConfig http://pastebin.com/vSHucatJ

 --
 Dr. Mathias Lux
 Klagenfurt University, Austria
 http://tinyurl.com/mlux-itec



Re: dataimport handler

2013-05-10 Thread Shalin Shekhar Mangar
Hmm, I will fix.

https://issues.apache.org/jira/browse/SOLR-4788


On Thu, May 9, 2013 at 8:35 PM, William Bell billnb...@gmail.com wrote:

 It does not work anymore in 4.x.

 ${dih.last_index_time} does work, but the entity version does not.

 Bill



 On Tue, May 7, 2013 at 4:19 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

  Using ${dih.entity_name.last_index_time} should work. Make sure you put
  it in quotes in your query.
 
 
  On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com
 wrote:
 
   In the  data import handler  I have multiple entities.  Each one
   generates a date in the
   dataimport.properties i.e. entityname.last_index_time.
  
   How do I reference the specific entity time in my delta queries?
  
   Thanks
  
   Eric
  
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 



 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076




-- 
Regards,
Shalin Shekhar Mangar.


Re: dataimport handler

2013-05-09 Thread William Bell
It does not work anymore in 4.x.

${dih.last_index_time} does work, but the entity version does not.

Bill



On Tue, May 7, 2013 at 4:19 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Using ${dih.entity_name.last_index_time} should work. Make sure you put
 it in quotes in your query.


 On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com wrote:

  In the  data import handler  I have multiple entities.  Each one
  generates a date in the
  dataimport.properties i.e. entityname.last_index_time.
 
  How do I reference the specific entity time in my delta queries?
 
  Thanks
 
  Eric
 



 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: dataimport handler

2013-05-07 Thread Shalin Shekhar Mangar
Using ${dih.entity_name.last_index_time} should work. Make sure you put
it in quotes in your query.
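
For instance, for an entity named item (a sketch; the table and column names
are made up), the delta query would reference the entity-scoped variable,
quoted, like this:

deltaQuery="select id from item_table
            where last_modified > '${dih.item.last_index_time}'"

(Note Bill's report elsewhere in this thread that the entity-scoped form broke
in 4.x; SOLR-4788 tracks it, and plain ${dih.last_index_time} still works.)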


On Tue, May 7, 2013 at 12:07 PM, Eric Myers emy...@nabancard.com wrote:

 In the  data import handler  I have multiple entities.  Each one
 generates a date in the
 dataimport.properties i.e. entityname.last_index_time.

 How do I reference the specific entity time in my delta queries?

 Thanks

 Eric




-- 
Regards,
Shalin Shekhar Mangar.


Re: Dataimport handler

2013-04-23 Thread William Bell
I also get this. 4.2+


On Fri, Apr 19, 2013 at 10:43 PM, Eric Myers badllam...@gmail.com wrote:

 I have multiple parallel entities in my document and when I run an import
 there are times like
 xxx.last_index_time
 where xxx is the name of the entity.

 I tried accessing these using dih.xxx.last_index_time but receive a null
 value.

 Is there a way to reference these in my queries.

 Thanks




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


RE: DataImport Handler : Transformer Function Eval Failed Error

2012-11-05 Thread Mishra, Shikhar
Looks like it will be helpful. I'm going to give it a shot. Thanks, Otis.

Shikhar

From: Otis Gospodnetic [otis.gospodne...@gmail.com]
Sent: Friday, November 02, 2012 4:36 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImport Handler : Transformer Function Eval Failed Error

Would http://wiki.apache.org/solr/Join do anything for you?

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Nov 2, 2012 at 10:06 AM, Mishra, Shikhar 
shikhar.mis...@telcobuy.com wrote:

 We have a scenario where the same products are available from multiple
 vendors at different prices. We want to store these prices along with the
 products in the index (product has many prices), so that we can apply
 dynamic filtering on the prices at the time of search.


 Thanks,
 Shikhar

 -Original Message-
 From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
 Sent: Thursday, November 01, 2012 8:13 PM
 To: solr-user@lucene.apache.org
 Subject: Re: DataImport Handler : Transformer Function Eval Failed Error

 Hi,

 That looks a little painful... what are you trying to achieve by storing
 JSON in there? Maybe there's a simpler way to get there...

 Otis
 --
 Performance Monitoring - http://sematext.com/spm

 On Nov 1, 2012 6:16 PM, Mishra, Shikhar shikhar.mis...@telcobuy.com wrote:

  Hi,
 
  I'm trying to store a list of JSON objects as stored value for the
  field prices (see below).
 
  I'm getting the following error from the custom transformer function
  (see the data-config file at the end) of data import handler.
 
  Error Message
  --------------
  Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
  'eval' failed with language: JavaScript and script:
  function vendorPrices(row){
 
  var wwtCost = row.get('WWT_COST');
  var listPrice = row.get('LIST_PRICE');
  var vendorName = row.get('VENDOR_NAME');
 
  //Below approach fails
  var prices = [];
 
  prices.push({'vendor':vendorName});
  prices.push({'wwtCost':wwtCost});
  prices.push({'listPrice':listPrice});
 
  row.put('prices':prices);
 
  //Below approach works
  //row.put('prices', '{' + 'vendor:' + vendorName +
  ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
  return row;
  } Processing Document # 1
  at
  org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndT
  hrow(DataImportHandlerException.java:71)
 
  Data Import Handler Configuration File
  <dataConfig>

  <script>
  <![CDATA[
  function vendorPrices(row){

          var wwtCost = row.get('WWT_COST');
          var listPrice = row.get('LIST_PRICE');
          var vendorName = row.get('VENDOR_NAME');

          //Below approach fails
          var prices = [];

          prices.push({'vendor':vendorName});
          prices.push({'wwtCost':wwtCost});
          prices.push({'listPrice':listPrice});

          row.put('prices':prices);

          //Below approach works
          //row.put('prices', '{' + 'vendor:' + vendorName +
          ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
          return row;
  }
  ]]>
  </script>

  <dataSource driver="oracle.jdbc.driver.OracleDriver"
      url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac-scan.somr.com)(PORT=3465))(CONNECT_DATA=(SERVICE_NAME=ERP_GENERAL.SOMR.ORG)))"
      user="dummy" password="xx"/>
  <document>
      <entity name="item" query="select * from
      wwt_catalog.wwt_product prod, wwt_catalog.wwt_manufacturer mfg where
      prod.mfg_id = mfg.mfg_id and prod.mfg_product_number = 'CON-CBO2-B22HPF'">
          <field column="PRODUCT_ID" name="id" />
          <field column="MFG_PRODUCT_NUMBER" name="name" />
          <field column="MFG_PRODUCT_NUMBER" name="nameSort" />
          <field column="MFG_NAME" name="manu" />
          <field column="MFG_ITEM_NUMBER" name="alphaNameSort" />
          <field column="DESCRIPTION" name="features" />
          <field column="DESCRIPTION" name="description" />

          <entity name="vendor_sources"
          transformer="script:vendorPrices" query="SELECT PRICE.WWT_COST,
          PRICE.LIST_PRICE, VEND.VENDOR_NAME, AVAIL.LEAD_TIME,
          AVAIL.QTY_AVAILABLE FROM wwt_catalog.wwt_product prod,
          wwt_catalog.wwt_product_pricing price, wwt_catalog.wwt_vendor vend,
          wwt_catalog.wwt_product_availability avail WHERE  PROD.PRODUCT_ID =
          price.product_id(+) AND price.vendor_id

RE: DataImport Handler : Transformer Function Eval Failed Error

2012-11-02 Thread Mishra, Shikhar
We have a scenario where the same products are available from multiple vendors 
at different prices. We want to store these prices along with the products in 
the index (product has many prices), so that we can apply dynamic filtering on 
the prices at the time of search.


Thanks,
Shikhar

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Thursday, November 01, 2012 8:13 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImport Handler : Transformer Function Eval Failed Error

Hi,

That looks a little painful... what are you trying to achieve by storing JSON 
in there? Maybe there's a simpler way to get there...

Otis
--
Performance Monitoring - http://sematext.com/spm

On Nov 1, 2012 6:16 PM, Mishra, Shikhar shikhar.mis...@telcobuy.com wrote:

 Hi,

 I'm trying to store a list of JSON objects as stored value for the 
 field prices (see below).

 I'm getting the following error from the custom transformer function 
 (see the data-config file at the end) of data import handler.

 Error Message
 --------------
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
 'eval' failed with language: JavaScript and script:
 function vendorPrices(row){

 var wwtCost = row.get('WWT_COST');
 var listPrice = row.get('LIST_PRICE');
 var vendorName = row.get('VENDOR_NAME');

 //Below approach fails
 var prices = [];

 prices.push({'vendor':vendorName});
 prices.push({'wwtCost':wwtCost});
 prices.push({'listPrice':listPrice});

 row.put('prices':prices);

 //Below approach works
 //row.put('prices', '{' + 'vendor:' + vendorName + 
 ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
 return row;
 } Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndT
 hrow(DataImportHandlerException.java:71)

 Data Import Handler Configuration File
 <dataConfig>

 <script>
 <![CDATA[
 function vendorPrices(row){

         var wwtCost = row.get('WWT_COST');
         var listPrice = row.get('LIST_PRICE');
         var vendorName = row.get('VENDOR_NAME');

         //Below approach fails
         var prices = [];

         prices.push({'vendor':vendorName});
         prices.push({'wwtCost':wwtCost});
         prices.push({'listPrice':listPrice});

         row.put('prices':prices);

         //Below approach works
         //row.put('prices', '{' + 'vendor:' + vendorName +
         ', ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
         return row;
 }
 ]]>
 </script>

 <dataSource driver="oracle.jdbc.driver.OracleDriver"
     url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac-scan.somr.com)(PORT=3465))(CONNECT_DATA=(SERVICE_NAME=ERP_GENERAL.SOMR.ORG)))"
     user="dummy" password="xx"/>
 <document>
     <entity name="item" query="select * from
     wwt_catalog.wwt_product prod, wwt_catalog.wwt_manufacturer mfg where
     prod.mfg_id = mfg.mfg_id and prod.mfg_product_number = 'CON-CBO2-B22HPF'">
         <field column="PRODUCT_ID" name="id" />
         <field column="MFG_PRODUCT_NUMBER" name="name" />
         <field column="MFG_PRODUCT_NUMBER" name="nameSort" />
         <field column="MFG_NAME" name="manu" />
         <field column="MFG_ITEM_NUMBER" name="alphaNameSort" />
         <field column="DESCRIPTION" name="features" />
         <field column="DESCRIPTION" name="description" />

         <entity name="vendor_sources"
         transformer="script:vendorPrices" query="SELECT PRICE.WWT_COST,
         PRICE.LIST_PRICE, VEND.VENDOR_NAME, AVAIL.LEAD_TIME,
         AVAIL.QTY_AVAILABLE FROM wwt_catalog.wwt_product prod,
         wwt_catalog.wwt_product_pricing price, wwt_catalog.wwt_vendor vend,
         wwt_catalog.wwt_product_availability avail WHERE  PROD.PRODUCT_ID =
         price.product_id(+) AND price.vendor_id =
         vend.vendor_id(+) AND PRICE.PRODUCT_ID = avail.product_id(+) AND
         PRICE.VENDOR_ID = AVAIL.VENDOR_ID(+) AND prod.PRODUCT_ID =
         '${item.PRODUCT_ID}'">

         </entity>
     </entity>

 </document>
 </dataConfig>


 Are there any syntactic errors in the JavaScript code above? Thanks.

 Shikhar





Re: DataImport Handler : Transformer Function Eval Failed Error

2012-11-01 Thread Otis Gospodnetic
Hi,

That looks a little painful... what are you trying to achieve by storing
JSON in there? Maybe there's a simpler way to get there...

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 1, 2012 6:16 PM, Mishra, Shikhar shikhar.mis...@telcobuy.com
wrote:

 Hi,

 I'm trying to store a list of JSON objects as stored value for the field
 prices (see below).

 I'm getting the following error from the custom transformer function (see
 the data-config file at the end) of data import handler.

 Error Message

 ---
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
 'eval' failed with language: JavaScript and script:
 function vendorPrices(row){

 var wwtCost = row.get('WWT_COST');
 var listPrice = row.get('LIST_PRICE');
 var vendorName = row.get('VENDOR_NAME');

 //Below approach fails
 var prices = [];

 prices.push({'vendor':vendorName});
 prices.push({'wwtCost':wwtCost});
 prices.push({'listPrice':listPrice});

 row.put('prices':prices);

 //Below approach works
 //row.put('prices', '{' + 'vendor:' + vendorName + ',
 ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
 return row;
 } Processing Document # 1
 at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)

 Data Import Handler Configuration File
 <dataConfig>

 <script>
 <![CDATA[
 function vendorPrices(row){

         var wwtCost = row.get('WWT_COST');
         var listPrice = row.get('LIST_PRICE');
         var vendorName = row.get('VENDOR_NAME');

         //Below approach fails
         var prices = [];

         prices.push({'vendor':vendorName});
         prices.push({'wwtCost':wwtCost});
         prices.push({'listPrice':listPrice});

         row.put('prices':prices);

         //Below approach works
         //row.put('prices', '{' + 'vendor:' + vendorName + ',
         ' + 'wwtCost:' + wwtCost + ', ' + 'listPrice:' + listPrice + '}');
         return row;
 }
 ]]>
 </script>

 <dataSource driver="oracle.jdbc.driver.OracleDriver"
     url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac-scan.somr.com)(PORT=3465))(CONNECT_DATA=(SERVICE_NAME=ERP_GENERAL.SOMR.ORG)))"
     user="dummy" password="xx"/>
 <document>
     <entity name="item" query="select * from wwt_catalog.wwt_product
     prod, wwt_catalog.wwt_manufacturer mfg where prod.mfg_id = mfg.mfg_id and
     prod.mfg_product_number = 'CON-CBO2-B22HPF'">
         <field column="PRODUCT_ID" name="id" />
         <field column="MFG_PRODUCT_NUMBER" name="name" />
         <field column="MFG_PRODUCT_NUMBER" name="nameSort" />
         <field column="MFG_NAME" name="manu" />
         <field column="MFG_ITEM_NUMBER" name="alphaNameSort" />
         <field column="DESCRIPTION" name="features" />
         <field column="DESCRIPTION" name="description" />

         <entity name="vendor_sources"
         transformer="script:vendorPrices" query="SELECT PRICE.WWT_COST,
         PRICE.LIST_PRICE, VEND.VENDOR_NAME, AVAIL.LEAD_TIME, AVAIL.QTY_AVAILABLE
         FROM wwt_catalog.wwt_product prod, wwt_catalog.wwt_product_pricing price,
         wwt_catalog.wwt_vendor vend, wwt_catalog.wwt_product_availability avail
         WHERE  PROD.PRODUCT_ID = price.product_id(+) AND price.vendor_id =
         vend.vendor_id(+) AND PRICE.PRODUCT_ID = avail.product_id(+) AND
         PRICE.VENDOR_ID = AVAIL.VENDOR_ID(+) AND prod.PRODUCT_ID =
         '${item.PRODUCT_ID}'">

         </entity>
     </entity>

 </document>
 </dataConfig>


 Are there any syntactic errors in the JavaScript code above? Thanks.

 Shikhar
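
To answer the question in the quoted message: yes, row.put('prices':prices);
is the syntax error; put() takes a key and a value separated by a comma, not a
colon.  A minimal corrected sketch (assuming the Rhino script engine DIH used
at the time; a java.util.ArrayList tends to map more predictably to a
multiValued Solr field than a native JavaScript array):

function vendorPrices(row) {
    var wwtCost = row.get('WWT_COST');
    var listPrice = row.get('LIST_PRICE');
    var vendorName = row.get('VENDOR_NAME');

    // put() takes (key, value) -- the original used a colon here
    var prices = new java.util.ArrayList();
    prices.add('{vendor:' + vendorName + ', wwtCost:' + wwtCost +
               ', listPrice:' + listPrice + '}');
    row.put('prices', prices);
    return row;
}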





RE: Dataimport Handler in solr 3.6.1

2012-08-30 Thread Dyer, James
There were 2 major changes to DIH Cache functionality in Solr 3.6, only 1 of 
which was carried to Solr 4.0:

- Solr 3.6 had 2 MAJOR changes:

1. We support pluggable caches so that you can write your own cache 
implementations and cache however you want.  The goal here is to allow you to 
cache to disk when you have to do large, complex joins and an in-memory cache 
could result in an OOM.  Also, you can specify cacheImpl with any 
EntityProcessor, not just SqlEntityProcessor, so you can join child entities 
that come from XML, flat files, etc.  CachedSqlEntityProcessor is technically 
deprecated, as using it is the same as SqlEntityProcessor with 
cacheImpl="SortedMapBackedCache" specified (a config sketch follows below this 
list).  This does a simple in-memory cache very similar to Solr 3.5 and prior. 
(see https://issues.apache.org/jira/browse/SOLR-2382)

2. Extensive work was done to try and make the threads parameter work in more 
situations.  This involved some rather invasive changes to the DIH Cache 
functionality. (see https://issues.apache.org/jira/browse/SOLR-3011)

- Solr 4.0 has #1 above, BUT NOT #2.  Rather the threads functionality was 
entirely removed.
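
As a sketch of #1 in a DIH config (the entity and column names here are
illustrative):

<entity name="child" query="select parent_id, value from child_table"
        processor="SqlEntityProcessor"
        cacheImpl="SortedMapBackedCache"
        cacheKey="parent_id"
        cacheLookup="parent.id"/>

This caches the child query's rows in memory and joins each parent row against
them by key, which is what CachedSqlEntityProcessor used to do implicitly.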

Consequently, if the problem is due to #2 (SOLR-3011), it isn't as big a 
problem because 3.x users can simply use the 3.5 DIH jar (but some use cases 
involving threads work with the 3.6(.1) jar and not at all with 3.5, so users 
will have to pick & choose the best version to use for their instance).

My concern is that there are issues with #1 (SOLR-2382).  That's why I'm asking, 
if at all possible, that you try this with Solr 4.0.  I have tested Solr 4.0 
extensively here and it seems caching works exactly as it ought.  However, DIH 
is flexible in how it can be configured and there could be something that was 
broken that I have not uncovered myself.  Any issues that may exist with 
SOLR-2382 need to be identified and fixed in the 4.x branch as soon as possible.

I apologize for the late response.  I was away the past week.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: mechravi25 [mailto:mechrav...@yahoo.co.in] 
Sent: Tuesday, August 21, 2012 7:47 AM
To: solr-user@lucene.apache.org
Subject: RE: Dataimport Handler in solr 3.6.1

Hi James,

Thanks for the suggestions. 

Actually it is cacheLookup="ent1.id"; I had misspelt it. Also, I will be
needing the transformers mentioned, as there are other columns as well.

I actually tried using the 3.5 DIH jars in 3.6.1 and indexed the same data,
and the indexing was successful. But I wanted this to work with the 3.6.1 DIH.
I just came across the SOLR-2382 patch and tried giving the following

processor="CachedSqlEntityProcessor" cacheImpl="SortedMapBackedCache"

in my DIH.xml file. With static fields in child entities the indexing
happened fine, but with dynamic fields only one of the dynamic fields
was indexed and the rest were skipped, even though the total rows fetched from
the datasource was correct.

Following are my questions:

1.) Is there a big difference between the Solr 3.5 and 3.6.1 DIH handler files,
i.e. is any new feature added in 3.6 DIH that is not present in 3.5?
2.) Am I missing something when giving cacheImpl="SortedMapBackedCache"
in my DIH.xml, because of which dynamic fields are not indexed properly?
There is no change to my DIH file from my previous post apart from this
cacheImpl addition, and the dynamic fields are indexed properly if I do
not give this cacheImpl. Am I missing something here?

Thanks.







RE: Dataimport Handler in solr 3.6.1

2012-08-21 Thread mechravi25
Hi James,

Thanks for the suggestions. 

Actually it is cacheLookup="ent1.id"; I had misspelt it. Also, I will be
needing the transformers mentioned, as there are other columns as well.

I actually tried using the 3.5 DIH jars in 3.6.1 and indexed the same data,
and the indexing was successful. But I wanted this to work with the 3.6.1 DIH.
I just came across the SOLR-2382 patch and tried giving the following

processor="CachedSqlEntityProcessor" cacheImpl="SortedMapBackedCache"

in my DIH.xml file. With static fields in child entities the indexing
happened fine, but with dynamic fields only one of the dynamic fields
was indexed and the rest were skipped, even though the total rows fetched from
the datasource was correct.

Following are my questions:

1.) Is there a big difference between the Solr 3.5 and 3.6.1 DIH handler files,
i.e. is any new feature added in 3.6 DIH that is not present in 3.5?
2.) Am I missing something when giving cacheImpl="SortedMapBackedCache"
in my DIH.xml, because of which dynamic fields are not indexed properly?
There is no change to my DIH file from my previous post apart from this
cacheImpl addition, and the dynamic fields are indexed properly if I do
not give this cacheImpl. Am I missing something here?

Thanks.





RE: Dataimport Handler in solr 3.6.1

2012-08-14 Thread Dyer, James
One thing I notice in your configuration...the child entity has this:

cacheLookup=ent1.uid

but your parent entity doesn't have a uid field.  

Also, you have these 3 transformers:  
RegexTransformer,DateFormatTransformer,TemplateTransformer

but none of your columns seem to make use of these.  Are you sure you need them?

In any case I am suspicious there may still be bugs in 3.6.1 related to 
CachedSqlEntityProcessor, so if you are able to create a failing unit test and 
post it to JIRA, that would be helpful.  If you need to, you can use the 3.5 DIH 
jar with Solr 3.6.1.  Also, I do not think SOLR-3360 should affect you 
unless you're using the threads parameter.  Both SOLR-3360 & SOLR-3430 fixed 
bugs related to CachedSqlEntityProcessor that were introduced in 3.6.0 (from 
SOLR-3411 and SOLR-2482 respectively).

Finally, if you are at all able to test this on 4.0-beta, I would greatly 
appreciate it!  SOLR-3411/SOLR-3360 were never applied to version 4.0 because 
threads support was removed entirely.  However, SOLR-2482/SOLR-3430 were 
applied to 4.0 also.  If we have any more SOLR-2482 bugs lingering in 4.0, these 
really need to be fixed, so any testing help would be much appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: mechravi25 [mailto:mechrav...@yahoo.co.in] 
Sent: Tuesday, August 14, 2012 8:04 AM
To: solr-user@lucene.apache.org
Subject: Dataimport Handler in solr 3.6.1

I am indexing some data using dataimport handler files in Solr 3.6.1, using
a nested entity in my handler file.
I noticed a scenario wherein, instead of only the records that should be
fetched for a document, all the records present in the table are indexed.

Following is the ideal scenario for how the data should be indexed.
For a document A, I am trying to index the two values B and C as a multivalued
field:

<id>A</id>
<related_id>
  <str>B</str>
  <str>C</str>
</related_id>

This is how the output should be. I have used the same DIH file with Solr
1.4 and 3.5, and the data was indexed fine, like the output above, in both
versions.

But in Solr 3.6.1 the data was indexed differently. In my table, there
are 4 values (B, C, D, E) in the related_id field.
This is how the data is indexed in 3.6.1:

<id>A</id>
<related_id>
  <str>B</str>
  <str>C</str>
  <str>D</str>
  <str>E</str>
</related_id>

Ideally, the values D and E should not get indexed under id A. This is the
same for the other id records.


Following is the content of the DIH file:

<entity name="ent1" query="select sid as id Table1 a"
    transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">

  <field column="id" name="id" boost="0.5"/>

  <entity name="ent2" query="select id1,rid from Table2"
      processor="CachedSqlEntityProcessor" cacheKey="id1" cacheLookup="ent1.uid"
      transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">

    <field column="rid" name="related_id"/>

  </entity>

</entity>

 I tried changing the CachedSqlEntityProcessor to SqlEntityProcessor and
then indexed the same data, but I still faced the same issue.

 When I googled a bit, I found this URL:
https://issues.apache.org/jira/browse/SOLR-3360


I am not sure if issue SOLR-3360 is the same as the scenario I have
described above.

Please guide me.

Thanks.







Re: dataimport handler (DIH) - notify when it has finished?

2012-05-01 Thread Gora Mohanty
On 1 May 2012 23:12, geeky2 gee...@hotmail.com wrote:
 Hello all,

 is there a notification / trigger / callback mechanism people use that
 allows them to know when a dataimport process has finished?

 we will be doing daily delta-imports and i need some way for an operations
 group to know when the DIH has finished.


Never tried it myself, but this should meet your needs:
http://wiki.apache.org/solr/DataImportHandler#EventListeners
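
A sketch of what that looks like: register a class implementing
org.apache.solr.handler.dataimport.EventListener via the onImportEnd
attribute of the <document> element (the class and package names below are
illustrative):

<document onImportEnd="com.example.dih.ImportEndNotifier">
    ...
</document>

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

public class ImportEndNotifier implements EventListener {
    @Override
    public void onEvent(Context ctx) {
        // called when the import finishes; notify operations from here,
        // e.g. write a flag file or hit an HTTP endpoint
        System.out.println("DIH import finished");
    }
}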

Regards,
Gora


Re: dataimport handler with mysql: wrong field mapping

2008-12-15 Thread Luca Molteni
Have you tried using the

<dynamicField name="*" type="string" indexed="true" />

option in schema.xml? After the indexing, take a look at the
fields DIH has generated.

Bye,

L.M.



2008/12/15 jokkmokk jokkm...@gmx.at:

 Hi,

 I'm desperately trying to get the dataimport handler to work, however it
 seems that it just ignores the field name mapping.
 I have the fields body and subject in the database and those are called
 title and content in the solr schema, so I use the following import
 config:

 <dataConfig>

   <dataSource
      type="JdbcDataSource"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost/mydb"
      user="root"
      password=""/>

   <document>
     <entity name="phorum_messages" query="select * from phorum_messages">
       <field column="body" name="content"/>
       <field column="subject" name="title"/>
     </entity>
   </document>

 </dataConfig>

 however I always get the following exception:

 org.apache.solr.common.SolrException: ERROR:unknown field 'body'
at
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at
 org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at
 org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)


 but according to the documentation it should add a document with title and
 content not body and subject?!

 I'd appreciate any help as I can't see anything wrong with my
 configuration...

 TIA,

 Stefan




Re: dataimport handler with mysql: wrong field mapping

2008-12-15 Thread jokkmokk

sorry, I'm using the 1.3.0 release. I've now worked around that issue by
using aliases in the sql statement so that no mapping is needed. This way it
works perfectly.
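
For the archive, the alias approach looks like this (a sketch based on the
config quoted elsewhere in this thread):

<entity name="phorum_messages"
        query="select body as content, subject as title from phorum_messages"/>

With the columns already named content and title by the SQL, no <field>
mapping elements are needed.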

best regards

Stefan


Shalin Shekhar Mangar wrote:
 
 Which solr version are you using?
 



Re: dataimport handler with mysql: wrong field mapping

2008-12-15 Thread Shalin Shekhar Mangar
Which solr version are you using?

On Mon, Dec 15, 2008 at 6:04 PM, jokkmokk jokkm...@gmx.at wrote:


 Hi,

 I'm desperately trying to get the dataimport handler to work, however it
 seems that it just ignores the field name mapping.
 I have the fields body and subject in the database and those are called
 title and content in the solr schema, so I use the following import
 config:

  <dataConfig>

  <dataSource
     type="JdbcDataSource"
     driver="com.mysql.jdbc.Driver"
     url="jdbc:mysql://localhost/mydb"
     user="root"
     password=""/>

  <document>
    <entity name="phorum_messages" query="select * from phorum_messages">
      <field column="body" name="content"/>
      <field column="subject" name="title"/>
    </entity>
  </document>

  </dataConfig>

 however I always get the following exception:

 org.apache.solr.common.SolrException: ERROR:unknown field 'body'
at
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
at

 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at
 org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at

 org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
at

 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at

 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
at

 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at

 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)


 but according to the documentation it should add a document with title
 and
 content not body and subject?!

 I'd appreciate any help as I can't see anything wrong with my
 configuration...

 TIA,

 Stefan




-- 
Regards,
Shalin Shekhar Mangar.


Re: dataimport handler multiple databases

2008-04-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
each entity has an optional attribute called dataSource.
If you have multiple dataSources, give them a name and use that name as the
entity's dataSource. So your solrconfig must look like:
<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
   <lst name="defaults">
     <str name="config">/home/username/data-config.xml</str>
     <lst name="datasource">
        <str name="name">datasource-1</str>
        <str name="driver">com.mysql.jdbc.Driver</str>

     </lst>
     <lst name="datasource">
        <str name="name">datasource-2</str>
        <str name="driver">com.mysql.jdbc.Driver</str>

     </lst>
   </lst>
</requestHandler>

and each entity can have its dataSource attribute refer to one of these,
e.g.:
<entity name="one" dataSource="datasource-1" ...>
</entity>

<entity name="two" dataSource="datasource-2" ...>
</entity>
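
To run only one of them, the import commands also accept an entity parameter,
e.g.:

/dataimport?command=full-import&entity=one

so a request can target whichever entity points at the right database.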

But as I see it, you have a use case where prod and qa use different DBs,
so between prod and qa you can change the solrconfig.xml.
--Noble

Ismail Siddiqui [EMAIL PROTECTED] wrote:
 Hi I have a situaion where I am using dataimport handler with development db
  and  going to use it with production database in production environment

  I have an entry in solrconfig.xml like this:

  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/home/username/data-config.xml</str>
      <lst name="datasource">
        <str name="driver">com.mysql.jdbc.Driver</str>
        <str name="url">jdbc:mysql://localhost/dbname</str>
        <str name="user">db_username</str>
        <str name="password">db_password</str>
      </lst>
    </lst>
  </requestHandler>

  I understand I can add another datasource called datasource-2, but how can
  I use this datasource to index data?

  Currently I am calling something like /dataimport?command=full-import or
  /dataimport?command=delta-import. How can I define a particular db to be
  called, so it indexes the dev db on the development machine and the prod db
  in the production environment?


  thanks




-- 
--Noble Paul