Re: Data Import Handler (DIH) - Installing and running

2020-12-23 Thread Erick Erickson
Have you done what the message says and looked at your Solr log? If so,
what information is there?

> On Dec 23, 2020, at 5:13 AM, DINSD | SPAutores 
>  wrote:
> 
> Hi,
> 
> I'm trying to install the package "data-import-handler", since it was 
> discontinued from the core Solr distro.
> 
> https://github.com/rohitbemax/dataimporthandler
> 
> However, as soon as the first command is carried out
> 
> solr -c -Denable.packages=true
> 
> I get this screen in the web interface
> 
> Has anyone been through this, or has any idea why it's happening?
> 
> Thanks for any help
> Rui Pimentel
> 
> 
> 
> DINSD - Departamento de Informática / SPA Digital
> Av. Duque de Loulé, 31 - 1069-153 Lisboa  PORTUGAL
> T (+ 351) 21 359 44 36 / (+ 351) 21 359 44 00  F (+ 351) 21 353 02 57
>  informat...@spautores.pt
>  www.SPAutores.pt
> 



Data Import Handler (DIH) - Installing and running

2020-12-23 Thread DINSD | SPAutores

Hi,

I'm trying to install the package "data-import-handler", since it was 
discontinued from the core Solr distro.


https://github.com/rohitbemax/dataimporthandler

However, as soon as the first command is carried out

solr -c -Denable.packages=true

I get this screen in the web interface

Has anyone been through this, or has any idea why it's happening?

Thanks for any help

Rui Pimentel

DINSD - Departamento de Informática / SPA Digital
Av. Duque de Loulé, 31 - 1069-153 Lisboa PORTUGAL
T (+ 351) 21 359 44 36 / (+ 351) 21 359 44 00  F (+ 351) 21 353 02 57
informat...@spautores.pt
www.SPAutores.pt




Re: Data Import Blocker - Solr

2020-12-19 Thread Shawn Heisey

On 12/18/2020 12:03 AM, basel altameme wrote:

While trying to Import & Index data from MySQL DB custom view I am facing the 
error below:
Data Config problem: The value of attribute "query" associated with an element type 
"entity" must not contain the '<' character.
Please note that in my SQL statements I am using '<>' as an operator for 
comparing only.
sample line:
         when (`v`.`live_type_id` <> 1) then 100


These configurations are written in XML.  So you must encode the 
character using XML-friendly notation.


Instead of <> it should say &lt;&gt; to be correct.  Or you could use != 
which is also correct SQL notation for "not equal to".
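
For illustration, a sketch of how the escaped operator could look inside a 
data-config.xml entity (the entity and field names here are invented, not 
taken from the thread):

<entity name="item"
        query="select id,
                      case when (`v`.`live_type_id` &lt;&gt; 1) then 100
                           else 0 end as score
               from items v">
  <field column="id" name="id"/>
  <field column="score" name="score"/>
</entity>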


Thanks,
Shawn


Re: Data Import Blocker - Solr

2020-12-18 Thread Erick Erickson
Have you tried escaping that character?

> On Dec 18, 2020, at 2:03 AM, basel altameme  
> wrote:
> 
> Dear,
> While trying to Import & Index data from MySQL DB custom view I am facing the 
> error below:
> Data Config problem: The value of attribute "query" associated with an 
> element type "entity" must not contain the '<' character.
> Please note that in my SQL statements I am using '<>' as an operator for 
> comparing only.
> sample line:
> when (`v`.`live_type_id` <> 1) then 100
> 
> Kindly advise.
> Regards, Basel
> 



Data Import Blocker - Solr

2020-12-18 Thread basel altameme
Dear,
While trying to Import & Index data from MySQL DB custom view I am facing the 
error below:
Data Config problem: The value of attribute "query" associated with an element 
type "entity" must not contain the '<' character.
Please note that in my SQL statements I am using '<>' as an operator for 
comparing only.
sample line:
        when (`v`.`live_type_id` <> 1) then 100

Kindly advise.
Regards, Basel



Re: data import handler deprecated?

2020-11-30 Thread Dmitri Maziuk

On 11/30/2020 7:50 AM, David Smiley wrote:

Yes, absolutely to what Eric said.  We goofed on news / release highlights
on how to communicate what's happening in Solr.  From a Solr insider point
of view, we are "deprecating" because strictly speaking, the code isn't in
our codebase any longer.  From a user point of view (the audience of news /
release notes), the functionality has *moved*.


Just FYI, there is the DIH 8.7.0 jar in 
repo1.maven.org/maven2/org/apache/solr -- whereas the GitHub build is on 
8.6.0.
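
If you want to pull it as a dependency, the Maven coordinates should be 
roughly the following (a sketch, unverified against Central):

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-dataimporthandler</artifactId>
  <version>8.7.0</version>
</dependency>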


Dima



Re: data import handler deprecated?

2020-11-30 Thread David Smiley
Yes, absolutely to what Eric said.  We goofed on news / release highlights
on how to communicate what's happening in Solr.  From a Solr insider point
of view, we are "deprecating" because strictly speaking, the code isn't in
our codebase any longer.  From a user point of view (the audience of news /
release notes), the functionality has *moved*.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Nov 30, 2020 at 8:04 AM Eric Pugh 
wrote:

> You don’t need to abandon DIH right now….   You can just use the Github
> hosted version….   The more people who use it, the better a community it
> will form around it!It’s a bit chicken and egg, since no one is
> actively discussing it, submitting PR’s etc, it may languish.   If you use
> it, and test it, and support other community folks using it, then it will
> continue on!
>
>
>
> > On Nov 29, 2020, at 12:12 PM, Dmitri Maziuk 
> wrote:
> >
> > On 11/29/2020 10:32 AM, Erick Erickson wrote:
> >
> >> And I absolutely agree with Walter that the DB is often where
> >> the bottleneck lies. You might be able to
> >> use multiple threads and/or processes to query the
> >> DB if that’s the case and you can find some kind of partition
> >> key.
> >
> > IME the difficult part has always been dealing with incremental updates;
> > if we were to roll our own, my vote would be for a database trigger that
> > does a POST in whichever language the DBMS likes.
> >
> > But this has not been a part of our "solr 6.5 update" project until now.
> >
> > Thanks everyone,
> > Dima
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
>
>


Re: data import handler deprecated?

2020-11-30 Thread Eric Pugh
You don’t need to abandon DIH right now….   You can just use the Github hosted 
version….   The more people who use it, the better a community it will form 
around it!It’s a bit chicken and egg, since no one is actively discussing 
it, submitting PR’s etc, it may languish.   If you use it, and test it, and 
support other community folks using it, then it will continue on!



> On Nov 29, 2020, at 12:12 PM, Dmitri Maziuk  wrote:
> 
> On 11/29/2020 10:32 AM, Erick Erickson wrote:
> 
>> And I absolutely agree with Walter that the DB is often where
>> the bottleneck lies. You might be able to
>> use multiple threads and/or processes to query the
>> DB if that’s the case and you can find some kind of partition
>> key.
> 
> IME the difficult part has always been dealing with incremental updates; if 
> we were to roll our own, my vote would be for a database trigger that does a 
> POST in whichever language the DBMS likes.
> 
> But this has not been a part of our "solr 6.5 update" project until now.
> 
> Thanks everyone,
> Dima

___
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>




Re: data import handler deprecated?

2020-11-29 Thread Dmitri Maziuk

On 11/29/2020 10:32 AM, Erick Erickson wrote:


And I absolutely agree with Walter that the DB is often where
the bottleneck lies. You might be able to
use multiple threads and/or processes to query the
DB if that’s the case and you can find some kind of partition
key.


IME the difficult part has always been dealing with incremental updates; 
if we were to roll our own, my vote would be for a database trigger that 
does a POST in whichever language the DBMS likes.


But this has not been a part of our "solr 6.5 update" project until now.

Thanks everyone,
Dima


Re: data import handler deprecated?

2020-11-29 Thread Erick Erickson
If you like Java instead of Python, here’s a skeletal program:

https://lucidworks.com/post/indexing-with-solrj/

It’s simple and single-threaded, but could serve as a basis for
something along the lines that Walter suggests.

And I absolutely agree with Walter that the DB is often where
the bottleneck lies. You might be able to
use multiple threads and/or processes to query the
DB if that’s the case and you can find some kind of partition
key.

You also might (and it depends on the Solr version) be able
to wrap a jdbc stream in an update decorator.

https://lucene.apache.org/solr/guide/8_0/stream-source-reference.html

https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html
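
For example, something along these lines sent to the /stream handler (a 
sketch only: the collection name, table, connection string, and credentials 
are placeholders, and the JDBC driver jar must be on Solr's classpath):

update(mycollection, batchSize=500,
  jdbc(connection="jdbc:mysql://localhost/mydb?user=solr&password=secret",
       sql="select id, title from docs",
       sort="id asc",
       driver="com.mysql.jdbc.Driver"))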

Best,
Erick

> On Nov 29, 2020, at 3:04 AM, Walter Underwood  wrote:
> 
> I recommend building an outboard loader, like I did a dozen years ago for
> Solr 1.3 (before DIH) and did again recently. I’m glad to send you my Python
> program, though it reads from a JSONL file, not a database.
> 
> Run a loop fetching records from a database. Put each record into a 
> synchronized (thread-safe) queue. Run multiple worker threads, each pulling 
> records from the queue, batching them up, and sending them to Solr. For 
> maximum indexing speed (at the expense of query performance), count the 
> number of CPUs per shard leader and run two worker threads per CPU.
> 
> Adjust the batch size to be maybe 10k to 50k bytes. That might be 20 to 1000 
> documents, depending on the content.
> 
> With this setup, your database will probably be your bottleneck. I’ve had this
> index a million (small) documents per minute to a multi-shard cluster, from a 
> JSONL file on local disk.
> 
> Also, don’t worry about finding the leaders and sending the right document to
> the right shard. I just throw the batches at the load balancer and let Solr 
> figure it out. That is super simple and amazingly fast.
> 
> If you are doing big batches, building a dumb ETL system with JSONL files in 
> Amazon S3 has some real advantages. It allows loading prod data into a test
> cluster for load benchmarks, for example. Also good for disaster recovery, 
> just load the recent batches from S3. Want to know exactly which documents were
> in the index in October? Look at the batches in S3.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Nov 28, 2020, at 6:23 PM, matthew sporleder  wrote:
>> 
>> I went through the same stages of grief that you are about to start
>> but (luckily?) my core dataset grew some weird cousins and we ended up
>> writing our own indexer to join them all together/do partial
>> updates/other stuff beyond DIH.  It's not difficult to upload docs but
>> is definitely slower so far.  I think there is a bit of a 'clean core'
>> focus going on in solr-land right now and DIH is easy(!) but it's also
>> easy to hit its limits (atomic/partial updates?  wtf is an "entity?"
>> etc) so anyway try to be happy that you are aware of it now.
>> 
>> On Sat, Nov 28, 2020 at 7:41 PM Dmitri Maziuk  
>> wrote:
>>> 
>>> On 11/28/2020 5:48 PM, matthew sporleder wrote:
>>> 
 ...  The bottom of
 that github page isn't hopeful however :)
>>> 
>>> Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC
>>> JAR" :)
>>> 
>>> It's a more general question though: what is the path forward for users
>>> with data in two places? Hope that a community-maintained plugin
>>> will still be there tomorrow? Dump our tables to CSV (and POST them) and
>>> roll our own delta-updates logic? Or are we to choose one datastore and
>>> drop the other?
>>> 
>>> Dima
> 



Re: data import handler deprecated?

2020-11-29 Thread Walter Underwood
I recommend building an outboard loader, like I did a dozen years ago for
Solr 1.3 (before DIH) and did again recently. I’m glad to send you my Python
program, though it reads from a JSONL file, not a database.

Run a loop fetching records from a database. Put each record into a synchronized
(thread-safe) queue. Run multiple worker threads, each pulling records from the
queue, batching them up, and sending them to Solr. For maximum indexing speed
(at the expense of query performance), count the number of CPUs per shard leader
and run two worker threads per CPU.

Adjust the batch size to be maybe 10k to 50k bytes. That might be 20 to 1000 
documents, depending on the content.
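
A minimal sketch of that loop/queue/batch pattern in Java (not Walter's actual
program, which is Python; the Solr URL, collection name, batch size, and
thread count below are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class OutboardLoader {
    static final String SOLR = "http://localhost:8983/solr/mycoll/update";
    static final String POISON = "__end__";          // sentinel that tells a worker to stop
    static final HttpClient HTTP = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        int workers = Runtime.getRuntime().availableProcessors() * 2;
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> drain(queue));
            t.start();
            threads.add(t);
        }
        // Producer: replace this loop with a JDBC ResultSet or JSONL reader.
        for (int id = 0; id < 100_000; id++) {
            queue.put("{\"id\":\"" + id + "\"}");
        }
        for (int i = 0; i < workers; i++) queue.put(POISON);  // one pill per worker
        for (Thread t : threads) t.join();
    }

    // Pull documents off the queue, batch them up, send each full batch to Solr.
    static void drain(BlockingQueue<String> queue) {
        List<String> batch = new ArrayList<>();
        try {
            while (true) {
                String doc = queue.take();
                if (doc.equals(POISON)) break;
                batch.add(doc);
                if (batch.size() >= 500) { send(batch); batch.clear(); }
            }
            if (!batch.isEmpty()) send(batch);       // flush the remainder
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // POST one JSON array of documents; any node (or a load balancer) routes them.
    static void send(List<String> batch) {
        String body = "[" + String.join(",", batch) + "]";
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(SOLR))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        try {
            HTTP.send(req, HttpResponse.BodyHandlers.ofString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}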

With this setup, your database will probably be your bottleneck. I’ve had this
index a million (small) documents per minute to a multi-shard cluster, from a 
JSONL file on local disk.

Also, don’t worry about finding the leaders and sending the right document to
the right shard. I just throw the batches at the load balancer and let Solr 
figure it out. That is super simple and amazingly fast.

If you are doing big batches, building a dumb ETL system with JSONL files in 
Amazon S3 has some real advantages. It allows loading prod data into a test
cluster for load benchmarks, for example. Also good for disaster recovery, just
load the recent batches from S3. Want to know exactly which documents were
in the index in October? Look at the batches in S3.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 28, 2020, at 6:23 PM, matthew sporleder  wrote:
> 
> I went through the same stages of grief that you are about to start
> but (luckily?) my core dataset grew some weird cousins and we ended up
> writing our own indexer to join them all together/do partial
> updates/other stuff beyond DIH.  It's not difficult to upload docs but
> is definitely slower so far.  I think there is a bit of a 'clean core'
> focus going on in solr-land right now and DIH is easy(!) but it's also
> easy to hit its limits (atomic/partial updates?  wtf is an "entity?"
> etc) so anyway try to be happy that you are aware of it now.
> 
> On Sat, Nov 28, 2020 at 7:41 PM Dmitri Maziuk  wrote:
>> 
>> On 11/28/2020 5:48 PM, matthew sporleder wrote:
>> 
>>> ...  The bottom of
>>> that github page isn't hopeful however :)
>> 
>> Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC
>> JAR" :)
>> 
>> It's a more general question though: what is the path forward for users
>> with data in two places? Hope that a community-maintained plugin
>> will still be there tomorrow? Dump our tables to CSV (and POST them) and
>> roll our own delta-updates logic? Or are we to choose one datastore and
>> drop the other?
>> 
>> Dima



Re: data import handler deprecated?

2020-11-28 Thread matthew sporleder
I went through the same stages of grief that you are about to start
but (luckily?) my core dataset grew some weird cousins and we ended up
writing our own indexer to join them all together/do partial
updates/other stuff beyond DIH.  It's not difficult to upload docs but
is definitely slower so far.  I think there is a bit of a 'clean core'
focus going on in solr-land right now and DIH is easy(!) but it's also
easy to hit its limits (atomic/partial updates?  wtf is an "entity?"
etc) so anyway try to be happy that you are aware of it now.

On Sat, Nov 28, 2020 at 7:41 PM Dmitri Maziuk  wrote:
>
> On 11/28/2020 5:48 PM, matthew sporleder wrote:
>
> > ...  The bottom of
> > that github page isn't hopeful however :)
>
> Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC
> JAR" :)
>
> It's a more general question though: what is the path forward for users
> with data in two places? Hope that a community-maintained plugin
> will still be there tomorrow? Dump our tables to CSV (and POST them) and
> roll our own delta-updates logic? Or are we to choose one datastore and
> drop the other?
>
> Dima


Re: data import handler deprecated?

2020-11-28 Thread Dmitri Maziuk

On 11/28/2020 5:48 PM, matthew sporleder wrote:


...  The bottom of
that github page isn't hopeful however :)


Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC 
JAR" :)


It's a more general question though: what is the path forward for users 
with data in two places? Hope that a community-maintained plugin 
will still be there tomorrow? Dump our tables to CSV (and POST them) and 
roll our own delta-updates logic? Or are we to choose one datastore and 
drop the other?


Dima


Re: data import handler deprecated?

2020-11-28 Thread matthew sporleder
https://solr.cool/#utilities -> https://github.com/rohitbemax/dataimporthandler

You can install it in one of the many new/novel ways to add things to a Solr
install and it should work like always (apparently).  The bottom of
that GitHub page isn't hopeful however :)
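
For reference, the install steps on that page look roughly like this (treat 
the repo URL and package name as unverified here; check the README before 
running any of it):

solr -c -Denable.packages=true
bin/solr package add-repo data-import-handler "https://raw.githubusercontent.com/rohitbemax/dataimporthandler/master/repo/"
bin/solr package install data-import-handler
bin/solr package deploy data-import-handler -collections mycollection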

On Sat, Nov 28, 2020 at 5:21 PM Dmitri Maziuk  wrote:
>
> Hi all,
>
> trying to set up solr-8.7.0, contrib/dataimporthandler/README.txt says
> this module is deprecated as of 8.6 and scheduled for removal in 9.0.
>
> How do we pull data out of our relational database in 8.7+?
>
> TIA
> Dima


data import handler deprecated?

2020-11-28 Thread Dmitri Maziuk

Hi all,

trying to set up solr-8.7.0, contrib/dataimporthandler/README.txt says 
this module is deprecated as of 8.6 and scheduled for removal in 9.0.


How do we pull data out of our relational database in 8.7+?

TIA
Dima




Re: Data Import Handler - Concurrent Entity Importing

2020-05-05 Thread Mikhail Khludnev
Hello, James.

DataImportHandler has a lock preventing concurrent execution. If you need
to run several imports in parallel on the same core, you need to duplicate
the "/dataimport" handler definition in solrconfig.xml under different names. 
Thus, you can run them in parallel. Regarding schema, I prefer the latter 
but mileage may vary.
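
For illustration, a sketch of two such handler definitions in solrconfig.xml 
(the handler names and config file names here are placeholders):

<requestHandler name="/dataimport-a" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config-a.xml</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport-b" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config-b.xml</str>
  </lst>
</requestHandler>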

--
Mikhail.

On Tue, May 5, 2020 at 6:39 PM James Greene 
wrote:

> Hello, I'm new to the group here so please excuse me if I do not have the
> etiquette down yet.
>
> Is it possible to have multiple entities (customer configurable, up to 40
> atm) in a DIH configuration to be imported at once?  Right now I have
> multiple root entities in my configuration but they get indexed
> sequentially and this means the entities that are last are always delayed
> hitting the index.
>
> I'm trying to migrate an existing setup (solr 6.6) that utilizes a
> different collection for each "entity type" into a single collection (solr
> 8.4) to get around some of the hurdles faced when needing to have searches
> that require multiple block joins, which currently does not work going
> cross-core.
>
> I'm also wondering if it is better to fully qualify a field name or use two
> different fields for performing the "same" search.  i.e:
>
>
> {
> type_A_status: Active
> type_A_value: Test
> }
> vs
> {
> type: A
> status: Active
> value: Test
> }
>


-- 
Sincerely yours
Mikhail Khludnev


Data Import Handler - Concurrent Entity Importing

2020-05-05 Thread James Greene
Hello, I'm new to the group here so please excuse me if I do not have the
etiquette down yet.

Is it possible to have multiple entities (customer configurable, up to 40
atm) in a DIH configuration to be imported at once?  Right now I have
multiple root entities in my configuration but they get indexed
sequentially and this means the entities that are last are always delayed
hitting the index.

I'm trying to migrate an existing setup (solr 6.6) that utilizes a
different collection for each "entity type" into a single collection (solr
8.4) to get around some of the hurdles faced when needing to have searches
that require multiple block joins, which currently does not work going
cross-core.

I'm also wondering if it is better to fully qualify a field name or use two
different fields for performing the "same" search.  i.e:


{
type_A_status: Active
type_A_value: Test
}
vs
{
type: A
status: Active
value: Test
}


SOLR Data Import Handler : A command is still running...

2020-02-03 Thread Doss
We are doing an hourly data import to our index; per day, one or two requests
fail with the message "A command is still running...".

1. Does it mean the data import did not happen for the last hour?
2. The "Full Dump Started" time shows an older date (in the log below, almost
13 days old). Why is that so?

userinfoindex start - Wed Jan 22 05:12:01 IST 2020
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "initArgs":[
    "defaults",[
      "config","data-import.xml"]],
  "command":"full-import",
  "status":"busy",
  "importResponse":"A command is still running...",
  "statusMessages":{
    "Time Elapsed":"298:1:59.986",
    "Total Requests made to DataSource":"1",
    "Total Rows Fetched":"17426",
    "Total Documents Processed":"17425",
    "Total Documents Skipped":"0",
    "Full Dump Started":"2020-01-09 19:10:02"}}

Thanks,
Doss.


Re: SQL data import handler

2019-09-09 Thread Friscia, Michael
Thank you for your responses Vadim and Jörn. You both prompted me to try again 
and this time I succeeded. The trick seemed to be the way that I had installed 
Java, using OpenJDK versus from Oracle. In addition, I imagine I accidentally 
had a lot of old versions of JAR files lying around, so it was easier to start 
with a fresh VM. Now I was able to install using JDK 12 and the latest 
Microsoft 7.4.x driver, and it works out of the box as I wanted. 

Thanks again for being a sounding board for this, I primarily support 
Microsoft/dot net stuff so the Linux stuff sometimes gets away from me.

___
Michael Friscia
Office of Communications
Yale School of Medicine
(203) 737-7932 - office
(203) 931-5381 - mobile
http://web.yale.edu
 

On 9/9/19, 6:53 AM, "Vadim Ivanov"  wrote:

Hi,
Latest jdbc driver 7.4.1 seems to support JRE 8, 11, 12

https://www.microsoft.com/en-us/download/details.aspx?id=58505
You have to delete all previous versions of Sql Server jdbc driver from 
Solr installation (/solr/server/lib/ in my case)

-- 
Vadim

> -----Original Message-----
> From: Friscia, Michael [mailto:michael.fris...@yale.edu]
> Sent: Monday, September 09, 2019 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: SQL data import handler
> 
> I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre
> which installed version 11. So after a day of trying to make my Microsoft SQL
> Server data import handler work and failing, I built a new VM and installed
> JRE 8 and then everything works perfectly.
> 
> The root of the problem was the elimination of java.xml.bind (JAXB) in JRE 9. 
> I’m not a Java programmer so I’m only going by what I uncovered digging 
> through the error logs. I am not positive this is the only error to deal 
> with; for all I know fixing that will just uncover something else that needs 
> repair. There were solutions where you compile SOLR using Maven but this is 
> moving out of my comfort zone as well as long term strategy to keep SOLR 
> management (as well as other Linux systems management) out-of-the-box. There 
> were also solutions to include some sort of dependency on this older library 
> but I’m at a loss on how to relate that to a SOLR install.
> 
> My questions, since I am not that familiar with Java dependencies:
> 
>   1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled 
> and SOLR, Zookeeper nor anything else on these servers is available off the 
> virtual network so it seems ok, but I try not to run very old versions of 
> any software.
>   2.  Is there a way to fix this and keep the installation out-of-the-box or 
> at least almost out of the box?
> 
> ___
> Michael Friscia
> Office of Communications
> Yale School of Medicine
> (203) 737-7932 - office
> (203) 931-5381 - mobile
> http://web.yale.edu






RE: SQL data import handler

2019-09-09 Thread Vadim Ivanov
Hi,
Latest jdbc driver 7.4.1 seems to support JRE 8, 11, 12
https://www.microsoft.com/en-us/download/details.aspx?id=58505
You have to delete all previous versions of Sql Server jdbc driver from Solr 
installation (/solr/server/lib/ in my case)
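Roughly like this, for example (paths and jar file names below are examples; 
adjust to your install):

cd /opt/solr/server/lib
rm mssql-jdbc-*.jar                            # remove any older driver versions
cp ~/Downloads/mssql-jdbc-7.4.1.jre11.jar .    # drop in the 7.4.1 driver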

-- 
Vadim

> -----Original Message-----
> From: Friscia, Michael [mailto:michael.fris...@yale.edu]
> Sent: Monday, September 09, 2019 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: SQL data import handler
> 
> I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre
> which installed version 11. So after a day of trying to make my Microsoft SQL
> Server data import handler work and failing, I built a new VM and installed
> JRE 8 and then everything works perfectly.
> 
> The root of the problem was the elimination of java.xml.bind (JAXB) in JRE 9. I’m not
> a Java programmer so I’m only going by what I uncovered digging through the
> error logs. I am not positive this is the only error to deal with, for all I 
> know
> fixing that will just uncover something else that needs repair. There were
> solutions where you compile SOLR using Maven but this is moving out of my
> comfort zone as well as long term strategy to keep SOLR management (as well
> as other Linux systems management) out-of-the-box. There were also
> solutions to include some sort of dependency on this older library but I’m at 
> a
> loss on how to relate that to a SOLR install.
> 
> My questions, since I am not that familiar with Java dependencies:
> 
>   1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled 
> and
> SOLR, Zookeeper nor anything else on these servers is available off the 
> virtual
> network so it seems ok, but I try not to run very old versions of any 
> software.
>   2.  Is there a way to fix this and keep the installation out-of-the-box or 
> at
> least almost out of the box?
> 
> ___
> Michael Friscia
> Office of Communications
> Yale School of Medicine
> (203) 737-7932 - office
> (203) 931-5381 - mobile
> http://web.yale.edu




Re: SQL data import handler

2019-09-09 Thread Jörn Franke
Hi Michael,

Thank you for sharing. You are right about your approach to not customize the 
distribution.

Solr supports JDK 8, and its latest versions (8.x) also JDK 11. I would not 
recommend using it with JDK 9 or JDK 10, as they are out of support in many 
Java distributions. It might also be that your database driver does not 
support JDK 9 (check with Microsoft).
I don’t see it as that critical at the moment to have JDK 8 on this production 
server, but since it is out of support you should look for alternatives.

So if you are on Solr 8.x, please go with JDK 11 to have the latest fixes etc.

Best regards 

> On 09.09.2019 at 12:21, Friscia, Michael wrote:
> 
> I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre 
> which installed version 11. So after a day of trying to make my Microsoft SQL 
> Server data import handler work and failing, I built a new VM and installed 
> JRE 8 and then everything works perfectly.
> 
> The root of the problem was the elimination of java.xml.bind (JAXB) in JRE 9. I’m 
> not a Java programmer so I’m only going by what I uncovered digging through 
> the error logs. I am not positive this is the only error to deal with, for 
> all I know fixing that will just uncover something else that needs repair. 
> There were solutions where you compile SOLR using Maven but this is moving 
> out of my comfort zone as well as long term strategy to keep SOLR management 
> (as well as other Linux systems management) out-of-the-box. There were also 
> solutions to include some sort of dependency on this older library but I’m at 
> a loss on how to relate that to a SOLR install.
> 
> My questions, since I am not that familiar with Java dependencies:
> 
>  1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled 
> and SOLR, Zookeeper nor anything else on these servers is available off the 
> virtual network so it seems ok, but I try not to run very old versions of any 
> software.
>  2.  Is there a way to fix this and keep the installation out-of-the-box or 
> at least almost out of the box?
> 
> ___
> Michael Friscia
> Office of Communications
> Yale School of Medicine
> (203) 737-7932 - office
> (203) 931-5381 - mobile
> http://web.yale.edu
> 


SQL data import handler

2019-09-09 Thread Friscia, Michael
I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre 
which installed version 11. So after a day of trying to make my Microsoft SQL 
Server data import handler work and failing, I built a new VM and installed JRE 
8 and then everything works perfectly.

The root of the problem was the elimination of java.xml.bind (JAXB) in JRE 9. I’m not 
a Java programmer so I’m only going by what I uncovered digging through the 
error logs. I am not positive this is the only error to deal with, for all I 
know fixing that will just uncover something else that needs repair. There were 
solutions where you compile SOLR using Maven but this is moving out of my 
comfort zone as well as long term strategy to keep SOLR management (as well as 
other Linux systems management) out-of-the-box. There were also solutions to 
include some sort of dependency on this older library but I’m at a loss on how 
to relate that to a SOLR install.

My questions, since I am not that familiar with Java dependencies:

  1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled and 
SOLR, Zookeeper nor anything else on these servers is available off the virtual 
network so it seems ok, but I try not to run very old versions of any software.
  2.  Is there a way to fix this and keep the installation out-of-the-box or at 
least almost out of the box?

___
Michael Friscia
Office of Communications
Yale School of Medicine
(203) 737-7932 - office
(203) 931-5381 - mobile
http://web.yale.edu



Solr Cloud - Data Import from Cassandra

2019-04-08 Thread Furkan Çifçi
Hello everyone,

We are using Solr (7.1) in cloud mode and trying to get data from a Cassandra 
source. We can't import data from Cassandra.

In the error logs:

Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to PropertyWriter implementation:SimplePropertiesWriter
    at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:330)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
    at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode
    at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:151)
    at org.apache.solr.handler.dataimport.SimplePropertiesWriter.findDirectory(SimplePropertiesWriter.java:131)
    at org.apache.solr.handler.dataimport.SimplePropertiesWriter.init(SimplePropertiesWriter.java:93)
    at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:328)

The error log says I can't do it in ZooKeeper mode.

Is there a workaround for this situation?



Re: Sql server data import

2018-11-09 Thread Erick Erickson
Ok, what that means is you're letting Solr do its best to figure out
what fields you should have in the schema and how they're defined.
Almost invariably, you can do better by explicitly defining the fields
you need in your schema rather than enabling add-unknown. It's
fine for getting started, but not advised for production.
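
For example, explicit definitions for the two fields in this thread could look 
like this in the schema (the field types are guesses from the sample values, 
not taken from the thread):

<field name="Id" type="plong" indexed="true" stored="true"/>
<field name="PublicId" type="string" indexed="true" stored="true"/>
<uniqueKey>Id</uniqueKey>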

Best,
Erick
On Fri, Nov 9, 2018 at 7:52 AM Verthosa  wrote:
>
> Hello, I managed to fix the problem. I'm using Solr 7.5.0. My problem was
> that in the server logs I got "This Indexschema is not mutable" (I did not
> know about the logs folder, so I just found out 5 minutes ago). I fixed it
> by modifying solrconfig.xml to
>
> <updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
>     default="${update.autoCreateFields:false}"
>     processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
>   <processor class="solr.LogUpdateProcessorFactory"/>
>   <processor class="solr.DistributedUpdateProcessorFactory"/>
>   <processor class="solr.RunUpdateProcessorFactory"/>
> </updateRequestProcessorChain>
>
> Since then the indexing is done correctly. I even got the blob fields
> indexation working now! Thanks for your reply, everything is fixed for now.
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sql server data import

Hello, I managed to fix the problem. I'm using Solr 7.5.0. My problem was
that in the server logs I got "This Indexschema is not mutable" (I did not
know about the logs folder, so I just found out 5 minutes ago). I fixed it
by modifying solrconfig.xml to

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
    default="${update.autoCreateFields:false}"
    processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Since then the indexing is done correctly. I even got the blob fields
indexation working now! Thanks for your reply, everything is fixed for now. 




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Sql server data import

What is "​"  in the PublicId?  Is it part of the data?  Did you check if 
the special characters in your data cause the problem?

Steve

###
Error creating document : SolrInputDocument(fields: [PublicId=10065,​
Id=117])

-----Original Message-----
From: Verthosa  
Sent: Friday, November 9, 2018 7:51 AM
To: solr-user@lucene.apache.org
Subject: Sql server data import

Hello, I managed to set up a connection to my SQL server to import data into 
Solr. The idea is to import filetables but for now I first want to get it 
working using regular tables. So I created 

*data-config.xml*

<dataConfig>
  <dataSource
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost;databaseName=inConnexion_Tenant2;integratedSecurity=true"
  />



*schema.xml*
I added


and changed the uniqueKey entry to
<uniqueKey>Id</uniqueKey>

When I want to import my data (which is just data like Id: 5, PublicId:
"test"), I get the following error in the logging. 

Error creating document : SolrInputDocument(fields: [PublicId=10065,​
Id=117])


I tried all sorts of things but can't get it fixed. Does anyone want to give me a 
hand?

thanks in advance!




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sql server data import

Which version of Solr is it? Because we have not used schema.xml for a
very long time. It has been managed-schema instead.

Also, have you tried using the DIH example that uses a database and
modifying it just enough to read data from your database? Even if it
has a lot of extra junk, this would test half of the pipeline, which
you can then transfer to the clean setup.

Regards,
   Alex.
On Fri, 9 Nov 2018 at 08:09, Verthosa  wrote:
>
> Hello, I managed to set up a connection to my SQL server to import data into
> Solr. The idea is to import filetables but for now I first want to get it
> working using regular tables. So I created
>
> *data-config.xml*
>
> <dataConfig>
>   <dataSource
>     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>     url="jdbc:sqlserver://localhost;databaseName=inConnexion_Tenant2;integratedSecurity=true"
>   />
>
>
> *schema.xml*
> i added
>  multiValued="false" />
>  multiValued="false"/>
>
> and changed the uniqueKey entry to
> <uniqueKey>Id</uniqueKey>
>
> When I want to import my data (which is just data like Id: 5, PublicId:
> "test"), I get the following error in the logging.
>
> Error creating document : SolrInputDocument(fields: [PublicId=10065,​
> Id=117])
>
>
> I tried all sorts of things but can't get it fixed. Does anyone want to give
> me a hand?
>
> thanks in advance!
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Sql server data import

Hello, I managed to set up a connection to my SQL server to import data into
Solr. The idea is to import filetables but for now I first want to get it
working using regular tables. So I created 

*data-config.xml*

<dataConfig>
  <dataSource
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost;databaseName=inConnexion_Tenant2;integratedSecurity=true"
  />



*schema.xml*
I added


and changed the uniqueKey entry to 
<uniqueKey>Id</uniqueKey>

When I want to import my data (which is just data like Id: 5, PublicId:
"test"), I get the following error in the logging. 

Error creating document : SolrInputDocument(fields: [PublicId=10065,​
Id=117])


I tried all sorts of things but can't get it fixed. Does anyone want to give
me a hand?

thanks in advance!




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


SV: data-import-handler for solr-7.5.0

I made it work with the simplest of XML files, with some inspiration from 
https://opensolr.com/blog/2011/09/how-to-import-data-from-xml-files-into-your-solr-collection

Data-config is now:


  

  
  
  


And the document is simply:


   
 2165432
 5
   

   
 28548113
 89
   


Now I guess I just have to add to this solution.

Thanks for your help Alex, and also thanks to Jan for answering the first mail.

Best regards
Martin Frank Hansen

-----Original Message-----
From: Alexandre Rafalovitch 
Sent: 2 October 2018 19:52
To: solr-user 
Subject: Re: data-import-handler for solr-7.5.0

Ok, so then you can switch to debug mode and keep trying to figure it out. Also 
try BinFileDataSource or URLDataSource, maybe it will have an easier way.

Or using relative path (example:
https://github.com/arafalov/solr-apachecon2018-presentation/blob/master/configsets/pets-final/pets-data-config.xml).

Regards,
   Alex.
On Tue, 2 Oct 2018 at 12:46, Martin Frank Hansen (MHQ)  wrote:
>
> Thanks for the info, the UI looks interesting... It does read the data-config 
> correctly, so the problem is probably in this file.
>
> Martin Frank Hansen, Senior Data Analytiker
>
> Data, IM & Analytics
>
>
>
> Lautrupparken 40-42, DK-2750 Ballerup
> E-mail m...@kmd.dk  Web www.kmd.dk
> Mobil +4525571418
>
> -----Original Message-----
> From: Alexandre Rafalovitch 
> Sent: 2 October 2018 18:18
> To: solr-user 
> Subject: Re: data-import-handler for solr-7.5.0
>
> Admin UI for DIH will show you the config file read. So, if nothing is
> there, the path is most likely the issue
>
> You can also provide or update the configuration right in UI if you enable 
> debug.
>
> Finally, the config file is reread on every invocation, so you don't need to 
> restart the core after changing it.
>
> Hope this helps,
>Alex.
> On Tue, 2 Oct 2018 at 11:45, Jan Høydahl  wrote:
> >
> > > url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
> >
> > Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?
> >
> > --
> > Jan Høydahl, search solution architect Cominvent AS -
> > www.cominvent.com
> >
> > > 2. okt. 2018 kl. 17:15 skrev Martin Frank Hansen (MHQ) :
> > >
> > > Hi,
> > >
> > > I am having some problems getting the data-import-handler in Solr to 
> > > work. I have tried a lot of things but I simply get no response from 
> > > Solr, not even an error.
> > >
> > > When calling the API:
> > > http://localhost:8983/solr/nh/dataimport?command=full-import
> > > {
> > >  "responseHeader":{
> > >"status":0,
> > >"QTime":38},
> > >  "initArgs":[
> > >"defaults",[
> > >
> > > "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
> > >  "command":"full-import",
> > >  "status":"idle",
> > >  "importResponse":"",
> > >  "statusMessages":{}}
> > >
> > > The data looks like this:
> > >
> > > 
> > >  
> > > 2165432
> > > 5  
> > >
> > >  
> > > 28548113
> > > 89   
> > >
> > >
> > > The data-config file looks like this:
> > >
> > > <dataConfig>
> > >
> > >   <document>
> > >     <entity
> > >        name="xml"
> > >        pk="id"
> > >        processor="XPathEntityProcessor"
> > >        stream="true"
> > >        forEach="/journal/doc"
> > >        url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
> > >        transformer="RegexTransformer,TemplateTransformer"
> > >        >
> > >
> > >
> > >     </entity>
> > >   </document>
> > > </dataConfig>
> > >
> > > And I referenced the jar files in the solr-config.xml as well as adding 
> > > the request-handler by adding the following lines:
> > >
> > > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
> > > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />
> > >
> > > <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
> > >   <lst name="defaults">
> > >     <str name="config">C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml</str>
> > >   </lst>
> > > </requestHandler>
> >

Re: data-import-handler for solr-7.5.0

Ok, so then you can switch to debug mode and keep trying to figure it
out. Also try BinFileDataSource or URLDataSource, maybe it will have
an easier way.

Or using relative path (example:
https://github.com/arafalov/solr-apachecon2018-presentation/blob/master/configsets/pets-final/pets-data-config.xml).

Regards,
   Alex.
On Tue, 2 Oct 2018 at 12:46, Martin Frank Hansen (MHQ)  wrote:
>
> Thanks for the info, the UI looks interesting... It does read the data-config 
> correctly, so the problem is probably in this file.
>
> Martin Frank Hansen, Senior Data Analytiker
>
> Data, IM & Analytics
>
>
>
> Lautrupparken 40-42, DK-2750 Ballerup
> E-mail m...@kmd.dk  Web www.kmd.dk
> Mobil +4525571418
>
> -----Original Message-----
> From: Alexandre Rafalovitch 
> Sent: 2 October 2018 18:18
> To: solr-user 
> Subject: Re: data-import-handler for solr-7.5.0
>
> Admin UI for DIH will show you the config file read. So, if nothing is there, 
> the path is most likely the issue
>
> You can also provide or update the configuration right in UI if you enable 
> debug.
>
> Finally, the config file is reread on every invocation, so you don't need to 
> restart the core after changing it.
>
> Hope this helps,
>Alex.
> On Tue, 2 Oct 2018 at 11:45, Jan Høydahl  wrote:
> >
> > > url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
> >
> > Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?
> >
> > --
> > Jan Høydahl, search solution architect Cominvent AS -
> > www.cominvent.com
> >
> > > 2. okt. 2018 kl. 17:15 skrev Martin Frank Hansen (MHQ) :
> > >
> > > Hi,
> > >
> > > I am having some problems getting the data-import-handler in Solr to 
> > > work. I have tried a lot of things but I simply get no response from 
> > > Solr, not even an error.
> > >
> > > When calling the API:
> > > http://localhost:8983/solr/nh/dataimport?command=full-import
> > > {
> > >  "responseHeader":{
> > >"status":0,
> > >"QTime":38},
> > >  "initArgs":[
> > >"defaults",[
> > >  "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
> > >  "command":"full-import",
> > >  "status":"idle",
> > >  "importResponse":"",
> > >  "statusMessages":{}}
> > >
> > > The data looks like this:
> > >
> > > 
> > >  
> > > 2165432
> > > 5  
> > >
> > >  
> > > 28548113
> > > 89   
> > >
> > >
> > > The data-config file looks like this:
> > >
> > > <dataConfig>
> > >
> > >   <document>
> > >     <entity
> > >        name="xml"
> > >        pk="id"
> > >        processor="XPathEntityProcessor"
> > >        stream="true"
> > >        forEach="/journal/doc"
> > >        url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
> > >        transformer="RegexTransformer,TemplateTransformer"
> > >        >
> > >
> > >
> > >     </entity>
> > >   </document>
> > > </dataConfig>
> > >
> > > And I referenced the jar files in the solr-config.xml as well as adding 
> > > the request-handler by adding the following lines:
> > >
> > > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
> > > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />
> > >
> > > <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
> > >   <lst name="defaults">
> > >     <str name="config">C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml</str>
> > >   </lst>
> > > </requestHandler>
> > >
> > > I am running a core residing in the folder 
> > > “C:/Users/z6mhq/Desktop/nh/nh/conf” while the Solr installation is in 
> > > “C:/Users/z6mhq/Documents/solr-7.5.0”.
> > >
> > > I really hope that someone can spot my mistake…
> > >
> > > Thanks in advance.
> > >
> > > Martin Frank Hansen
> > >
> > >

SV: data-import-handler for solr-7.5.0

Thanks for the info, the UI looks interesting... It does read the data-config 
correctly, so the problem is probably in this file.

Martin Frank Hansen, Senior Data Analytiker

Data, IM & Analytics



Lautrupparken 40-42, DK-2750 Ballerup
E-mail m...@kmd.dk  Web www.kmd.dk
Mobil +4525571418

-----Original Message-----
From: Alexandre Rafalovitch 
Sent: 2 October 2018 18:18
To: solr-user 
Subject: Re: data-import-handler for solr-7.5.0

Admin UI for DIH will show you the config file read. So, if nothing is there, 
the path is most likely the issue

You can also provide or update the configuration right in UI if you enable 
debug.

Finally, the config file is reread on every invocation, so you don't need to 
restart the core after changing it.

Hope this helps,
   Alex.
On Tue, 2 Oct 2018 at 11:45, Jan Høydahl  wrote:
>
> > url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
>
> Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?
>
> --
> Jan Høydahl, search solution architect Cominvent AS -
> www.cominvent.com
>
> > 2. okt. 2018 kl. 17:15 skrev Martin Frank Hansen (MHQ) :
> >
> > Hi,
> >
> > I am having some problems getting the data-import-handler in Solr to work. 
> > I have tried a lot of things but I simply get no response from Solr, not 
> > even an error.
> >
> > When calling the API:
> > http://localhost:8983/solr/nh/dataimport?command=full-import
> > {
> >  "responseHeader":{
> >"status":0,
> >"QTime":38},
> >  "initArgs":[
> >"defaults",[
> >  "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
> >  "command":"full-import",
> >  "status":"idle",
> >  "importResponse":"",
> >  "statusMessages":{}}
> >
> > The data looks like this:
> >
> > 
> >  
> > 2165432
> > 5  
> >
> >  
> > 28548113
> > 89   
> >
> >
> > The data-config file looks like this:
> >
> > <dataConfig>
> >
> >   <document>
> >     <entity
> >        name="xml"
> >        pk="id"
> >        processor="XPathEntityProcessor"
> >        stream="true"
> >        forEach="/journal/doc"
> >        url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
> >        transformer="RegexTransformer,TemplateTransformer"
> >        >
> >
> >
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > And I referenced the jar files in the solr-config.xml as well as adding the 
> > request-handler by adding the following lines:
> >
> > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
> > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />
> >
> > <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
> >   <lst name="defaults">
> >     <str name="config">C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml</str>
> >   </lst>
> > </requestHandler>
> >
> > I am running a core residing in the folder 
> > “C:/Users/z6mhq/Desktop/nh/nh/conf” while the Solr installation is in 
> > “C:/Users/z6mhq/Documents/solr-7.5.0”.
> >
> > I really hope that someone can spot my mistake…
> >
> > Thanks in advance.
> >
> > Martin Frank Hansen
> >
> >
>


SV: data-import-handler for solr-7.5.0

Unfortunately, still no luck.

{
  "responseHeader":{
"status":0,
"QTime":8},
  "initArgs":[
"defaults",[
  "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
  "command":"full-import",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{
"Total Requests made to DataSource":"0",
"Total Rows Fetched":"0",
"Total Documents Processed":"0",
"Total Documents Skipped":"0",
"Full Dump Started":"2018-10-02 16:15:21",
"":"Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
"Committed":"2018-10-02 16:15:22",
"Time taken":"0:0:0.136"}}

Seems like it is not even trying to read the data.

Martin Frank Hansen

-----Original Message-----
From: Jan Høydahl 
Sent: 2 October 2018 17:46
To: solr-user@lucene.apache.org
Subject: Re: data-import-handler for solr-7.5.0

> url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"

Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 2. okt. 2018 kl. 17:15 skrev Martin Frank Hansen (MHQ) :
>
> Hi,
>
> I am having some problems getting the data-import-handler in Solr to work. I 
> have tried a lot of things but I simply get no response from Solr, not even 
> an error.
>
> When calling the API:
> http://localhost:8983/solr/nh/dataimport?command=full-import
> {
>  "responseHeader":{
>"status":0,
>"QTime":38},
>  "initArgs":[
>"defaults",[
>  "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
>  "command":"full-import",
>  "status":"idle",
>  "importResponse":"",
>  "statusMessages":{}}
>
> The data looks like this:
>
> 
>  
> 2165432
> 5  
>
>  
> 28548113
> 89   
>
>
> The data-config file looks like this:
>
> <dataConfig>
>
>   <document>
>     <entity
>        name="xml"
>        pk="id"
>        processor="XPathEntityProcessor"
>        stream="true"
>        forEach="/journal/doc"
>        url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
>        transformer="RegexTransformer,TemplateTransformer"
>        >
>
>
>     </entity>
>   </document>
> </dataConfig>
>
> And I referenced the jar files in the solr-config.xml as well as adding the 
> request-handler by adding the following lines:
>
> <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />
>
> <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml</str>
>   </lst>
> </requestHandler>
>
> I am running a core residing in the folder 
> “C:/Users/z6mhq/Desktop/nh/nh/conf” while the Solr installation is in 
> “C:/Users/z6mhq/Documents/solr-7.5.0”.
>
> I really hope that someone can spot my mistake…
>
> Thanks in advance.
>
> Martin Frank Hansen
>
>
> Beskyttelse af dine personlige oplysninger er vigtig for os. Her finder du 
> KMD’s Privatlivspolitik<http://www.kmd.dk/Privatlivspolitik>, der fortæller, 
> hvordan vi behandler oplysninger om dig.
>
> Protection of your personal data is important to us. Here you can read KMD’s 
> Privacy Policy<http://www.kmd.net/Privacy-Policy> outlining how we process 
> your personal data.
>
> Vi gør opmærksom på, at denne e-mail kan indeholde fortrolig information. 
> Hvis du ved en fejltagelse modtager e-mailen, beder vi dig venligst informere 
> afsender om fejlen ved at bruge svarfunktionen. Samtidig beder vi dig slette 
> e-mailen i dit system uden at videresende eller kopiere den. Selvom e-mailen 
> og ethvert vedhæftet bilag efter vores overbevisning er fri for virus og 
> andre fejl, som kan påvirke computeren eller it-systemet, hvori den modtages 
> og læses, åbnes den på modtagerens eget ansvar. Vi påtager os ikke noget 
> ansvar for tab og skade, som er opstået i forbindelse med at modtage og bruge 
> e-mailen.
>
> Please note that this message may contain confidential information. If you 
> have received this message by mistake, please inform the sender of the 
> mistake by sending a reply, then delete the message from your system without 
> making, distributing or retaining any copies of it. Although we believe that 
> the message and any attachments are free from viruses and other errors that 
> might affect the computer or it-system where it is received and read, the 
> recipient opens the message at his or her own risk. We assume no 
> responsibility for any loss or damage arising from the receipt or use of this 
> message.



Re: data-import-handler for solr-7.5.0

The DIH Admin UI will show you the config file as it was read. So, if nothing
is there, the path is most likely the issue.

You can also provide or update the configuration right in the UI if you
enable debug.

Finally, the config file is reread on every invocation, so you don't
need to restart the core after changing it.
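
For example, a debug run can be triggered straight from the command line (a
sketch, assuming the core name nh and the /dataimport handler path used in
this thread; debug and verbose are standard DIH request parameters):

curl "http://localhost:8983/solr/nh/dataimport?command=full-import&debug=true&verbose=true"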

Hope this helps,
   Alex.
On Tue, 2 Oct 2018 at 11:45, Jan Høydahl  wrote:
>
> > url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
>
> Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>


Re: data-import-handler for solr-7.5.0

> url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"

Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




data-import-handler for solr-7.5.0

Hi,

I am having some problems getting the data-import-handler in Solr to work. I 
have tried a lot of things but I simply get no response from Solr, not even an 
error.

When calling the API: 
http://localhost:8983/solr/nh/dataimport?command=full-import
{
  "responseHeader":{
"status":0,
"QTime":38},
  "initArgs":[
"defaults",[
  "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
  "command":"full-import",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{}}

The data looks like this:

<journal>
  <doc>
    2165432
    5
  </doc>
  <doc>
    28548113
    89
  </doc>
</journal>
<!-- the element tags inside each <doc> were stripped by the mail archive -->

The data-config file looks like this:

<dataConfig>
  <dataSource/>  <!-- dataSource attributes stripped by the mail archive -->
  <document>
    <entity
        name="xml"
        pk="id"
        processor="XPathEntityProcessor"
        stream="true"
        forEach="/journal/doc"
        url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
        transformer="RegexTransformer,TemplateTransformer">
      <!-- field mappings stripped by the mail archive -->
    </entity>
  </document>
</dataConfig>

And I referenced the jar files in solrconfig.xml as well as adding the
request handler by adding the following lines:

<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml</str>
  </lst>
</requestHandler>

I am running a core residing in the folder “C:/Users/z6mhq/Desktop/nh/nh/conf” 
while the Solr installation is in “C:/Users/z6mhq/Documents/solr-7.5.0”.

I really hope that someone can spot my mistake…

Thanks in advance.

Martin Frank Hansen




Re: Data Import Handler with Solr Source behind Load Balancer

Hi Thomas,
Is this SolrCloud or Solr master-slave? Do you update the source index while
the import is running? Did you check whether all your instances behind the LB
are in sync, if you are using master-slave?
My guess would be that DIH is using cursors to read data from the other Solr.
If you are using multiple Solr instances behind the LB, there might be some
diffs in the index that result in different documents being returned for the
same cursor mark. Are numDocs and maxDoc the same on the new instance after
the import?
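
For example, numDocs and maxDoc can be read per core from the Luke request
handler (a sketch; the core name here is a placeholder):

curl "http://host:8983/solr/collection_shard1_replica1/admin/luke?numTerms=0&wt=json"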

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/






Data Import Handler with Solr Source behind Load Balancer

We have a Solr v7 instance sourcing data from a Data Import Handler with a Solr
data source running Solr v4. When it hits a single server in that instance
directly, all documents are read and written correctly to the v7. When we hit
the load balancer DNS entry, the resulting data import handler JSON states that
it read all the documents and skipped none, and all looks fine, but the result
set is missing ~20% of the documents in the v7 core. This has happened multiple
times in multiple environments.

Any thoughts on whether this might be a bug in the underlying DIH code? I'll 
also pass it along to the server admins on our side for input.


Re: Data Import from Command Line

Thank you both for the responses. I was able to get the import working
through telnet, and I'll see if I can get the post utility working as that
seems like a better option.

Thanks,
Adam



Re: Data Import from Command Line

The Admin UI just hits Solr at a particular URL with specific parameters.
You could totally call it from the command line, but it _would_ need
to be an HTTP client of some sort. You could encode all of the
parameters into the DIH (or a new) handler; it is all defined in
solrconfig.xml (/dataimport is the default one).

If you don't have curl, maybe you have wget? Or lynx? Or, just for
giggles, you could telnet into Solr's port (8983 by default) and manually
type the required command
(http://blog.tonycode.com/tech-stuff/http-notes/making-http-requests-via-telnet/):
GET /dataimport?param=value HTTP/1.0
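
A minimal sketch of that telnet session against Solr itself (the core name nh
and port 8983 are assumptions borrowed from elsewhere in this digest; press
Enter twice after the GET line to send the request):

telnet localhost 8983
GET /solr/nh/dataimport?command=full-import HTTP/1.0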

Regards,
   Alex.
P.s. And yes, maybe bin/post could be used as well. Or the previous
direct java invocation of the posttool jar. May need to massage the
parameters a bit though.



Re: Data Import from Command Line

Adam,

On 8/20/18 1:45 PM, Adam Blank wrote:
> I'm running Solr 5.5.0 on AIX, and I'm wondering if there's a way
> to import the index from the command line instead of using the
> admin console?  I don't have the ability to use a HTTP client such
> as cURL to connect to the console.

I'm not sure when it was added, but there is a program called "post"
which comes with later versions of Solr that can be used to load data
into an index.

-chris


Data Import from Command Line

Hi,

I'm running Solr 5.5.0 on AIX, and I'm wondering if there's a way to import
the index from the command line instead of using the admin console?  I
don't have the ability to use a HTTP client such as cURL to connect to the
console.

Thank you,
Adam


Re: Child=true does not work for data import handler

But in my case I see the output below:

<!-- element names reconstructed; the mail archive stripped the XML tags -->
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="indent">on</str>
      <str name="wt">xml</str>
      <str name="_">1533734431931</str>
    </lst>
  </lst>
  <result name="response" numFound="6" start="0">
    <doc>
      <str name="dept">IT</str>
      <str name="id">1</str>
      <str name="childpk">1</str>
      <long name="_version_">1608130338704326656</long>
    </doc>
    <doc>
      <str name="dept">Data</str>
      <str name="id">1</str>
      <str name="childpk">2</str>
      <long name="_version_">1608130338704326656</long>
    </doc>
    <doc>
      <str name="name">omkar</str>
      <str name="id">1</str>
      <long name="_version_">1608130338704326656</long>
    </doc>
    <doc>
      <str name="dept">ITI</str>
      <str name="id">2</str>
      <str name="childpk">3</str>
      <long name="_version_">1608130338712715264</long>
    </doc>
    <doc>
      <str name="dept">Entry</str>
      <str name="id">2</str>
      <str name="childpk">4</str>
      <long name="_version_">1608130338712715264</long>
    </doc>
    <doc>
      <str name="name">ashwin</str>
      <str name="id">2</str>
      <long name="_version_">1608130338712715264</long>
    </doc>
  </result>
</response>



Re: Child=true does not work for data import handler

This is how nested docs look: they are document blocks with the parent at
the end. Block Join queries work on these blocks.
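
For example, with the fields used in this thread, a sketch of such a query
(it assumes only parent docs carry a "name" field, so that field can serve
as the parent filter):

q={!parent which="name:*"}dept:IT
fl=*,[child parentFilter="name:*"]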



-- 
Sincerely yours
Mikhail Khludnev


Re: Child=true does not work for data import handler

Thanks a lot Mikhail. But as per the documentation below, nested document
ingestion is possible. Is this a limitation of DIH?

https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments


Also, can a block join query be used to get the expected relationship for the
data I have ingested using DIH?





Re: Child=true does not work for data import handler

It never works the way you expect. You need to search for parents and then
hook up [child]. I see some improvements are coming, but for now that is how it is.



-- 
Sincerely yours
Mikhail Khludnev


Re: Child=true does not work for data import handler

Thanks Mikhail, verbose did help. The _root_ field was missing in the schema,
and I also made some changes in the child entity: I created id as an alias for
emp_id (in the child query), which is the id column of the parent table.

<entity name="parent" query="SELECT id,name FROM emp">
  <!-- parent field mappings stripped by the mail archive -->
  <entity child="true" name="child"
          query="SELECT dept,emp_id as id FROM emp_details where emp_id='${parent.id}'">
    <field column="dept" name="dept" />
  </entity>
</entity>

The data seems to be returned correctly, as below, but it shows child documents
and parent documents as individual documents. I was expecting 2 documents, each
with 2 child documents.
Any inputs will be helpful.


 "response":{"numFound":6,"start":0,"docs":[
  {
"dept":"IT",
"id":"1",
"_version_":1608073809653399552},
  {
"dept":"Data",
"id":"1",
"_version_":1608073809653399552},
  {
"name":"omkar",
"id":"1",
"_version_":1608073809653399552},
  {
"dept":"ITI",
"id":"2",
"_version_":1608073809667031040},
  {
"dept":"Entry",
"id":"2",
"_version_":1608073809667031040},
  {
"name":"ashwin",
"id":"2",
"_version_":1608073809667031040}]
  }}





Re: Child=true does not work for data import handler

DIH has debug and verbose modes. Have you tried using them?



-- 
Sincerely yours
Mikhail Khludnev


Re: Child=true does not work for data import handler

Thanks Mikhail, I tried changing the config but that did not help:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test"
              user="root"
              password=""
              session.group_concat_max_len='7' />
  <document>
    <entity name="parent" transformer="RegexTransformer"
            query="SELECT id,name FROM emp">
      <!-- parent field mappings stripped by the mail archive -->
      <entity> <!-- child entity attributes (incl. child="true" per this thread) stripped by the mail archive -->
        <field column="dept" name="dept" />
        <field column="childpk" name="childpk" />
      </entity>
    </entity>
  </document>
</dataConfig>




Re: Child=true does not work for data import handler

Hi, Omkar.

Could it happen that child docs as well as parents are implicitly assigned the
same "id" field values and removed due to a uniqueKey collision?



-- 
Sincerely yours
Mikhail Khludnev


Child=true does not work for data import handler

I am using a db-data-config similar to the one below for indexing this
parent-child data. Solr version 6.6.2.

SELECT   id as emp_id,   name FROM emp;
+++
| emp_id | name   |
+++
|  1 | omkar  |
|  2 | ashwin |
+++
2 rows in set (0.00 sec)

select  * from emp_details ;
+--++---+
| id   | emp_id | dept  |
+--++---+
|1 |  1 | IT|
|2 |  1 | Data  |
|3 |  2 | ITI   |
|4 |  2 | Entry |
+--++---+
4 rows in set (0.00 sec)

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test"
              user="root"
              password=""
              session.group_concat_max_len='7' />
  <document>
    <entity name="parent" transformer="RegexTransformer"
            query="SELECT id, name FROM emp">
      <!-- parent field mappings stripped by the mail archive -->
      <entity> <!-- child entity attributes and query stripped by the mail archive -->
        <field column="dept" name="dept" />
      </entity>
    </entity>
  </document>
</dataConfig>



{
  "responseHeader":{
"status":0,
"QTime":0,
"params":{
  "q":"*:*",
  "indent":"on",
  "wt":"json",
  "_":"1533325469162"}},
  "response":{"numFound":2,"start":0,"docs":[
  {
"name":"omkar",
"id":"1",
"dept":"IT",
"_version_":1607809693975052288},
  {
"name":"ashwin",
"id":"2",
"dept":"ITI",
"_version_":1607809693978198016}]
  }}


I am expecting multiple child documents, so I added child=true to the child
entity.

But the output of indexing is as below, and it does not process any documents:

Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Requests: 3 , Fetched: 6 , Skipped: 0 , Processed: 0
Started: less than a minute ago

Can you help me figure out whether there is an issue with the db or the solr config?






How to use tika-OCR in data import handler?

Hi,

I am trying to use Tika OCR (Tesseract) in the data import handler
and found that processing English documents works quite well.

But I am struggling to process other languages such as
Japanese, Chinese, etc.

So, I want to know how to switch Tesseract OCR's processing
language via the data import handler config or the tikaConfig param.

Any points would be appreciated.

Thanks,
Yasufumi
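
(One possible direction, sketched with hedges: recent Tika releases allow
per-parser parameters in a tika-config.xml, which TikaEntityProcessor can be
pointed at via its tikaConfig attribute. The parameter name below is an
assumption to verify against the Tika version bundled with your Solr:

<properties>
  <parsers>
    <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
      <params>
        <param name="language" type="string">jpn</param>
      </params>
    </parser>
  </parsers>
</properties>
)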


Re: How to know the name(url) of documents that data import handler skipped

Hi, Rahul.

Thank you for your reply.
I already tried that, and I could see which files were read (via
FileDataSource) and which files were added (via UpdateLog).
So, by checking both, I could determine the bad files.
But I want to know the bad files directly.

Thanks,
Yasufumi



Re: How to know the name(url) of documents that data import handler skipped

Have you tried changing the log level
https://lucene.apache.org/solr/guide/7_2/configuring-logging.html


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation


How to know the name(url) of documents that data import handler skipped

Hi,

I am trying to index files into Solr 7.2 using the data import handler with
the onError=skip option.
But I am struggling to determine the skipped documents, as the logs do not
tell which file was bad.
So, how can I know those files?

Thanks,
Yasufumi


RE: SolrCloud DIH (Data Import Handler) MySQL 404

I have added debug and I get this error:

null:java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:429)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
        at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:183)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:517)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.server.Server.handle(Server.java:530)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
        at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
        at java.lang.Thread.run(Thread.java:748)

What MySQL JDBC connector version do I need?






RE: SolrCloud DIH (Data Import Handler) MySQL 404

Hello,

Where do I add that? In the Solr start command?

I have added -verbose:class in the /etc/default/solr.in.sh file, but the logs
are the same.

Thanks,




Re: SolrCloud DIH (Data Import Handler) MySQL 404

Can you share more log lines around this odd NPE?
It might be necessary to restart the JVM with -verbose:class and look through
its output to find why it can't load this class.
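
A sketch of where the flag can go, assuming the standard solr.in.sh mechanism
(append it to SOLR_OPTS so it reaches the JVM):

SOLR_OPTS="$SOLR_OPTS -verbose:class"

Note that -verbose:class writes to the JVM's stdout (the console log), not to
solr.log.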



-- 
Sincerely yours
Mikhail Khludnev


RE: SolrCloud DIH (Data Import Handler) MySQL 404

Hello Shawn,

I have installed SolrCloud 7.3 on another server and the problem does not appear.
Should I create a Jira ticket?

But I have another problem:

Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to PropertyWriter implementation:ZKPropertiesWriter
        at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:330)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
        at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
        at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:326)
        ... 4 more

I am looking into how to solve the problem.

Regards,









Re: SolrCloud DIH (Data Import Handler) MySQL 404


On 4/24/2018 2:03 AM, msaunier wrote:

If I access to the interface, I have a null pointer exception:

null:java.lang.NullPointerException
at 
org.apache.solr.handler.RequestHandlerBase.getVersion(RequestHandlerBase.java:233)


The line of code where this exception occurred uses fundamental Java 
methods. Based on the error, either the getClass method common to all 
java objects, or the getPackage method on the class, is returning null.  
That shouldn't be possible.  This has me wondering whether there is 
something broken in your particular Solr installation -- corrupt jars, 
or something like that.  Or maybe something broken in your Java.


Thanks,
Shawn



RE: SolrCloud DIH (Data Import Handler) MySQL 404

I have modified the DIH definition to simplify it, but I get the same errors:

## indexation_events.xml

(the simplified XML was stripped by the mail archive)

##

Maxence,






RE: SolrCloud DIH (Data Import Handler) MySQL 404

If I access the interface, I get a null pointer exception:

null:java.lang.NullPointerException
at 
org.apache.solr.handler.RequestHandlerBase.getVersion(RequestHandlerBase.java:233)
at 
org.apache.solr.handler.admin.SolrInfoMBeanHandler.addMBean(SolrInfoMBeanHandler.java:187)
at 
org.apache.solr.handler.admin.SolrInfoMBeanHandler.getMBeanInfo(SolrInfoMBeanHandler.java:163)
at 
org.apache.solr.handler.admin.SolrInfoMBeanHandler.handleRequestBody(SolrInfoMBeanHandler.java:80)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)






RE: SolrCloud DIH (Data Import Handler) MySQL 404

Hello Shawn,
Thanks for your answers. 

#
So, the indexation_events.xml file is:

(the XML was stripped by the mail archive)

#
And the config file is the configoverlay.json; it's in the cloud:

{
  "updateProcessor":{},

  "runtimeLib":{
"mysql-connector-java":{
  "name":"mysql-connector-java",
  "version":1},

"data-import-handler":{
  "name":"data-import-handler",
  "version":1}},

  "requestHandler":{"/test_dih":{
  "name":"/test_dih",
  "class":"org.apache.solr.handler.dataimport.DataImportHandler",
  "runtimeLib":true,
  "version":1,
  "defaults":{"config":"DIH/indexation_events.xml"}}}
}
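
(For reference, a sketch of the Config API call that produces such a
runtimeLib entry in the overlay, using the add-runtimelib command of the
Solr 6.6 Config API:

curl -X POST -H 'Content-type:application/json' \
  "http://srv-formation-solr:8983/solr/arguments_test/config" \
  -d '{"add-runtimelib": {"name": "data-import-handler", "version": 1}}'
)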

I will go and look at the solr.log.

Thanks,
Maxence







Re: SolrCloud DIH (Data Import Handler) MySQL 404


On 4/23/2018 8:30 AM, msaunier wrote:

I have added debug:

curl "http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-import&commit=true&debug=true"

<lst name="responseHeader"><int name="status">500</int><int name="QTime">588</int></lst>
... <bool name="runtimeLib">true</bool><int name="version">1</int><lst name="defaults"><str name="config">DIH/indexation_events.xml</str></lst>

This is looking like a really nasty error that I cannot understand, 
possibly caused by an error in configuration.


Can you share your dataimport handler config (will likely be in 
solrconfig.xml) and the contents of DIH/indexation_events.xml?  There is 
probably a database password in that file, you'll want to redact that.


You should look at solr.log and see if there are other errors happening 
that didn't make it into the response.


Thanks,
Shawn



Re: SolrCloud DIH (Data Import Handler) MySQL 404


On 4/23/2018 6:12 AM, msaunier wrote:

I have a problem with DIH in SolrCloud. I don't understand why, so I need
your help.

Solr 6.6 in Cloud.

##
COMMAND:

curl http://srv-formation-solr:8983/solr/test_dih?command=full-import

RESULT:

Error 404 Not Found
HTTP ERROR 404
Problem accessing /solr/test_dih. Reason:
    Not Found

This looks like an incomplete URL.

What exactly is test_dih?  If it is the name of your collection, then
you are missing the handler, which is usually "/dataimport". If
"/test_dih" is the name of your handler, then you are missing the name
of the core or the collection.


With SolrCloud, it's actually better to direct your request to a 
specific core for DIH, something like collection_shard1_replica1.  If 
you direct it to the collection you never know which core will actually 
end up with the request, and will have a hard time getting the status of 
the import if the status request ends up on a different core than the 
full-import command.


A correct full URL should look something like this:

http://host:port/solr/test_shard1_replica2/dataimport?command=full-import
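
The import status can then be polled against the same core (same pattern,
a sketch):

http://host:port/solr/test_shard1_replica2/dataimport?command=status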

Looking at later messages, you may have figured this out at least 
partially.  The exception in your second message looks really odd.  (and 
I really have no idea what you are talking about with an overlay)


Thanks,
Shawn



RE: SolrCloud DIH (Data Import Handler) MySQL 404

curl "http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-import&commit=true&debug=true&command=reload-config"

<response>
  <lst name="responseHeader"><int name="status">500</int><int name="QTime">647</int></lst>
  <lst name="error">
    <str name="msg">java.util.Arrays$ArrayList cannot be cast to java.lang.String</str>
    <str name="trace">java.lang.ClassCastException: java.util.Arrays$ArrayList cannot be cast to java.lang.String
        at org.apache.solr.handler.dataimport.RequestInfo.&lt;init&gt;(RequestInfo.java:52)
        at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:128)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:534)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:748)</str>
    <int name="code">500</int>
  </lst>
</response>



-----Original Message-----
From: msaunier [mailto:msaun...@citya.com]
Sent: Monday, 23 April 2018 14:47
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud DIH (Data Import Handler) MySQL 404

I have corrected the URL to: curl
http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-import

And changed the overlay config from
"/configs/arguments_test/DIH/indexation_events.xml" to
"DIH/indexation_events.xml".

But I have a new error:

Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to PropertyWriter implementation:ZKPropertiesWriter
at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:330)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:326)
... 4 more

Regards,





-----Original Message-----
From: msaunier [mailto:msaun...@citya.com]
Sent: Monday, 23 April 2018 14:12
To: solr-user@lucene.apache.org
Subject: SolrCloud DIH (Data Import Handler) MySQL 404

Hello,

 

I have a problem with DIH in SolrCloud. I don't understand why, so I need
your help.

 

Solr 6.6 in Cloud.

 

##

COMMAND:

curl http://srv-formation-solr:8983/solr/test_dih?command=full-import

 

RESULT:

Error 404 Not Found
HTTP ERROR 404
Problem accessing /solr/test_dih. Reason:
    Not Found

 

 

##

CONFIG:

1.  I have created the .system collection with the command

2.  I have posted the DataImportHandler jar file and the MySQL connector jar
to the blob store

3.  I have added the data-import-handler and mysql-connector-java runtimeLibs
to the configoverlay.json file with the API

4.  I have created the DIH folder in ZooKeeper with the zkcli.sh script

5.  I have pushed the DIH .xml configuration file with zkcli

 

CONFIGOVERLAY CONTENT :

{
  "runtimeLib":{
    "mysql-connector-java":{
      "name":"mysql-connector-java",
      "version":1},
    "data-import-handler":{
      "name":"data-import-handler",
      "version":1}},
  "requestHandler":{"/test_dih":{
    "name":"/test_dih",
    "class":"org.apache.solr.handler.dataimport.DataImportHandler",
    "runtimeLib":true,
    "version":1,
    "defaults":{"config":"/configs/arguments_test/DIH/indexation_events.xml"}}}
}

 

 

Thanks for your help






SolrCloud DIH (Data Import Handler) MySQL 404

Hello,

 

I have a problem with DIH in SolrCloud. I don't understand why, so I need
your help.

 

Solr 6.6 in Cloud.

 

##

COMMAND:

curl http://srv-formation-solr:8983/solr/test_dih?command=full-import

 

RESULT:

Error 404 Not Found
HTTP ERROR 404
Problem accessing /solr/test_dih. Reason:
    Not Found

 

 

##

CONFIG:

1.  I have created the .system collection with the command

2.  I have posted the DataImportHandler jar file and the MySQL connector jar
to the blob store

3.  I have added the data-import-handler and mysql-connector-java runtimeLibs
to the configoverlay.json file with the API

4.  I have created the DIH folder in ZooKeeper with the zkcli.sh script

5.  I have pushed the DIH .xml configuration file with zkcli

 

CONFIGOVERLAY CONTENT :

{
  "runtimeLib":{
    "mysql-connector-java":{
      "name":"mysql-connector-java",
      "version":1},
    "data-import-handler":{
      "name":"data-import-handler",
      "version":1}},
  "requestHandler":{"/test_dih":{
    "name":"/test_dih",
    "class":"org.apache.solr.handler.dataimport.DataImportHandler",
    "runtimeLib":true,
    "version":1,
    "defaults":{"config":"/configs/arguments_test/DIH/indexation_events.xml"}}}
}

 

 

Thanks for your help



Re: Data import batch mode for delta


On 4/16/2018 7:32 PM, gadelkareem wrote:

I cannot complain cuz it actually worked well for me so far but..

I still do not understand if Solr already paginates the results from the
full import, why not do the same for the delta. It is almost the same query:
`select id from t where t.lastmod > ${solrTime}`
`select * from t where id IN ${dataimporter.ids} limit 1000 offset 0`
and so on..


Solr does not paginate SQL queries made by the dataimport handler 
(DIH).  It sends the query exactly as it is configured in the DIH config.


Thanks,
Shawn



Re: Data import batch mode for delta

Thanks Shawn.

I cannot complain, because it has actually worked well for me so far, but...

I still do not understand: if Solr already paginates the results of the
full import, why not do the same for the delta? It is almost the same query:
`select id from t where t.lastmod > ${solrTime}`
`select * from t where id IN ${dataimporter.ids} limit 1000 offset 0`
and so on...





Re: Data import batch mode for delta


On 4/5/2018 7:31 PM, gadelkareem wrote:

Why the deltaImportQuery uses "where id='${dataimporter.id}'" instead of
something like where id IN ('${dataimporter.id})'


Because there's only one value for that property.

If the deltaQuery returns a million rows, then deltaImportQuery is going 
to be executed a million times.  Once for each row returned by the 
deltaQuery.
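
To make the mechanics concrete, a minimal DIH entity using the delta pair
might look like this (a sketch with illustrative table and column names;
${dih.delta.id} is the variable DIH substitutes for each id returned by
deltaQuery):

<entity name="t" pk="id"
        query="select * from t"
        deltaQuery="select id from t where lastmod > '${dih.last_index_time}'"
        deltaImportQuery="select * from t where id='${dih.delta.id}'"/>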


That IS as inefficient as it sounds.  Think of the dataimport handler as 
a stop-gap solution -- to help you get started with loading data from a 
database, until you can write a proper application to do your indexing.


Thanks,
Shawn



Data import batch mode for delta

Why does the deltaImportQuery use "where id='${dataimporter.id}'" instead of
something like "where id IN ('${dataimporter.id}')"?





RE: data import class not found

I just tried putting the solr-dataimporthandler-6.6.0.jar in server/solr/lib 
and I got past the problem.  I still don't understand why it was not found in /dist.

-Original Message-
From: Steve Pruitt [mailto:bpru...@opentext.com] 
Sent: Thursday, August 31, 2017 3:05 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] - data import class not found

I still can't understand how Solr establishes the classpath.

I have a custom entity processor that subclasses EntityProcessorBase.  When I 
execute the /dataimport call I get

java.lang.NoClassDefFoundError: 
org/apache/solr/handler/dataimport/EntityProcessorBase

no matter what I put in solrconfig.xml to locate the solr-dataimporthandler 
jar.

I have tried:

from the existing libs in solrconfig.xml 

from the Ref Guide


try anything


But, I always get the class-not-found error.  The DataImportHandler class is 
found when Solr starts; since EntityProcessorBase is in the same jar, why is it 
not found?

I have not tried putting it in the core's lib directory, thinking the above should 
work.  Of course, the 3rd choice is only an experiment.


Thanks.

-S


data import class not found

I still can't understand how Solr establishes the classpath.

I have a custom entity processor that subclasses EntityProcessorBase.  When I 
execute the /dataimport call I get

java.lang.NoClassDefFoundError: 
org/apache/solr/handler/dataimport/EntityProcessorBase

no matter what I put in solrconfig.xml to locate the solr-dataimporthandler 
jar.

I have tried:

from the existing libs in solrconfig.xml


from the Ref Guide


try anything
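
For reference, the stock dist-directory directive from the example
solrconfig.xml looks like this (a general illustration, not necessarily the
exact lines tried above):

<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />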


But, I always get the class-not-found error.  The DataImportHandler class is 
found when Solr starts; since EntityProcessorBase is in the same jar, why is it 
not found?

I have not tried putting it in the core's lib directory, thinking the above should 
work.  Of course, the 3rd choice is only an experiment.


Thanks.

-S


SOLR 4.10 Data import error

Hi,


I am getting the following error when I index data using the DataImporter.


I am using a FileDataSource in the data-config file;

here is the config file:

[data-config.xml contents stripped by the list software; not recoverable]

The error is:


ERROR org.apache.solr.handler.dataimport.DocBuilder: Exception while 
processing: f document : 
null:org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
 (resolved to: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:286)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:502)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Could not 
find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
 (resolved to: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
at 
org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:127)
at 
org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:86)
at 
org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:48)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
... 11 more
Caused by: java.io.FileNotFoundException: Could not find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
 (resolved to: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
at 
org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:123)
... 14 more
ERROR org.apache.solr.handler.dataimport.DataImporter: Full Import 
failed:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
 (resolved to: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
 (resolved to: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_

Re: Solr Search Problem with Multiple Data-Import Handler

I suspect Erick's right that clean=true is the problem. That's the default
in the DIH interface.


I find that when using DIH, it's best to set preImportDeleteQuery for every
entity. This safely scopes the clean variable to just that entity.
It doesn't look like the docs have examples of using preImportDeleteQuery,
so I put one here:
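
A minimal sketch of the idea (the entity, field, and query names are
illustrative; it assumes each entity tags its documents with a source_s
field):

<entity name="products" pk="id"
        preImportDeleteQuery="source_s:products"
        query="select id, name, 'products' as source_s from products"/>

With that in place, a clean full-import of this entity only deletes documents
matching source_s:products instead of *:*.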




On Wed, Jun 21, 2017 at 7:48 PM Erick Erickson 
wrote:

> First place I'd look is whether the jobs have clean=true set. If so the
> first thing DIH does is delete all documents.
>
> Best,
> Erick
>
> On Wed, Jun 21, 2017 at 3:52 PM, Pandey Brahmdev 
> wrote:
>
> > Hi,
> > I have setup Apache Solr 6.6.0 on Windows 10, 64-bit.
> >
> > I have created a simple core & configured DataImport Handlers.
> > I have configured 2 dataImport handlers in the Solr-config.xml file.
> >
> > First for to connect to DB & have data from DB Tables.
> > And Second for to have data from all pdf files using TikaEntityProcessor.
> >
> > Now the problem is there is no error in the console or anywhere but
> > whenever I want to search using "Query" tab it gives me the result of
> Data
> > Import.
> >
> > So let's say if I last Imported data for Tables then it gives me to
> result
> > from the table and if I imported PDF Files then it searches inside PDF
> > Files.
> >
> > But now when I again want to search for DB Tables values then It doesn't
> > give me the result instead I again need to Import Data for
> > DataImportHandler for File & vice-versa.
> >
> > Can you please help me out here?
> > Very sorry if I am doing anything wrong as I have started using Apache
> Solr
> > only 2 days back.
> >
> > Thanks & Regards,
> > Brahmdev Pandey
> > +46 767086309 <+46%2076%20708%2063%2009>
> >
>


Re: Solr Search Problem with Multiple Data-Import Handler

First place I'd look is whether the jobs have clean=true set. If so the
first thing DIH does is delete all documents.
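
A quick way to test that theory is to pass clean=false explicitly on the
import (an illustration; substitute your own core and handler names):

curl "http://localhost:8983/solr/yourcore/dataimport?command=full-import&clean=false"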

Best,
Erick

On Wed, Jun 21, 2017 at 3:52 PM, Pandey Brahmdev 
wrote:

> Hi,
> I have setup Apache Solr 6.6.0 on Windows 10, 64-bit.
>
> I have created a simple core & configured DataImport Handlers.
> I have configured 2 dataImport handlers in the Solr-config.xml file.
>
> First for to connect to DB & have data from DB Tables.
> And Second for to have data from all pdf files using TikaEntityProcessor.
>
> Now the problem is there is no error in the console or anywhere but
> whenever I want to search using "Query" tab it gives me the result of Data
> Import.
>
> So let's say if I last Imported data for Tables then it gives me to result
> from the table and if I imported PDF Files then it searches inside PDF
> Files.
>
> But now when I again want to search for DB Tables values then It doesn't
> give me the result instead I again need to Import Data for
> DataImportHandler for File & vice-versa.
>
> Can you please help me out here?
> Very sorry if I am doing anything wrong as I have started using Apache Solr
> only 2 days back.
>
> Thanks & Regards,
> Brahmdev Pandey
> +46 767086309
>


Solr Search Problem with Multiple Data-Import Handler

Hi,
I have set up Apache Solr 6.6.0 on Windows 10, 64-bit.

I have created a simple core & configured DataImport handlers.
I have configured 2 DataImport handlers in the solrconfig.xml file.

The first connects to the DB & indexes data from DB tables.
The second indexes data from all PDF files using TikaEntityProcessor.

Now the problem is there is no error in the console or anywhere, but
whenever I search using the "Query" tab it only gives me results from the
last data import.

So let's say if I last imported data for tables, then it gives me results
from the tables, and if I imported PDF files, then it searches inside PDF
files.

But when I then want to search for DB table values again, it doesn't
give me the results; instead I have to run the data import for that
handler again, & vice versa.

Can you please help me out here?
Very sorry if I am doing anything wrong as I have started using Apache Solr
only 2 days back.

Thanks & Regards,
Brahmdev Pandey
+46 767086309


Data import handler and no status in web-ui

Hi,

I use DIH in SolrCloud mode (implicit routing) in Solr 6.5.1.
When I start the import it works fine and I see the progress in the logfile.
However, when I click the "Refresh Status" button in the web UI while the
import is running, I only see "No information available (idle)".
So I have to look in the logfile to see when the import has finished.

In the old Solr, non-cloud and non-partitioned, there was an hourglass while the 
import was running.

Any idea?

Best regards
Thomas


RE: Using the Data Import Handler with SQLite

Hi Zac,
  I think you have added the entity closing tag twice; that might be
causing the issue. It has been a long time, so I am not sure whether you are
still working on it.





Re: Version conflict during data import from another Solr instance into clean Solr

Hi, I ran into the same problem. Chris' first solution worked for us; the
second solution on its own doesn't work, however, as the conflict error arises
before the update processors' code is even reached. Creating an
alias for the _version_ field in the dataconfig file, together with an
update processor that removes the temporary field (and possibly other
unwanted fields), seemed to work great for us.
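
A sketch of that combination (the field and chain names here are illustrative,
not from the original post):

<!-- data-config.xml: alias the source's _version_ into a temporary field -->
<field column="_version_" name="old_version_tmp" />

<!-- solrconfig.xml: drop the temporary field before the document is indexed -->
<updateRequestProcessorChain name="drop-old-version">
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldName">old_version_tmp</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>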





Re: Data Import

If Solr is down, then adding through SolrJ would fail as well. Kafka's new
API has some great features for this sort of thing. The new client API is
designed to be run in a long-running loop where you poll for new messages
with a certain amount of defined timeout (ex: consumer.poll(1000) for 1s)
So if Solr becomes unstable or goes down, it's easy to have the consumer
just stop and either wait until Solr comes back up or save the data to
disk/commit the Kafka offsets to ZK and stop running.
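
In code, the loop described in that paragraph looks roughly like this (a
sketch under stated assumptions, not anyone's actual consumer; the topic,
group, core, and field names are made up):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class KafkaToSolr {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "solr-indexer");
    props.put("enable.auto.commit", "false"); // we commit offsets ourselves
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
         SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      consumer.subscribe(Collections.singletonList("updates"));
      while (true) {
        // long-running poll loop with a 1s timeout, as described above
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> r : records) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", r.key());
          doc.addField("body_txt", r.value());
          solr.add(doc);
        }
        if (!records.isEmpty()) {
          solr.commit();
          consumer.commitSync(); // advance offsets only after Solr accepted the batch
        }
      }
    }
  }
}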

On Fri, Mar 17, 2017 at 1:24 PM, OTH  wrote:

> Are Kafka and SQS interchangeable?  (The latter does not seem to be free.)
>
> @Wunder:
> I'm assuming, that updating to Solr would fail if Solr is unavailable not
> just if posting via say a DB trigger, but probably also if trying to post
> through SolrJ?  (Which is what I'm using for now.)  So, even if using
> SolrJ, it would be a good idea to use a queuing software?
>
> Thanks
>
> On Fri, Mar 17, 2017 at 10:12 PM, vishal jain  wrote:
>
> > Streaming the data through kafka would be a good option if near real time
> > data indexing is the key requirement.
> > In our application the RDBMS data is populated by an ETL job periodically
> > so we don't need real time data indexing for now.
> >
> > Cheers,
> > Vishal
> >
> > On Fri, Mar 17, 2017 at 10:30 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> > > Or set a trigger on your RDBMS's main table to put the relevant
> > > information in a different table (call it EVENTS) and have your SolrJ
> > > consult the EVENTS table periodically. Essentially you're using the
> > > EVENTS table as a queue where the trigger is the producer and the
> > > SolrJ program is the consumer.
> > >
> > > It's a polling solution though, so not event-driven. There's no
> > > mechanism that I know of have, say, your RDBMS push an event to DIH
> > > for instance.
> > >
> > > Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> > > for this kind of problem..
> > >
> > > Best,
> > > Erick
> > >
> > > On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
> > >  wrote:
> > > > One assumes by hooking into the same code that updates RDBMS, as
> > > > opposed to be reverse engineering the changes from looking at the DB
> > > > content. This would be especially the case for Delete changes.
> > > >
> > > > Regards,
> > > >Alex.
> > > > 
> > > > http://www.solr-start.com/ - Resources for Solr users, new and
> > > experienced
> > > >
> > > >
> > > > On 17 March 2017 at 11:37, OTH  wrote:
> > > >>>
> > > >>> Also, solrj is good when you want your RDBMS updates make
> immediately
> > > >>> available in solr.
> > > >>
> > > >> How can SolrJ be used to make RDBMS updates immediately available?
> > > >> Thanks
> > > >>
> > > >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> > > sujaybawas...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Vishal,
> > > >>>
> > > >>> As per my experience DIH is the best for RDBMS to solr index. DIH
> > with
> > > >>> caching has best performance. DIH nested entities allow you to
> define
> > > >>> simple queries.
> > > >>> Also, solrj is good when you want your RDBMS updates make
> immediately
> > > >>> available in solr. DIH full import can be used for index all data
> > first
> > > >>> time or restore index in case index is corrupted.
> > > >>>
> > > >>> Thanks,
> > > >>> Sujay
> > > >>>
> > > >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> > > wrote:
> > > >>>
> > > >>> > Hi,
> > > >>> >
> > > >>> >
> > > >>> > I am new to Solr and am trying to move data from my RDBMS to
> Solr.
> > I
> > > know
> > > >>> > the available options are:
> > > >>> > 1) Post Tool
> > > >>> > 2) DIH
> > > >>> > 3) SolrJ (as ours is a J2EE application).
> > > >>> >
> > > >>> > I want to know what is the recommended way for Data import in
> > > production
> > > >>> > environment.
> > > >>> > Will sending data via SolrJ in batches be faster than posting a
> csv
> > > using
> > > >>> > POST tool?
> > > >>> >
> > > >>> >
> > > >>> > Thanks,
> > > >>> > Vishal
> > > >>> >
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Thanks,
> > > >>> Sujay P Bawaskar
> > > >>> M:+91-77091 53669
> > > >>>
> > >
> >
>


Re: Data Import

Are Kafka and SQS interchangeable?  (The latter does not seem to be free.)

@Wunder:
I'm assuming that updates to Solr would fail if Solr is unavailable, not
just when posting via, say, a DB trigger, but probably also when trying to post
through SolrJ?  (Which is what I'm using for now.)  So, even if using
SolrJ, it would be a good idea to use queuing software?

Thanks

On Fri, Mar 17, 2017 at 10:12 PM, vishal jain  wrote:

> Streaming the data through kafka would be a good option if near real time
> data indexing is the key requirement.
> In our application the RDBMS data is populated by an ETL job periodically
> so we don't need real time data indexing for now.
>
> Cheers,
> Vishal
>
> On Fri, Mar 17, 2017 at 10:30 PM, Erick Erickson 
> wrote:
>
> > Or set a trigger on your RDBMS's main table to put the relevant
> > information in a different table (call it EVENTS) and have your SolrJ
> > consult the EVENTS table periodically. Essentially you're using the
> > EVENTS table as a queue where the trigger is the producer and the
> > SolrJ program is the consumer.
> >
> > It's a polling solution though, so not event-driven. There's no
> > mechanism that I know of have, say, your RDBMS push an event to DIH
> > for instance.
> >
> > Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> > for this kind of problem..
> >
> > Best,
> > Erick
> >
> > On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
> >  wrote:
> > > One assumes by hooking into the same code that updates RDBMS, as
> > > opposed to be reverse engineering the changes from looking at the DB
> > > content. This would be especially the case for Delete changes.
> > >
> > > Regards,
> > >Alex.
> > > 
> > > http://www.solr-start.com/ - Resources for Solr users, new and
> > experienced
> > >
> > >
> > > On 17 March 2017 at 11:37, OTH  wrote:
> > >>>
> > >>> Also, solrj is good when you want your RDBMS updates make immediately
> > >>> available in solr.
> > >>
> > >> How can SolrJ be used to make RDBMS updates immediately available?
> > >> Thanks
> > >>
> > >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> > sujaybawas...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi Vishal,
> > >>>
> > >>> As per my experience DIH is the best for RDBMS to solr index. DIH
> with
> > >>> caching has best performance. DIH nested entities allow you to define
> > >>> simple queries.
> > >>> Also, solrj is good when you want your RDBMS updates make immediately
> > >>> available in solr. DIH full import can be used for index all data
> first
> > >>> time or restore index in case index is corrupted.
> > >>>
> > >>> Thanks,
> > >>> Sujay
> > >>>
> > >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> > wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> >
> > >>> > I am new to Solr and am trying to move data from my RDBMS to Solr.
> I
> > know
> > >>> > the available options are:
> > >>> > 1) Post Tool
> > >>> > 2) DIH
> > >>> > 3) SolrJ (as ours is a J2EE application).
> > >>> >
> > >>> > I want to know what is the recommended way for Data import in
> > production
> > >>> > environment.
> > >>> > Will sending data via SolrJ in batches be faster than posting a csv
> > using
> > >>> > POST tool?
> > >>> >
> > >>> >
> > >>> > Thanks,
> > >>> > Vishal
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Thanks,
> > >>> Sujay P Bawaskar
> > >>> M:+91-77091 53669
> > >>>
> >
>


RE: Data Import

No, I use the free version. I have the driver from someone else; I can share it
if you want to use Cassandra.
They modified it for me, since the free JDBC driver I found times out
when a document is greater than 16 MB.

Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
daphne@cevalogistics.com



-Original Message-
From: vishal jain [mailto:jain02...@gmail.com]
Sent: Friday, March 17, 2017 12:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Data Import

Hi Daphne,

Are you using DSE?


Thanks & Regards,
Vishal

On Fri, Mar 17, 2017 at 7:40 PM, Liu, Daphne 
wrote:

> I just want to share my recent project. I have successfully sent all
> our EDI documents to Cassandra 3.7 clusters using Solr 6.3 Data Import
> JDBC Cassandra connector indexing our documents.
> Since Cassandra is so fast for writing, compression rate is around 13%
> and all my documents can be keep in my Cassandra clusters' memory, we
> are very happy with the result.
>
>
> Kind regards,
>
> Daphne Liu
> BI Architect - Matrix SCM
>
> CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL
> 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 /
> daphne@cevalogistics.com
>
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, March 17, 2017 9:54 AM
> To: solr-user 
> Subject: Re: Data Import
>
> I feel DIH is much better for prototyping, even though people do use
> it in production. If you do want to use DIH, you may benefit from
> reviewing the DIH-DB example I am currently rewriting in
> https://issues.apache.org/jira/browse/SOLR-10312 (may need to change
> luceneMatchVersion in solrconfig.xml first).
>
> CSV, etc, could be useful if you want to keep history of past imports,
> again useful during development, as you evolve schema.
>
> SolrJ may actually be easiest/best for production since you already
> have Java stack.
>
> The choice is yours in the end.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
>
>
> On 17 March 2017 at 08:56, Shawn Heisey  wrote:
> > On 3/17/2017 3:04 AM, vishal jain wrote:
> >> I am new to Solr and am trying to move data from my RDBMS to Solr.
> >> I
> know the available options are:
> >> 1) Post Tool
> >> 2) DIH
> >> 3) SolrJ (as ours is a J2EE application).
> >>
> >> I want to know what is the recommended way for Data import in
> >> production environment. Will sending data via SolrJ in batches be
> faster than posting a csv using POST tool?
> >
> > I've heard that CSV import runs EXTREMELY fast, but I have never
> > tested it.  The same threading problem that I discuss below would
> > apply to indexing this way.
> >
> > DIH is extremely powerful, but it has one glaring problem:  It's
> > single-threaded, which means that only one stream of data is going
> > into Solr, and each batch of documents to be inserted must wait for
> > the previous one to finish inserting before it can start.  I do not
> > know if DIH batches documents or sends them in one at a time.  If
> > you have a manually sharded index, you can run DIH on each shard in
> > parallel, but each one will be single-threaded.  That single thread
> > is pretty efficient, but it's still only one thread.
> >
> > Sending multiple index updates to Solr in parallel (multi-threading)
> > is how you radically speed up the Solr part of indexing.  This is
> > usually done with a custom indexing program, which might be written
> > with SolrJ or even in a completely different language.
> >
> > One thing to keep in mind with ANY indexing method:  Once the
> > situation is examined closely, most people find that it's not Solr
> > that makes their indexing slow.  The bottleneck is usually the
> > source system -- how quickly the data can be retrieved.  It usually
> > takes a lot longer to obtain the data than it does for Solr to index it.
> >
> > Thanks,
> > Shawn
> >

Re: Data Import

Streaming the data through Kafka would be a good option if near-real-time
data indexing is the key requirement.
In our application the RDBMS data is populated by an ETL job periodically,
so we don't need real-time data indexing for now.

Cheers,
Vishal

On Fri, Mar 17, 2017 at 10:30 PM, Erick Erickson 
wrote:

> Or set a trigger on your RDBMS's main table to put the relevant
> information in a different table (call it EVENTS) and have your SolrJ
> consult the EVENTS table periodically. Essentially you're using the
> EVENTS table as a queue where the trigger is the producer and the
> SolrJ program is the consumer.
>
> It's a polling solution though, so not event-driven. There's no
> mechanism that I know of have, say, your RDBMS push an event to DIH
> for instance.
>
> Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> for this kind of problem..
>
> Best,
> Erick
>
> On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
>  wrote:
> > One assumes by hooking into the same code that updates RDBMS, as
> > opposed to be reverse engineering the changes from looking at the DB
> > content. This would be especially the case for Delete changes.
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> > On 17 March 2017 at 11:37, OTH  wrote:
> >>>
> >>> Also, solrj is good when you want your RDBMS updates make immediately
> >>> available in solr.
> >>
> >> How can SolrJ be used to make RDBMS updates immediately available?
> >> Thanks
> >>
> >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> sujaybawas...@gmail.com>
> >> wrote:
> >>
> >>> Hi Vishal,
> >>>
> >>> As per my experience DIH is the best for RDBMS to solr index. DIH with
> >>> caching has best performance. DIH nested entities allow you to define
> >>> simple queries.
> >>> Also, solrj is good when you want your RDBMS updates make immediately
> >>> available in solr. DIH full import can be used for index all data first
> >>> time or restore index in case index is corrupted.
> >>>
> >>> Thanks,
> >>> Sujay
> >>>
> >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> >
> >>> > I am new to Solr and am trying to move data from my RDBMS to Solr. I
> know
> >>> > the available options are:
> >>> > 1) Post Tool
> >>> > 2) DIH
> >>> > 3) SolrJ (as ours is a J2EE application).
> >>> >
> >>> > I want to know what is the recommended way for Data import in
> production
> >>> > environment.
> >>> > Will sending data via SolrJ in batches be faster than posting a csv
> using
> >>> > POST tool?
> >>> >
> >>> >
> >>> > Thanks,
> >>> > Vishal
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Sujay P Bawaskar
> >>> M:+91-77091 53669
> >>>
>


Re: Data Import

That fails if Solr is not available.

To avoid dropping updates, you need some kind of persistent queue. We use 
Amazon SQS for our incremental updates.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Mar 17, 2017, at 10:09 AM, OTH  wrote:
> 
> Could the database trigger not just post the change to solr?
> 
> On Fri, Mar 17, 2017 at 10:00 PM, Erick Erickson 
> wrote:
> 
>> Or set a trigger on your RDBMS's main table to put the relevant
>> information in a different table (call it EVENTS) and have your SolrJ
>> consult the EVENTS table periodically. Essentially you're using the
>> EVENTS table as a queue where the trigger is the producer and the
>> SolrJ program is the consumer.
>> 
>> It's a polling solution though, so not event-driven. There's no
>> mechanism that I know of have, say, your RDBMS push an event to DIH
>> for instance.
>> 
>> Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
>> for this kind of problem..
>> 
>> Best,
>> Erick
>> 
>> On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
>>  wrote:
>>> One assumes by hooking into the same code that updates RDBMS, as
>>> opposed to be reverse engineering the changes from looking at the DB
>>> content. This would be especially the case for Delete changes.
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and
>> experienced
>>> 
>>> 
>>> On 17 March 2017 at 11:37, OTH  wrote:
>>>>> 
>>>>> Also, solrj is good when you want your RDBMS updates make immediately
>>>>> available in solr.
>>>> 
>>>> How can SolrJ be used to make RDBMS updates immediately available?
>>>> Thanks
>>>> 
>>>> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
>> sujaybawas...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi Vishal,
>>>>> 
>>>>> As per my experience DIH is the best for RDBMS to solr index. DIH with
>>>>> caching has best performance. DIH nested entities allow you to define
>>>>> simple queries.
>>>>> Also, solrj is good when you want your RDBMS updates make immediately
>>>>> available in solr. DIH full import can be used for index all data first
>>>>> time or restore index in case index is corrupted.
>>>>> 
>>>>> Thanks,
>>>>> Sujay
>>>>> 
>>>>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> 
>>>>>> I am new to Solr and am trying to move data from my RDBMS to Solr. I
>> know
>>>>>> the available options are:
>>>>>> 1) Post Tool
>>>>>> 2) DIH
>>>>>> 3) SolrJ (as ours is a J2EE application).
>>>>>> 
>>>>>> I want to know what is the recommended way for Data import in
>> production
>>>>>> environment.
>>>>>> Will sending data via SolrJ in batches be faster than posting a csv
>> using
>>>>>> POST tool?
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Vishal
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Thanks,
>>>>> Sujay P Bawaskar
>>>>> M:+91-77091 53669
>>>>> 
>> 



Re: Data Import

Hi Daphne,

Are you using DSE?


Thanks & Regards,
Vishal

On Fri, Mar 17, 2017 at 7:40 PM, Liu, Daphne 
wrote:

> I just want to share my recent project. I have successfully sent all our
> EDI documents to Cassandra 3.7 clusters using Solr 6.3 Data Import JDBC
> Cassandra connector indexing our documents.
> Since Cassandra is so fast for writing, compression rate is around 13% and
> all my documents can be keep in my Cassandra clusters' memory, we are very
> happy with the result.
>
>
> Kind regards,
>
> Daphne Liu
> BI Architect - Matrix SCM
>
> CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL
> 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 /
> daphne@cevalogistics.com
>
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, March 17, 2017 9:54 AM
> To: solr-user 
> Subject: Re: Data Import
>
> I feel DIH is much better for prototyping, even though people do use it in
> production. If you do want to use DIH, you may benefit from reviewing the
> DIH-DB example I am currently rewriting in
> https://issues.apache.org/jira/browse/SOLR-10312 (may need to change
> luceneMatchVersion in solrconfig.xml first).
>
> CSV, etc, could be useful if you want to keep history of past imports,
> again useful during development, as you evolve schema.
>
> SolrJ may actually be easiest/best for production since you already have
> Java stack.
>
> The choice is yours in the end.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 17 March 2017 at 08:56, Shawn Heisey  wrote:
> > On 3/17/2017 3:04 AM, vishal jain wrote:
> >> I am new to Solr and am trying to move data from my RDBMS to Solr. I
> know the available options are:
> >> 1) Post Tool
> >> 2) DIH
> >> 3) SolrJ (as ours is a J2EE application).
> >>
> >> I want to know what is the recommended way for Data import in
> >> production environment. Will sending data via SolrJ in batches be
> faster than posting a csv using POST tool?
> >
> > I've heard that CSV import runs EXTREMELY fast, but I have never
> > tested it.  The same threading problem that I discuss below would
> > apply to indexing this way.
> >
> > DIH is extremely powerful, but it has one glaring problem:  It's
> > single-threaded, which means that only one stream of data is going
> > into Solr, and each batch of documents to be inserted must wait for
> > the previous one to finish inserting before it can start.  I do not
> > know if DIH batches documents or sends them in one at a time.  If you
> > have a manually sharded index, you can run DIH on each shard in
> > parallel, but each one will be single-threaded.  That single thread is
> > pretty efficient, but it's still only one thread.
> >
> > Sending multiple index updates to Solr in parallel (multi-threading)
> > is how you radically speed up the Solr part of indexing.  This is
> > usually done with a custom indexing program, which might be written
> > with SolrJ or even in a completely different language.
> >
> > One thing to keep in mind with ANY indexing method:  Once the
> > situation is examined closely, most people find that it's not Solr
> > that makes their indexing slow.  The bottleneck is usually the source
> > system -- how quickly the data can be retrieved.  It usually takes a
> > lot longer to obtain the data than it does for Solr to index it.
> >
> > Thanks,
> > Shawn
> >


Re: Data Import

Could the database trigger not just post the change to Solr?

On Fri, Mar 17, 2017 at 10:00 PM, Erick Erickson 
wrote:

> Or set a trigger on your RDBMS's main table to put the relevant
> information in a different table (call it EVENTS) and have your SolrJ
> consult the EVENTS table periodically. Essentially you're using the
> EVENTS table as a queue where the trigger is the producer and the
> SolrJ program is the consumer.
>
> It's a polling solution though, so not event-driven. There's no
> mechanism that I know of have, say, your RDBMS push an event to DIH
> for instance.
>
> Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> for this kind of problem..
>
> Best,
> Erick
>
> On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
>  wrote:
> > One assumes by hooking into the same code that updates RDBMS, as
> > opposed to be reverse engineering the changes from looking at the DB
> > content. This would be especially the case for Delete changes.
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> > On 17 March 2017 at 11:37, OTH  wrote:
> >>>
> >>> Also, solrj is good when you want your RDBMS updates make immediately
> >>> available in solr.
> >>
> >> How can SolrJ be used to make RDBMS updates immediately available?
> >> Thanks
> >>
> >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> sujaybawas...@gmail.com>
> >> wrote:
> >>
> >>> Hi Vishal,
> >>>
> >>> As per my experience DIH is the best for RDBMS to solr index. DIH with
> >>> caching has best performance. DIH nested entities allow you to define
> >>> simple queries.
> >>> Also, solrj is good when you want your RDBMS updates make immediately
> >>> available in solr. DIH full import can be used for index all data first
> >>> time or restore index in case index is corrupted.
> >>>
> >>> Thanks,
> >>> Sujay
> >>>
> >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> >
> >>> > I am new to Solr and am trying to move data from my RDBMS to Solr. I
> know
> >>> > the available options are:
> >>> > 1) Post Tool
> >>> > 2) DIH
> >>> > 3) SolrJ (as ours is a J2EE application).
> >>> >
> >>> > I want to know what is the recommended way for Data import in
> production
> >>> > environment.
> >>> > Will sending data via SolrJ in batches be faster than posting a csv
> using
> >>> > POST tool?
> >>> >
> >>> >
> >>> > Thanks,
> >>> > Vishal
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Sujay P Bawaskar
> >>> M:+91-77091 53669
> >>>
>


Re: Data Import

Or set a trigger on your RDBMS's main table to put the relevant
information in a different table (call it EVENTS) and have your SolrJ
consult the EVENTS table periodically. Essentially you're using the
EVENTS table as a queue where the trigger is the producer and the
SolrJ program is the consumer.
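
A bare-bones sketch of that consumer (a polling loop under stated assumptions;
the JDBC URL, table, column, and core names are invented for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class EventsPoller {
  public static void main(String[] args) throws Exception {
    try (Connection db = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
         SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      while (true) {
        long lastSeen = 0; // highest EVENTS row consumed in this pass
        try (Statement st = db.createStatement();
             ResultSet rs = st.executeQuery("select id, doc_id, payload from EVENTS order by id")) {
          while (rs.next()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rs.getString("doc_id"));
            doc.addField("payload_txt", rs.getString("payload"));
            solr.add(doc);
            lastSeen = rs.getLong("id");
          }
        }
        if (lastSeen > 0) {
          solr.commit();
          try (Statement st = db.createStatement()) {
            // dequeue the rows we just indexed
            st.executeUpdate("delete from EVENTS where id <= " + lastSeen);
          }
        }
        Thread.sleep(5000); // poll every 5 seconds
      }
    }
  }
}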

It's a polling solution though, so not event-driven. There's no
mechanism that I know of to have, say, your RDBMS push an event to DIH,
for instance.

Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
for this kind of problem..

Best,
Erick

On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
 wrote:
> One assumes by hooking into the same code that updates RDBMS, as
> opposed to be reverse engineering the changes from looking at the DB
> content. This would be especially the case for Delete changes.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 17 March 2017 at 11:37, OTH  wrote:
>>>
>>> Also, solrj is good when you want your RDBMS updates make immediately
>>> available in solr.
>>
>> How can SolrJ be used to make RDBMS updates immediately available?
>> Thanks
>>
>> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar 
>> wrote:
>>
>>> Hi Vishal,
>>>
>>> As per my experience DIH is the best for RDBMS to solr index. DIH with
>>> caching has best performance. DIH nested entities allow you to define
>>> simple queries.
>>> Also, solrj is good when you want your RDBMS updates make immediately
>>> available in solr. DIH full import can be used for index all data first
>>> time or restore index in case index is corrupted.
>>>
>>> Thanks,
>>> Sujay
>>>
>>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain  wrote:
>>>
>>> > Hi,
>>> >
>>> >
>>> > I am new to Solr and am trying to move data from my RDBMS to Solr. I know
>>> > the available options are:
>>> > 1) Post Tool
>>> > 2) DIH
>>> > 3) SolrJ (as ours is a J2EE application).
>>> >
>>> > I want to know what is the recommended way for Data import in production
>>> > environment.
>>> > Will sending data via SolrJ in batches be faster than posting a csv using
>>> > POST tool?
>>> >
>>> >
>>> > Thanks,
>>> > Vishal
>>> >
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Sujay P Bawaskar
>>> M:+91-77091 53669
>>>


Re: Data Import

Thanks to all of you for the valuable inputs.
Being on a J2EE platform, I also felt that using SolrJ in a multi-threaded
environment would be the better choice to index RDBMS data into SolrCloud.
I will try a scheduler-triggered microservice to do the job using
SolrJ.

Regards,
Vishal

On Fri, Mar 17, 2017 at 9:11 PM, Alexandre Rafalovitch 
wrote:

> One assumes by hooking into the same code that updates RDBMS, as
> opposed to be reverse engineering the changes from looking at the DB
> content. This would be especially the case for Delete changes.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 17 March 2017 at 11:37, OTH  wrote:
> >>
> >> Also, solrj is good when you want your RDBMS updates make immediately
> >> available in solr.
> >
> > How can SolrJ be used to make RDBMS updates immediately available?
> > Thanks
> >
> > On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar  >
> > wrote:
> >
> >> Hi Vishal,
> >>
> >> As per my experience DIH is the best for RDBMS to solr index. DIH with
> >> caching has best performance. DIH nested entities allow you to define
> >> simple queries.
> >> Also, solrj is good when you want your RDBMS updates make immediately
> >> available in solr. DIH full import can be used for index all data first
> >> time or restore index in case index is corrupted.
> >>
> >> Thanks,
> >> Sujay
> >>
> >> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> wrote:
> >>
> >> > Hi,
> >> >
> >> >
> >> > I am new to Solr and am trying to move data from my RDBMS to Solr. I
> know
> >> > the available options are:
> >> > 1) Post Tool
> >> > 2) DIH
> >> > 3) SolrJ (as ours is a J2EE application).
> >> >
> >> > I want to know what is the recommended way for Data import in
> production
> >> > environment.
> >> > Will sending data via SolrJ in batches be faster than posting a csv
> using
> >> > POST tool?
> >> >
> >> >
> >> > Thanks,
> >> > Vishal
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks,
> >> Sujay P Bawaskar
> >> M:+91-77091 53669
> >>
>


Re: Data Import

One assumes by hooking into the same code that updates the RDBMS, as
opposed to reverse engineering the changes from looking at the DB
content. This would especially be the case for Delete changes.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 17 March 2017 at 11:37, OTH  wrote:
>>
>> Also, solrj is good when you want your RDBMS updates make immediately
>> available in solr.
>
> How can SolrJ be used to make RDBMS updates immediately available?
> Thanks
>
> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar 
> wrote:
>
>> Hi Vishal,
>>
>> As per my experience DIH is the best for RDBMS to solr index. DIH with
>> caching has best performance. DIH nested entities allow you to define
>> simple queries.
>> Also, solrj is good when you want your RDBMS updates make immediately
>> available in solr. DIH full import can be used for index all data first
>> time or restore index in case index is corrupted.
>>
>> Thanks,
>> Sujay
>>
>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain  wrote:
>>
>> > Hi,
>> >
>> >
>> > I am new to Solr and am trying to move data from my RDBMS to Solr. I know
>> > the available options are:
>> > 1) Post Tool
>> > 2) DIH
>> > 3) SolrJ (as ours is a J2EE application).
>> >
>> > I want to know what is the recommended way for Data import in production
>> > environment.
>> > Will sending data via SolrJ in batches be faster than posting a csv using
>> > POST tool?
>> >
>> >
>> > Thanks,
>> > Vishal
>> >
>>
>>
>>
>> --
>> Thanks,
>> Sujay P Bawaskar
>> M:+91-77091 53669
>>


Re: Data Import

>
> Also, solrj is good when you want your RDBMS updates make immediately
> available in solr.

How can SolrJ be used to make RDBMS updates immediately available?
Thanks

On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar 
wrote:

> Hi Vishal,
>
> As per my experience DIH is the best for RDBMS to solr index. DIH with
> caching has best performance. DIH nested entities allow you to define
> simple queries.
> Also, solrj is good when you want your RDBMS updates make immediately
> available in solr. DIH full import can be used for index all data first
> time or restore index in case index is corrupted.
>
> Thanks,
> Sujay
>
> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain  wrote:
>
> > Hi,
> >
> >
> > I am new to Solr and am trying to move data from my RDBMS to Solr. I know
> > the available options are:
> > 1) Post Tool
> > 2) DIH
> > 3) SolrJ (as ours is a J2EE application).
> >
> > I want to know what is the recommended way for Data import in production
> > environment.
> > Will sending data via SolrJ in batches be faster than posting a csv using
> > POST tool?
> >
> >
> > Thanks,
> > Vishal
> >
>
>
>
> --
> Thanks,
> Sujay P Bawaskar
> M:+91-77091 53669
>


RE: Data Import

I just want to share my recent project. I have successfully sent all our EDI 
documents to Cassandra 3.7 clusters, using the Solr 6.3 Data Import handler with a 
JDBC Cassandra connector to index our documents.
Since Cassandra is so fast at writing, the compression rate is around 13%, and all 
my documents can be kept in my Cassandra clusters' memory; we are very happy 
with the result.


Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
daphne@cevalogistics.com



-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Friday, March 17, 2017 9:54 AM
To: solr-user 
Subject: Re: Data Import

I feel DIH is much better for prototyping, even though people do use it in 
production. If you do want to use DIH, you may benefit from reviewing the 
DIH-DB example I am currently rewriting in
https://issues.apache.org/jira/browse/SOLR-10312 (may need to change 
luceneMatchVersion in solrconfig.xml first).

CSV, etc, could be useful if you want to keep history of past imports, again 
useful during development, as you evolve schema.

SolrJ may actually be easiest/best for production since you already have Java 
stack.

The choice is yours in the end.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 17 March 2017 at 08:56, Shawn Heisey  wrote:
> On 3/17/2017 3:04 AM, vishal jain wrote:
>> I am new to Solr and am trying to move data from my RDBMS to Solr. I know 
>> the available options are:
>> 1) Post Tool
>> 2) DIH
>> 3) SolrJ (as ours is a J2EE application).
>>
>> I want to know what is the recommended way for Data import in
>> production environment. Will sending data via SolrJ in batches be faster 
>> than posting a csv using POST tool?
>
> I've heard that CSV import runs EXTREMELY fast, but I have never
> tested it.  The same threading problem that I discuss below would
> apply to indexing this way.
>
> DIH is extremely powerful, but it has one glaring problem:  It's
> single-threaded, which means that only one stream of data is going
> into Solr, and each batch of documents to be inserted must wait for
> the previous one to finish inserting before it can start.  I do not
> know if DIH batches documents or sends them in one at a time.  If you
> have a manually sharded index, you can run DIH on each shard in
> parallel, but each one will be single-threaded.  That single thread is
> pretty efficient, but it's still only one thread.
>
> Sending multiple index updates to Solr in parallel (multi-threading)
> is how you radically speed up the Solr part of indexing.  This is
> usually done with a custom indexing program, which might be written
> with SolrJ or even in a completely different language.
>
> One thing to keep in mind with ANY indexing method:  Once the
> situation is examined closely, most people find that it's not Solr
> that makes their indexing slow.  The bottleneck is usually the source
> system -- how quickly the data can be retrieved.  It usually takes a
> lot longer to obtain the data than it does for Solr to index it.
>
> Thanks,
> Shawn
>


Re: Data Import

I feel DIH is much better suited to prototyping, even though people do use
it in production. If you do want to use DIH, you may benefit from
reviewing the DIH-DB example I am currently rewriting in
https://issues.apache.org/jira/browse/SOLR-10312 (you may need to change
luceneMatchVersion in solrconfig.xml first).

CSV and similar formats can be useful if you want to keep a history of past
imports; that is again useful during development, as you evolve the schema.

SolrJ may actually be the easiest and best option for production, since you
already have a Java stack.

The choice is yours in the end.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 17 March 2017 at 08:56, Shawn Heisey  wrote:
> On 3/17/2017 3:04 AM, vishal jain wrote:
>> I am new to Solr and am trying to move data from my RDBMS to Solr. I know 
>> the available options are:
>> 1) Post Tool
>> 2) DIH
>> 3) SolrJ (as ours is a J2EE application).
>>
>> I want to know what is the recommended way for Data import in production
>> environment. Will sending data via SolrJ in batches be faster than posting a 
>> csv using POST tool?
>
> I've heard that CSV import runs EXTREMELY fast, but I have never tested
> it.  The same threading problem that I discuss below would apply to
> indexing this way.
>
> DIH is extremely powerful, but it has one glaring problem:  It's
> single-threaded, which means that only one stream of data is going into
> Solr, and each batch of documents to be inserted must wait for the
> previous one to finish inserting before it can start.  I do not know if
> DIH batches documents or sends them one at a time.  If you have a
> manually sharded index, you can run DIH on each shard in parallel, but
> each one will be single-threaded.  That single thread is pretty
> efficient, but it's still only one thread.
>
> Sending multiple index updates to Solr in parallel (multi-threading) is
> how you radically speed up the Solr part of indexing.  This is usually
> done with a custom indexing program, which might be written with SolrJ
> or even in a completely different language.
>
> One thing to keep in mind with ANY indexing method:  Once the situation
> is examined closely, most people find that it's not Solr that makes
> their indexing slow.  The bottleneck is usually the source system -- how
> quickly the data can be retrieved.  It usually takes a lot longer to
> obtain the data than it does for Solr to index it.
>
> Thanks,
> Shawn
>


Re: Data Import

On 3/17/2017 3:04 AM, vishal jain wrote:
> I am new to Solr and am trying to move data from my RDBMS to Solr. I know the 
> available options are:
> 1) Post Tool
> 2) DIH
> 3) SolrJ (as ours is a J2EE application).
>
> I want to know what is the recommended way for Data import in production
> environment. Will sending data via SolrJ in batches be faster than posting a 
> csv using POST tool?

I've heard that CSV import runs EXTREMELY fast, but I have never tested
it.  The same threading problem that I discuss below would apply to
indexing this way.
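
(For illustration, a CSV file can be posted with the post tool that ships with
Solr, assuming a Solr 5+ layout; the core name and file path are placeholders:
bin/post -c mycore /path/to/data.csv)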

DIH is extremely powerful, but it has one glaring problem:  It's
single-threaded, which means that only one stream of data is going into
Solr, and each batch of documents to be inserted must wait for the
previous one to finish inserting before it can start.  I do not know if
DIH batches documents or sends them one at a time.  If you have a
manually sharded index, you can run DIH on each shard in parallel, but
each one will be single-threaded.  That single thread is pretty
efficient, but it's still only one thread.

Sending multiple index updates to Solr in parallel (multi-threading) is
how you radically speed up the Solr part of indexing.  This is usually
done with a custom indexing program, which might be written with SolrJ
or even in a completely different language.

One thing to keep in mind with ANY indexing method:  Once the situation
is examined closely, most people find that it's not Solr that makes
their indexing slow.  The bottleneck is usually the source system -- how
quickly the data can be retrieved.  It usually takes a lot longer to
obtain the data than it does for Solr to index it.

Thanks,
Shawn
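
For illustration, here is a minimal sketch of the multi-threaded SolrJ approach
Shawn describes, assuming SolrJ 6.x; the URL, core name, field names, thread
count, batch size, and id ranges are all placeholders to adapt:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    private static final int THREADS = 4;        // parallel update streams into Solr
    private static final int BATCH_SIZE = 500;   // documents per update request
    private static final int DOCS_PER_THREAD = 10000;

    public static void main(String[] args) throws Exception {
        // HttpSolrClient is thread-safe; share one instance across all threads.
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);

        for (int t = 0; t < THREADS; t++) {
            final int start = t * DOCS_PER_THREAD; // each thread owns an id range
            pool.submit(() -> {
                List<SolrInputDocument> batch = new ArrayList<>();
                for (int i = start; i < start + DOCS_PER_THREAD; i++) {
                    // In a real indexer the rows would come from JDBC paging, one
                    // page per thread; synthetic documents keep the sketch runnable.
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", Integer.toString(i));
                    doc.addField("name", "document " + i);
                    batch.add(doc);
                    if (batch.size() == BATCH_SIZE) {
                        client.add(batch); // one request per batch, not per document
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    client.add(batch);
                }
                return null; // Callable, so checked SolrJ exceptions propagate
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        client.commit(); // commit once at the end instead of per batch
        client.close();
    }
}

Each thread sends its own stream of batched updates, which is exactly the
parallelism that a single DIH run cannot provide on its own.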



Re: Data Import

Hi Vishal,

In my experience, DIH is the best option for indexing from an RDBMS into Solr.
DIH with caching gives the best performance, and DIH nested entities allow you
to define simple queries.
SolrJ is good when you want your RDBMS updates made available in Solr
immediately. A DIH full import can be used to index all the data the first
time, or to restore the index in case it gets corrupted.
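
For illustration, a minimal data-config.xml sketch of the cached nested-entity
setup described above; the driver, connection details, and table and column
names are all made-up placeholders:

<dataConfig>
  <!-- JDBC connection; driver and URL are placeholders -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="user" password="pass"/>
  <document>
    <!-- Parent entity: one Solr document per item row -->
    <entity name="item" query="SELECT id, name FROM item">
      <!-- Child entity, held in an in-memory cache and joined by key
           instead of issuing one query per parent row -->
      <entity name="feature"
              query="SELECT item_id, description FROM feature"
              processor="SqlEntityProcessor"
              cacheImpl="SortedMapBackedCache"
              cacheKey="item_id"
              cacheLookup="item.id"/>
    </entity>
  </document>
</dataConfig>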

Thanks,
Sujay

On Fri, Mar 17, 2017 at 2:34 PM, vishal jain  wrote:

> Hi,
>
>
> I am new to Solr and am trying to move data from my RDBMS to Solr. I know
> the available options are:
> 1) Post Tool
> 2) DIH
> 3) SolrJ (as ours is a J2EE application).
>
> I want to know what is the recommended way for Data import in production
> environment.
> Will sending data via SolrJ in batches be faster than posting a csv using
> POST tool?
>
>
> Thanks,
> Vishal
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Data Import

Hi,


I am new to Solr and am trying to move data from my RDBMS to Solr. I know
the available options are:
1) Post Tool
2) DIH
3) SolrJ (as ours is a J2EE application).

I want to know what is the recommended way for Data import in production
environment.
Will sending data via SolrJ in batches be faster than posting a csv using
POST tool?


Thanks,
Vishal



Re: Data Import Handler on 6.4.1

Also, upgrade to 6.4.2. There are serious performance problems in 6.4.0 and 
6.4.1.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Mar 15, 2017, at 12:05 PM, Liu, Daphne  
> wrote:
> 
> For Solr 6.3, I had to move mine to 
> ../solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib, if you are using Jetty.
> 
> Kind regards,
> 
> Daphne Liu
> BI Architect - Matrix SCM
> 
> CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
> USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
> daphne@cevalogistics.com
> 
> 
> -Original Message-
> From: Michael Tobias [mailto:mtob...@btinternet.com]
> Sent: Wednesday, March 15, 2017 2:36 PM
> To: solr-user@lucene.apache.org
> Subject: Data Import Handler on 6.4.1
> 
> I am sure I am missing something simple but
> 
> I am running Solr 4.8.1 and trialling 6.4.1 on another computer.
> 
> I have had to manually modify the automatic 6.4.1 schema config as we use a 
> set of specialised field types.  They work fine.
> 
> I am now trying to populate my core with data and having problems.
> 
> Exactly what names/paths should I be using in the solrconfig.xml file to get 
> this working - I don’t recall doing ANYTHING for 4.8.1
> 
>   <lib dir="..." regex=".*\.jar" />
>   <lib dir="..." regex="solr-dataimporthandler-.*\.jar" /> ?
> 
> And where do I put the mysql-connector-java-5.1.29-bin.jar file and how do I 
> reference it to get it loaded?
> 
>
> ??
> 
> And then later in the solrconfig.xml I have:
> 
>  <requestHandler name="/dataimport"
>      class="org.apache.solr.handler.dataimport.DataImportHandler">
>    <lst name="defaults">
>      <str name="config">db-data-config.xml</str>
>    </lst>
>  </requestHandler>
> 
> 
> Any help much appreciated.
> 
> Regards
> 
> Michael
> 
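
For reference, a typical shape for those two directives in solrconfig.xml; the
dir values below are illustrative and depend on where the jars actually live
(jars dropped into the webapp's WEB-INF/lib, as noted above, are loaded without
any <lib> directive at all):

<lib dir="${solr.install.dir:../../../..}/dist/"
     regex="solr-dataimporthandler-.*\.jar" />
<!-- JDBC driver jar; point dir at the directory holding the connector -->
<lib dir="/opt/solr/lib/" regex="mysql-connector-java-.*\.jar" />
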
> 
> -Original Message-
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: 15 March 2017 17:47
> To: solr-user@lucene.apache.org
> Subject: Re: Get handler not working
> 
> from your previous email:
> "There is no "id"
> field defined in the schema."
> 
> you need an id field to use the get handler
> 
> On Wed, Mar 15, 2017 at 1:45 PM, Chris Ulicny  wrote:
> 
>> I thought that "id" and "ids" were fixed parameters for the get
>> handler, but I never remember, so I've already tried both. Each time
>> it comes back with the same response of no document.
>> 
>> On Wed, Mar 15, 2017 at 1:31 PM Alexandre Rafalovitch
>> 
>> wrote:
>> 
>>> Actually.
>>> 
>>> I think Real Time Get handler has "id" as a magical parameter, not
>>> as a field name. It maps to the real id field via the uniqueKey
>>> definition:
>>> https://cwiki.apache.org/confluence/display/solr/RealTime+Get
>>> 
>>> So, if you have not, could you try the way you originally wrote it.
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and
>> experienced
>>> 
>>> 
>>> On 15 March 2017 at 13:22, Chris Ulicny  wrote:
>>>> Sorry, that is a typo. The get is using the iqdocid field. There
>>>> is no
>>> "id"
>>>> field defined in the schema.
>>>> 
>>>> solr/TestCollection/get?iqdocid=2957-TV-201604141900
>>>> 
>>>> solr/TestCollection/select?q=*:*&fq=iqdocid:2957-TV-201604141900
>>>> 
>>>> On Wed, Mar 15, 2017 at 1:15 PM Erick Erickson <
>> erickerick...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Is this a typo or are you trying to use get with an "id" field
>>>>> and your filter query uses "iqdocid"?
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>> On Wed, Mar 15, 2017 at 8:31 AM, Chris Ulicny 
>> wrote:
>>>>>> Yes, we're using a fixed schema with the iqdocid field set as
>>>>>> the
>>>>> uniqueKey.
>>>>>> 
>>>>>> On Wed, Mar 15, 2017 at 11:28 AM Alexandre Rafalovitch <
>>>>> arafa...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> What is your uniqueKey? Is it iqdocid?
>>>>>>> 
>>>>>>> Regards,
>>>>>>>   Alex.
>>>>>>> 
>>>>>>> http://www.solr-start.com/ - Resources for Solr users, new and
>>>>> experienced

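
For illustration of the point Alexandre makes above: the real-time get handler
is addressed through the fixed parameter names id and ids, which Solr maps to
the uniqueKey field (iqdocid here), so the following forms work regardless of
what the uniqueKey field is called (the second document id below is made up
for the example):

solr/TestCollection/get?id=2957-TV-201604141900
solr/TestCollection/get?ids=2957-TV-201604141900,2957-TV-201604142000

A field-named parameter such as iqdocid= is only meaningful inside a query
(q or fq), not as a /get parameter.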