Re: Data Import Handler (DIH) - Installing and running

2020-12-23 Thread Erick Erickson
Have you done what the message says and looked at your Solr log? If so,
what information is there?

> On Dec 23, 2020, at 5:13 AM, DINSD | SPAutores 
>  wrote:
> 
> Hi,
> 
> I'm trying to install the package "data-import-handler", since it was 
> discontinued from the core Solr distro.
> 
> https://github.com/rohitbemax/dataimporthandler
> 
> However, as soon as the first command is carried out
> 
> solr -c -Denable.packages=true
> 
> I get this screen in the web interface
> 
> Has anyone been through this, or has any idea why it's happening?
> 
> Thanks for any help
> Rui Pimentel
> 
> 
> 
> DINSD - Departamento de Informática / SPA Digital
> Av. Duque de Loulé, 31 - 1069-153 Lisboa  PORTUGAL
> T (+ 351) 21 359 44 36 / (+ 351) 21 359 44 00  F (+ 351) 21 353 02 57
>  informat...@spautores.pt
>  www.SPAutores.pt
> 



Data Import Handler (DIH) - Installing and running

2020-12-23 Thread DINSD | SPAutores

Hi,

I'm trying to install the package "data-import-handler", since it was 
discontinued from the core Solr distro.


https://github.com/rohitbemax/dataimporthandler

However, as soon as the first command is carried out

solr -c -Denable.packages=true

I get this screen in the web interface

Has anyone been through this, or has any idea why it's happening?

Thanks for any help

Rui Pimentel

DINSD - Departamento de Informática / SPA Digital
Av. Duque de Loulé, 31 - 1069-153 Lisboa PORTUGAL
T (+ 351) 21 359 44 36 / (+ 351) 21 359 44 00  F (+ 351) 21 353 02 57
informat...@spautores.pt
www.SPAutores.pt




Re: Data Import Blocker - Solr

2020-12-19 Thread Shawn Heisey

On 12/18/2020 12:03 AM, basel altameme wrote:

While trying to Import & Index data from MySQL DB custom view I am facing the 
error below:
Data Config problem: The value of attribute "query" associated with an element type 
"entity" must not contain the '<' character.
Please note that in my SQL statements I am using '<>' as an operator for 
comparing only.
sample line:
         when (`v`.`live_type_id` <> 1) then 100


These configurations are written in XML.  So you must encode the 
character using XML-friendly notation.


Instead of <> it should say &lt;&gt; to be correct.  Or you could use != 
which is also correct SQL notation for "not equal to".
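
For illustration, a sketch of how the escaped operator could look inside a 
data-config.xml entity (the entity and field names here are invented, not 
taken from the thread):

<entity name="item"
        query="select id,
                      case when (`v`.`live_type_id` &lt;&gt; 1) then 100
                           else 0 end as score
               from items v">
  <field column="id" name="id"/>
  <field column="score" name="score"/>
</entity>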


Thanks,
Shawn


Re: Data Import Blocker - Solr

2020-12-18 Thread Erick Erickson
Have you tried escaping that character?

> On Dec 18, 2020, at 2:03 AM, basel altameme  
> wrote:
> 
> Dear,
> While trying to Import & Index data from MySQL DB custom view I am facing the 
> error below:
> Data Config problem: The value of attribute "query" associated with an 
> element type "entity" must not contain the '<' character.
> Please note that in my SQL statements I am using '<>' as an operator for 
> comparing only.
> sample line:
> when (`v`.`live_type_id` <> 1) then 100
> 
> Kindly advise.
> Regards, Basel
> 



Data Import Blocker - Solr

2020-12-18 Thread basel altameme
Dear,
While trying to Import & Index data from MySQL DB custom view I am facing the 
error below:
Data Config problem: The value of attribute "query" associated with an element 
type "entity" must not contain the '<' character.
Please note that in my SQL statements I am using '<>' as an operator for 
comparing only.
sample line:
        when (`v`.`live_type_id` <> 1) then 100

Kindly advise.
Regards, Basel



Re: data import handler deprecated?

2020-11-30 Thread Dmitri Maziuk

On 11/30/2020 7:50 AM, David Smiley wrote:

Yes, absolutely to what Eric said.  We goofed on news / release highlights
on how to communicate what's happening in Solr.  From a Solr insider point
of view, we are "deprecating" because strictly speaking, the code isn't in
our codebase any longer.  From a user point of view (the audience of news /
release notes), the functionality has *moved*.


Just FYI, there is the DIH 8.7.0 jar in 
repo1.maven.org/maven2/org/apache/solr -- whereas the GitHub build is on 
8.6.0.
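
If you want to pull it as a dependency, the Maven coordinates should be 
roughly the following (a sketch, unverified against Central):

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-dataimporthandler</artifactId>
  <version>8.7.0</version>
</dependency>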


Dima



Re: data import handler deprecated?

2020-11-30 Thread David Smiley
Yes, absolutely to what Eric said.  We goofed on news / release highlights
on how to communicate what's happening in Solr.  From a Solr insider point
of view, we are "deprecating" because strictly speaking, the code isn't in
our codebase any longer.  From a user point of view (the audience of news /
release notes), the functionality has *moved*.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Nov 30, 2020 at 8:04 AM Eric Pugh 
wrote:

> You don’t need to abandon DIH right now….   You can just use the Github
> hosted version….   The more people who use it, the better a community it
> will form around it!It’s a bit chicken and egg, since no one is
> actively discussing it, submitting PR’s etc, it may languish.   If you use
> it, and test it, and support other community folks using it, then it will
> continue on!
>
>
>
> > On Nov 29, 2020, at 12:12 PM, Dmitri Maziuk 
> wrote:
> >
> > On 11/29/2020 10:32 AM, Erick Erickson wrote:
> >
> >> And I absolutely agree with Walter that the DB is often where
> >> the bottleneck lies. You might be able to
> >> use multiple threads and/or processes to query the
> >> DB if that’s the case and you can find some kind of partition
> >> key.
> >
> > IME the difficult part has always been dealing with incremental updates;
> > if we were to roll our own, my vote would be for a database trigger that
> > does a POST in whichever language the DBMS likes.
> >
> > But this has not been a part of our "solr 6.5 update" project until now.
> >
> > Thanks everyone,
> > Dima
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
>
>


Re: data import handler deprecated?

2020-11-30 Thread Eric Pugh
You don’t need to abandon DIH right now….   You can just use the Github hosted 
version….   The more people who use it, the better a community it will form 
around it!It’s a bit chicken and egg, since no one is actively discussing 
it, submitting PR’s etc, it may languish.   If you use it, and test it, and 
support other community folks using it, then it will continue on!



> On Nov 29, 2020, at 12:12 PM, Dmitri Maziuk  wrote:
> 
> On 11/29/2020 10:32 AM, Erick Erickson wrote:
> 
>> And I absolutely agree with Walter that the DB is often where
>> the bottleneck lies. You might be able to
>> use multiple threads and/or processes to query the
>> DB if that’s the case and you can find some kind of partition
>> key.
> 
> IME the difficult part has always been dealing with incremental updates; if 
> we were to roll our own, my vote would be for a database trigger that does a 
> POST in whichever language the DBMS likes.
> 
> But this has not been a part of our "solr 6.5 update" project until now.
> 
> Thanks everyone,
> Dima

___
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>




Re: data import handler deprecated?

2020-11-29 Thread Dmitri Maziuk

On 11/29/2020 10:32 AM, Erick Erickson wrote:


And I absolutely agree with Walter that the DB is often where
the bottleneck lies. You might be able to
use multiple threads and/or processes to query the
DB if that’s the case and you can find some kind of partition
key.


IME the difficult part has always been dealing with incremental updates; 
if we were to roll our own, my vote would be for a database trigger that 
does a POST in whichever language the DBMS likes.


But this has not been a part of our "solr 6.5 update" project until now.

Thanks everyone,
Dima


Re: data import handler deprecated?

2020-11-29 Thread Erick Erickson
If you like Java instead of Python, here’s a skeletal program:

https://lucidworks.com/post/indexing-with-solrj/

It’s simple and single-threaded, but could serve as a basis for
something along the lines that Walter suggests.

And I absolutely agree with Walter that the DB is often where
the bottleneck lies. You might be able to
use multiple threads and/or processes to query the
DB if that’s the case and you can find some kind of partition
key.

You also might (and it depends on the Solr version) be able
to wrap a jdbc stream in an update decorator.

https://lucene.apache.org/solr/guide/8_0/stream-source-reference.html

https://lucene.apache.org/solr/guide/8_0/stream-decorator-reference.html
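
For example, something along these lines sent to the /stream handler (a 
sketch only: the collection name, table, connection string, and credentials 
are placeholders, and the JDBC driver jar must be on Solr's classpath):

update(mycollection, batchSize=500,
  jdbc(connection="jdbc:mysql://localhost/mydb?user=solr&password=secret",
       sql="select id, title from docs",
       sort="id asc",
       driver="com.mysql.jdbc.Driver"))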

Best,
Erick

> On Nov 29, 2020, at 3:04 AM, Walter Underwood  wrote:
> 
> I recommend building an outboard loader, like I did a dozen years ago for
> Solr 1.3 (before DIH) and did again recently. I’m glad to send you my Python
> program, though it reads from a JSONL file, not a database.
> 
> Run a loop fetching records from a database. Put each record into a 
> synchronized (thread-safe) queue. Run multiple worker threads, each pulling 
> records from the queue, batching them up, and sending them to Solr. For 
> maximum indexing speed (at the expense of query performance), count the 
> number of CPUs per shard leader and run two worker threads per CPU.
> 
> Adjust the batch size to be maybe 10k to 50k bytes. That might be 20 to 1000 
> documents, depending on the content.
> 
> With this setup, your database will probably be your bottleneck. I’ve had this
> index a million (small) documents per minute to a multi-shard cluster, from a 
> JSONL file on local disk.
> 
> Also, don’t worry about finding the leaders and sending the right document to
> the right shard. I just throw the batches at the load balancer and let Solr 
> figure it out. That is super simple and amazingly fast.
> 
> If you are doing big batches, building a dumb ETL system with JSONL files in 
> Amazon S3 has some real advantages. It allows loading prod data into a test
> cluster for load benchmarks, for example. Also good for disaster recovery, 
> just load the recent batches from S3. Want to know exactly which documents were
> in the index in October? Look at the batches in S3.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Nov 28, 2020, at 6:23 PM, matthew sporleder  wrote:
>> 
>> I went through the same stages of grief that you are about to start
>> but (luckily?) my core dataset grew some weird cousins and we ended up
>> writing our own indexer to join them all together/do partial
>> updates/other stuff beyond DIH.  It's not difficult to upload docs but
>> is definitely slower so far.  I think there is a bit of a 'clean core'
>> focus going on in solr-land right now and DIH is easy(!) but it's also
>> easy to hit its limits (atomic/partial updates?  wtf is an "entity?"
>> etc) so anyway try to be happy that you are aware of it now.
>> 
>> On Sat, Nov 28, 2020 at 7:41 PM Dmitri Maziuk  
>> wrote:
>>> 
>>> On 11/28/2020 5:48 PM, matthew sporleder wrote:
>>> 
 ...  The bottom of
 that github page isn't hopeful however :)
>>> 
>>> Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC
>>> JAR" :)
>>> 
>>> It's a more general question though: what is the path forward for users
>>> with data in two places? Hope that a community-maintained plugin
>>> will still be there tomorrow? Dump our tables to CSV (and POST them) and
>>> roll our own delta-updates logic? Or are we to choose one datastore and
>>> drop the other?
>>> 
>>> Dima
> 



Re: data import handler deprecated?

2020-11-29 Thread Walter Underwood
I recommend building an outboard loader, like I did a dozen years ago for
Solr 1.3 (before DIH) and did again recently. I’m glad to send you my Python
program, though it reads from a JSONL file, not a database.

Run a loop fetching records from a database. Put each record into a synchronized
(thread-safe) queue. Run multiple worker threads, each pulling records from the
queue, batching them up, and sending them to Solr. For maximum indexing speed
(at the expense of query performance), count the number of CPUs per shard leader
and run two worker threads per CPU.

Adjust the batch size to be maybe 10k to 50k bytes. That might be 20 to 1000 
documents, depending on the content.
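
A minimal sketch of that loop/queue/batch pattern in Java (not Walter's actual
program, which is Python; the Solr URL, collection name, batch size, and
thread count below are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class OutboardLoader {
    static final String SOLR = "http://localhost:8983/solr/mycoll/update";
    static final String POISON = "__end__";          // sentinel that tells a worker to stop
    static final HttpClient HTTP = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        int workers = Runtime.getRuntime().availableProcessors() * 2;
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> drain(queue));
            t.start();
            threads.add(t);
        }
        // Producer: replace this loop with a JDBC ResultSet or JSONL reader.
        for (int id = 0; id < 100_000; id++) {
            queue.put("{\"id\":\"" + id + "\"}");
        }
        for (int i = 0; i < workers; i++) queue.put(POISON);  // one pill per worker
        for (Thread t : threads) t.join();
    }

    // Pull documents off the queue, batch them up, send each full batch to Solr.
    static void drain(BlockingQueue<String> queue) {
        List<String> batch = new ArrayList<>();
        try {
            while (true) {
                String doc = queue.take();
                if (doc.equals(POISON)) break;
                batch.add(doc);
                if (batch.size() >= 500) { send(batch); batch.clear(); }
            }
            if (!batch.isEmpty()) send(batch);       // flush the remainder
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // POST one JSON array of documents; any node (or a load balancer) routes them.
    static void send(List<String> batch) {
        String body = "[" + String.join(",", batch) + "]";
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(SOLR))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        try {
            HTTP.send(req, HttpResponse.BodyHandlers.ofString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}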

With this setup, your database will probably be your bottleneck. I’ve had this
index a million (small) documents per minute to a multi-shard cluster, from a 
JSONL file on local disk.

Also, don’t worry about finding the leaders and sending the right document to
the right shard. I just throw the batches at the load balancer and let Solr 
figure it out. That is super simple and amazingly fast.

If you are doing big batches, building a dumb ETL system with JSONL files in 
Amazon S3 has some real advantages. It allows loading prod data into a test
cluster for load benchmarks, for example. Also good for disaster recovery, just
load the recent batches from S3. Want to know exactly which documents were
in the index in October? Look at the batches in S3.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 28, 2020, at 6:23 PM, matthew sporleder  wrote:
> 
> I went through the same stages of grief that you are about to start
> but (luckily?) my core dataset grew some weird cousins and we ended up
> writing our own indexer to join them all together/do partial
> updates/other stuff beyond DIH.  It's not difficult to upload docs but
> is definitely slower so far.  I think there is a bit of a 'clean core'
> focus going on in solr-land right now and DIH is easy(!) but it's also
> easy to hit its limits (atomic/partial updates?  wtf is an "entity?"
> etc) so anyway try to be happy that you are aware of it now.
> 
> On Sat, Nov 28, 2020 at 7:41 PM Dmitri Maziuk  wrote:
>> 
>> On 11/28/2020 5:48 PM, matthew sporleder wrote:
>> 
>>> ...  The bottom of
>>> that github page isn't hopeful however :)
>> 
>> Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC
>> JAR" :)
>> 
>> It's a more general question though: what is the path forward for users
>> with data in two places? Hope that a community-maintained plugin
>> will still be there tomorrow? Dump our tables to CSV (and POST them) and
>> roll our own delta-updates logic? Or are we to choose one datastore and
>> drop the other?
>> 
>> Dima



Re: data import handler deprecated?

2020-11-28 Thread matthew sporleder
I went through the same stages of grief that you are about to start
but (luckily?) my core dataset grew some weird cousins and we ended up
writing our own indexer to join them all together/do partial
updates/other stuff beyond DIH.  It's not difficult to upload docs but
is definitely slower so far.  I think there is a bit of a 'clean core'
focus going on in solr-land right now and DIH is easy(!) but it's also
easy to hit its limits (atomic/partial updates?  wtf is an "entity?"
etc) so anyway try to be happy that you are aware of it now.

On Sat, Nov 28, 2020 at 7:41 PM Dmitri Maziuk  wrote:
>
> On 11/28/2020 5:48 PM, matthew sporleder wrote:
>
> > ...  The bottom of
> > that github page isn't hopeful however :)
>
> Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC
> JAR" :)
>
> It's a more general question though: what is the path forward for users
> with data in two places? Hope that a community-maintained plugin
> will still be there tomorrow? Dump our tables to CSV (and POST them) and
> roll our own delta-updates logic? Or are we to choose one datastore and
> drop the other?
>
> Dima


Re: data import handler deprecated?

2020-11-28 Thread Dmitri Maziuk

On 11/28/2020 5:48 PM, matthew sporleder wrote:


...  The bottom of
that github page isn't hopeful however :)


Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC 
JAR" :)


It's a more general question though: what is the path forward for users 
with data in two places? Hope that a community-maintained plugin 
will still be there tomorrow? Dump our tables to CSV (and POST them) and 
roll our own delta-updates logic? Or are we to choose one datastore and 
drop the other?


Dima


Re: data import handler deprecated?

2020-11-28 Thread matthew sporleder
https://solr.cool/#utilities -> https://github.com/rohitbemax/dataimporthandler

You can install it in one of the many new/novel ways to add things to a Solr
install and it should work like always (apparently).  The bottom of
that GitHub page isn't hopeful however :)
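
For reference, the install steps on that page look roughly like this (treat 
the repo URL and package name as unverified here; check the README before 
running any of it):

solr -c -Denable.packages=true
bin/solr package add-repo data-import-handler "https://raw.githubusercontent.com/rohitbemax/dataimporthandler/master/repo/"
bin/solr package install data-import-handler
bin/solr package deploy data-import-handler -collections mycollection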

On Sat, Nov 28, 2020 at 5:21 PM Dmitri Maziuk  wrote:
>
> Hi all,
>
> trying to set up solr-8.7.0, contrib/dataimporthandler/README.txt says
> this module is deprecated as of 8.6 and scheduled for removal in 9.0.
>
> How do we pull data out of our relational database in 8.7+?
>
> TIA
> Dima


data import handler deprecated?

2020-11-28 Thread Dmitri Maziuk

Hi all,

trying to set up solr-8.7.0, contrib/dataimporthandler/README.txt says 
this module is deprecated as of 8.6 and scheduled for removal in 9.0.


How do we pull data out of our relational database in 8.7+?

TIA
Dima




Re: Data Import Handler - Concurrent Entity Importing

2020-05-05 Thread Mikhail Khludnev
Hello, James.

DataImportHandler has a lock preventing concurrent execution. If you need
to run several imports in parallel on the same core, you need to duplicate
the "/dataimport" handler definition in solrconfig.xml under different names. 
Thus, you can run them in parallel. Regarding schema, I prefer the latter 
but mileage may vary.
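
For illustration, a sketch of two such handler definitions in solrconfig.xml 
(the handler names and config file names here are placeholders):

<requestHandler name="/dataimport-a" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config-a.xml</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport-b" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config-b.xml</str>
  </lst>
</requestHandler>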

--
Mikhail.

On Tue, May 5, 2020 at 6:39 PM James Greene 
wrote:

> Hello, I'm new to the group here so please excuse me if I do not have the
> etiquette down yet.
>
> Is it possible to have multiple entities (customer configurable, up to 40
> atm) in a DIH configuration to be imported at once?  Right now I have
> multiple root entities in my configuration but they get indexed
> sequentially and this means the entities that are last are always delayed
> hitting the index.
>
> I'm trying to migrate an existing setup (solr 6.6) that utilizes a
> different collection for each "entity type" into a single collection (solr
> 8.4) to get around some of the hurdles faced when needing to have searches
> that require multiple block joins, which currently does not work going
> cross-core.
>
> I'm also wondering if it is better to fully qualify a field name or use two
> different fields for performing the "same" search.  i.e:
>
>
> {
> type_A_status: Active
> type_A_value: Test
> }
> vs
> {
> type: A
> status: Active
> value: Test
> }
>


-- 
Sincerely yours
Mikhail Khludnev


Data Import Handler - Concurrent Entity Importing

2020-05-05 Thread James Greene
Hello, I'm new to the group here so please excuse me if I do not have the
etiquette down yet.

Is it possible to have multiple entities (customer configurable, up to 40
atm) in a DIH configuration to be imported at once?  Right now I have
multiple root entities in my configuration but they get indexed
sequentially and this means the entities that are last are always delayed
hitting the index.

I'm trying to migrate an existing setup (solr 6.6) that utilizes a
different collection for each "entity type" into a single collection (solr
8.4) to get around some of the hurdles faced when needing to have searches
that require multiple block joins, which currently does not work going
cross-core.

I'm also wondering if it is better to fully qualify a field name or use two
different fields for performing the "same" search.  i.e:


{
type_A_status: Active
type_A_value: Test
}
vs
{
type: A
status: Active
value: Test
}


SOLR Data Import Handler : A command is still running...

2020-02-03 Thread Doss
We are doing an hourly data import to our index; per day, one or two requests
fail with the message "A command is still running...".

1. Does it mean the data import did not happen for the last hour?
2. The "Full Dump Started" time shows an older date (in the log below, almost
13 days old). Why is that so?

userinfoindex start - Wed Jan 22 05:12:01 IST 2020
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "initArgs":[
    "defaults",[
      "config","data-import.xml"]],
  "command":"full-import",
  "status":"busy",
  "importResponse":"A command is still running...",
  "statusMessages":{
    "Time Elapsed":"298:1:59.986",
    "Total Requests made to DataSource":"1",
    "Total Rows Fetched":"17426",
    "Total Documents Processed":"17425",
    "Total Documents Skipped":"0",
    "Full Dump Started":"2020-01-09 19:10:02"}}

Thanks,
Doss.


Re: SQL data import handler

2019-09-09 Thread Friscia, Michael
Thank you for your responses Vadim and Jörn. You both prompted me to try again 
and this time I succeeded. The trick seemed to be the way that I had installed 
Java, using OpenJDK versus from Oracle. In addition, I imagine I accidentally 
had a lot of old versions of JAR files lying around, so it was easier to start 
with a fresh VM. Now I was able to install using JDK 12 and the latest 
Microsoft 7.4.x driver, and it works out of the box as I wanted. 

Thanks again for being a sounding board for this, I primarily support 
Microsoft/dot net stuff so the Linux stuff sometimes gets away from me.

___
Michael Friscia
Office of Communications
Yale School of Medicine
(203) 737-7932 - office
(203) 931-5381 - mobile
http://web.yale.edu
 

On 9/9/19, 6:53 AM, "Vadim Ivanov"  wrote:

Hi,
Latest jdbc driver 7.4.1 seems to support JRE 8, 11, 12

https://www.microsoft.com/en-us/download/details.aspx?id=58505
You have to delete all previous versions of Sql Server jdbc driver from 
Solr installation (/solr/server/lib/ in my case)

-- 
Vadim

> -----Original Message-----
> From: Friscia, Michael [mailto:michael.fris...@yale.edu]
> Sent: Monday, September 09, 2019 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: SQL data import handler
> 
> I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre
> which installed version 11. So after a day of trying to make my Microsoft SQL
> Server data import handler work and failing, I built a new VM and installed
> JRE 8 and then everything works perfectly.
> 
> The root of the problem was the elimination of java.xml.bind (JAXB) in JRE 9. 
> I’m not a Java programmer so I’m only going by what I uncovered digging 
> through the error logs. I am not positive this is the only error to deal 
> with; for all I know fixing that will just uncover something else that needs 
> repair. There were solutions where you compile SOLR using Maven but this is 
> moving out of my comfort zone as well as long term strategy to keep SOLR 
> management (as well as other Linux systems management) out-of-the-box. There 
> were also solutions to include some sort of dependency on this older library 
> but I’m at a loss on how to relate that to a SOLR install.
> 
> My questions, since I am not that familiar with Java dependencies:
> 
>   1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled 
> and SOLR, Zookeeper nor anything else on these servers is available off the 
> virtual network so it seems ok, but I try not to run very old versions of 
> any software.
>   2.  Is there a way to fix this and keep the installation out-of-the-box or 
> at least almost out of the box?
> 
> ___
> Michael Friscia
> Office of Communications
> Yale School of Medicine
> (203) 737-7932 - office
> (203) 931-5381 - mobile
> http://web.yale.edu






RE: SQL data import handler

2019-09-09 Thread Vadim Ivanov
Hi,
Latest jdbc driver 7.4.1 seems to support JRE 8, 11, 12
https://www.microsoft.com/en-us/download/details.aspx?id=58505
You have to delete all previous versions of Sql Server jdbc driver from Solr 
installation (/solr/server/lib/ in my case)
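Roughly like this, for example (paths and jar file names below are examples; 
adjust to your install):

cd /opt/solr/server/lib
rm mssql-jdbc-*.jar                            # remove any older driver versions
cp ~/Downloads/mssql-jdbc-7.4.1.jre11.jar .    # drop in the 7.4.1 driver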

-- 
Vadim

> -----Original Message-----
> From: Friscia, Michael [mailto:michael.fris...@yale.edu]
> Sent: Monday, September 09, 2019 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: SQL data import handler
> 
> I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre
> which installed version 11. So after a day of trying to make my Microsoft SQL
> Server data import handler work and failing, I built a new VM and installed
> JRE 8 and then everything works perfectly.
> 
> The root of the problem was the elimination of java.xml.bind (JAXB) in JRE 9. I’m not
> a Java programmer so I’m only going by what I uncovered digging through the
> error logs. I am not positive this is the only error to deal with, for all I 
> know
> fixing that will just uncover something else that needs repair. There were
> solutions where you compile SOLR using Maven but this is moving out of my
> comfort zone as well as long term strategy to keep SOLR management (as well
> as other Linux systems management) out-of-the-box. There were also
> solutions to include some sort of dependency on this older library but I’m at 
> a
> loss on how to relate that to a SOLR install.
> 
> My questions, since I am not that familiar with Java dependencies:
> 
>   1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled 
> and
> SOLR, Zookeeper nor anything else on these servers is available off the 
> virtual
> network so it seems ok, but I try not to run very old versions of any 
> software.
>   2.  Is there a way to fix this and keep the installation out-of-the-box or 
> at
> least almost out of the box?
> 
> ___
> Michael Friscia
> Office of Communications
> Yale School of Medicine
> (203) 737-7932 - office
> (203) 931-5381 - mobile
> http://web.yale.edu




Re: SQL data import handler

2019-09-09 Thread Jörn Franke
Hi Michael,

Thank you for sharing. You are right about your approach to not customize the 
distribution.

Solr supports JDK 8, and its latest versions (8.x) also JDK 11. I would not 
recommend using it with JDK 9 or JDK 10, as they are out of support in many 
Java distributions. It might also be that your database driver does not 
support JDK 9 (check with Microsoft).
I don’t see it as that critical at the moment to have JDK 8 on this production 
server, but since it is out of support you should look for alternatives.

So if you are on Solr 8.x, please go with JDK 11 to have the latest fixes etc.

Best regards 

> On 09.09.2019 at 12:21, Friscia, Michael wrote:
> 
> I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre 
> which installed version 11. So after a day of trying to make my Microsoft SQL 
> Server data import handler work and failing, I built a new VM and installed 
> JRE 8 and then everything works perfectly.
> 
> The root of the problem was the elimination of java.xml.bind (JAXB) in JRE 9. I’m 
> not a Java programmer so I’m only going by what I uncovered digging through 
> the error logs. I am not positive this is the only error to deal with, for 
> all I know fixing that will just uncover something else that needs repair. 
> There were solutions where you compile SOLR using Maven but this is moving 
> out of my comfort zone as well as long term strategy to keep SOLR management 
> (as well as other Linux systems management) out-of-the-box. There were also 
> solutions to include some sort of dependency on this older library but I’m at 
> a loss on how to relate that to a SOLR install.
> 
> My questions, since I am not that familiar with Java dependencies:
> 
>  1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled 
> and SOLR, Zookeeper nor anything else on these servers is available off the 
> virtual network so it seems ok, but I try not to run very old versions of any 
> software.
>  2.  Is there a way to fix this and keep the installation out-of-the-box or 
> at least almost out of the box?
> 
> ___
> Michael Friscia
> Office of Communications
> Yale School of Medicine
> (203) 737-7932 - office
> (203) 931-5381 - mobile
> http://web.yale.edu
> 


SQL data import handler

2019-09-09 Thread Friscia, Michael
I setup SOLR on Ubuntu 18.04 and installed Java from apt-get with default-jre 
which installed version 11. So after a day of trying to make my Microsoft SQL 
Server data import handler work and failing, I built a new VM and installed JRE 
8 and then everything works perfectly.

The root of the problem was the elimination of java.xml.bind (JAXB) in JRE 9. I’m not 
a Java programmer so I’m only going by what I uncovered digging through the 
error logs. I am not positive this is the only error to deal with, for all I 
know fixing that will just uncover something else that needs repair. There were 
solutions where you compile SOLR using Maven but this is moving out of my 
comfort zone as well as long term strategy to keep SOLR management (as well as 
other Linux systems management) out-of-the-box. There were also solutions to 
include some sort of dependency on this older library but I’m at a loss on how 
to relate that to a SOLR install.

My questions, since I am not that familiar with Java dependencies:

  1.  Is it ok to run JRE 8 on a production server? It’s heavily firewalled and 
SOLR, Zookeeper nor anything else on these servers is available off the virtual 
network so it seems ok, but I try not to run very old versions of any software.
  2.  Is there a way to fix this and keep the installation out-of-the-box or at 
least almost out of the box?

___
Michael Friscia
Office of Communications
Yale School of Medicine
(203) 737-7932 - office
(203) 931-5381 - mobile
http://web.yale.edu



Solr Cloud - Data Import from Cassandra

2019-04-08 Thread Furkan Çifçi
Hello everyone,

We are using Solr (7.1) in cloud mode and trying to get data from a Cassandra 
source. We can't import data from Cassandra.

In the error logs:

Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to PropertyWriter implementation:SimplePropertiesWriter
    at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:330)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
    at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode
    at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:151)
    at org.apache.solr.handler.dataimport.SimplePropertiesWriter.findDirectory(SimplePropertiesWriter.java:131)
    at org.apache.solr.handler.dataimport.SimplePropertiesWriter.init(SimplePropertiesWriter.java:93)
    at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:328)

The error log says I can't do it in ZooKeeper mode.

Is there a workaround for this situation?



Re: Sql server data import

2018-11-09 Thread Erick Erickson
Ok, what that means is you're letting Solr do its best to figure out
what fields you should have in the schema and how they're defined.
Almost invariably, you can do better by explicitly defining the fields
you need in your schema rather than enabling add-unknown. It's
fine for getting started, but not advised for production.
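
For example, explicit definitions for the two fields in this thread could look 
like this in the schema (the field types are guesses from the sample values, 
not taken from the thread):

<field name="Id" type="plong" indexed="true" stored="true"/>
<field name="PublicId" type="string" indexed="true" stored="true"/>
<uniqueKey>Id</uniqueKey>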

Best,
Erick
On Fri, Nov 9, 2018 at 7:52 AM Verthosa  wrote:
>
> Hello, I managed to fix the problem. I'm using Solr 7.5.0. My problem was
> that in the server logs I got "This Indexschema is not mutable" (I did not
> know about the logs folder, so I just found out 5 minutes ago). I fixed it
> by modifying solrconfig.xml to
>
> <updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
>     default="${update.autoCreateFields:false}"
>     processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
>   <processor class="solr.LogUpdateProcessorFactory"/>
>   <processor class="solr.DistributedUpdateProcessorFactory"/>
>   <processor class="solr.RunUpdateProcessorFactory"/>
> </updateRequestProcessorChain>
>
> Since then the indexing is done correctly. I even got the blob fields
> indexation working now! Thanks for your reply, everything is fixed for now.
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sql server data import

Hello, I managed to fix the problem. I'm using Solr 7.5.0. My problem was
that in the server logs I got "This Indexschema is not mutable" (I did not
know about the logs folder, so I just found out 5 minutes ago). I fixed it
by modifying solrconfig.xml to

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
    default="${update.autoCreateFields:false}"
    processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Since then the indexing is done correctly. I even got the blob fields
indexation working now! Thanks for your reply, everything is fixed for now. 




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Sql server data import

What is "​"  in the PublicId?  Is it part of the data?  Did you check if 
the special characters in your data cause the problem?

Steve

###
Error creating document : SolrInputDocument(fields: [PublicId=10065,​
Id=117])

-----Original Message-----
From: Verthosa  
Sent: Friday, November 9, 2018 7:51 AM
To: solr-user@lucene.apache.org
Subject: Sql server data import

Hello, I managed to set up a connection to my SQL server to import data into 
Solr. The idea is to import filetables but for now I first want to get it 
working using regular tables. So I created 

*data-config.xml*

<dataConfig>
  <dataSource
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost;databaseName=inConnexion_Tenant2;integratedSecurity=true"
  />



*schema.xml*
I added


and changed the uniqueKey entry to
<uniqueKey>Id</uniqueKey>

When I want to import my data (which is just data like Id: 5, PublicId:
"test"), I get the following error in the logging. 

Error creating document : SolrInputDocument(fields: [PublicId=10065,​
Id=117])


I tried all sorts of things but can't get it fixed. Does anyone want to give me a 
hand?

thanks in advance!




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sql server data import

Which version of Solr is it? Because we have not used schema.xml for a
very long time. It has been managed-schema instead.

Also, have you tried using the DIH example that uses a database and
modifying it just enough to read data from your database? Even if it
has a lot of extra junk, this would test half of the pipeline, which
you can then transfer to the clean setup.

Regards,
   Alex.
On Fri, 9 Nov 2018 at 08:09, Verthosa  wrote:
>
> Hello, I managed to set up a connection to my SQL server to import data into
> Solr. The idea is to import filetables but for now I first want to get it
> working using regular tables. So I created
>
> *data-config.xml*
>
> <dataConfig>
>   <dataSource
>     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>     url="jdbc:sqlserver://localhost;databaseName=inConnexion_Tenant2;integratedSecurity=true"
>   />
>
>
> *schema.xml*
> i added
>  multiValued="false" />
>  multiValued="false"/>
>
> and changed the uniqueKey entry to
> <uniqueKey>Id</uniqueKey>
>
> When I want to import my data (which is just data like Id: 5, PublicId:
> "test"), I get the following error in the logging.
>
> Error creating document : SolrInputDocument(fields: [PublicId=10065,​
> Id=117])
>
>
> I tried all sorts of things but can't get it fixed. Does anyone want to give
> me a hand?
>
> thanks in advance!
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Sql server data import

Hello, I managed to set up a connection to my SQL server to import data into
Solr. The idea is to import filetables but for now I first want to get it
working using regular tables. So I created 

*data-config.xml*

<dataConfig>
  <dataSource
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost;databaseName=inConnexion_Tenant2;integratedSecurity=true"
  />



*schema.xml*
I added


and changed the uniqueKey entry to 
<uniqueKey>Id</uniqueKey>

When I want to import my data (which is just data like Id: 5, PublicId:
"test"), I get the following error in the logging. 

Error creating document : SolrInputDocument(fields: [PublicId=10065,​
Id=117])


I tried all sorts of things but can't get it fixed. Does anyone want to give
me a hand?

thanks in advance!




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


SV: data-import-handler for solr-7.5.0

I made it work with the simplest of XML files, with some inspiration from 
https://opensolr.com/blog/2011/09/how-to-import-data-from-xml-files-into-your-solr-collection

Data-config is now:


  

  
  
  


And the document is simply:


   
 2165432
 5
   

   
 28548113
 89
   


Now I guess I just have to add to this solution.

Thanks for your help Alex, and also thanks to Jan for answering the first mail.

Best regards
Martin Frank Hansen

-----Original Message-----
From: Alexandre Rafalovitch 
Sent: 2 October 2018 19:52
To: solr-user 
Subject: Re: data-import-handler for solr-7.5.0

Ok, so then you can switch to debug mode and keep trying to figure it out. Also 
try BinFileDataSource or URLDataSource, maybe it will have an easier way.

Or using relative path (example:
https://github.com/arafalov/solr-apachecon2018-presentation/blob/master/configsets/pets-final/pets-data-config.xml).

Regards,
   Alex.
On Tue, 2 Oct 2018 at 12:46, Martin Frank Hansen (MHQ)  wrote:
>
> Thanks for the info, the UI looks interesting... It does read the data-config 
> correctly, so the problem is probably in this file.
>
> Martin Frank Hansen, Senior Data Analytiker
>
> Data, IM & Analytics
>
>
>
> Lautrupparken 40-42, DK-2750 Ballerup
> E-mail m...@kmd.dk  Web www.kmd.dk
> Mobil +4525571418
>
> -----Original Message-----
> From: Alexandre Rafalovitch 
> Sent: 2 October 2018 18:18
> To: solr-user 
> Subject: Re: data-import-handler for solr-7.5.0
>
> Admin UI for DIH will show you the config file read. So, if nothing is
> there, the path is most likely the issue
>
> You can also provide or update the configuration right in UI if you enable 
> debug.
>
> Finally, the config file is reread on every invocation, so you don't need to 
> restart the core after changing it.
>
> Hope this helps,
>Alex.
> On Tue, 2 Oct 2018 at 11:45, Jan Høydahl  wrote:
> >
> > > url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
> >
> > Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?
> >
> > --
> > Jan Høydahl, search solution architect Cominvent AS -
> > www.cominvent.com
> >
> > > 2. okt. 2018 kl. 17:15 skrev Martin Frank Hansen (MHQ) :
> > >
> > > Hi,
> > >
> > > I am having some problems getting the data-import-handler in Solr to 
> > > work. I have tried a lot of things but I simply get no response from 
> > > Solr, not even an error.
> > >
> > > When calling the API:
> > > http://localhost:8983/solr/nh/dataimport?command=full-import
> > > {
> > >  "responseHeader":{
> > >"status":0,
> > >"QTime":38},
> > >  "initArgs":[
> > >"defaults",[
> > >
> > > "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
> > >  "command":"full-import",
> > >  "status":"idle",
> > >  "importResponse":"",
> > >  "statusMessages":{}}
> > >
> > > The data looks like this:
> > >
> > > 
> > >  
> > > 2165432
> > > 5  
> > >
> > >  
> > > 28548113
> > > 89   
> > >
> > >
> > > The data-config file looks like this:
> > >
> > > <dataConfig>
> > >
> > >   <document>
> > >     <entity
> > >        name="xml"
> > >        pk="id"
> > >        processor="XPathEntityProcessor"
> > >        stream="true"
> > >        forEach="/journal/doc"
> > >        url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
> > >        transformer="RegexTransformer,TemplateTransformer"
> > >        >
> > >
> > >
> > >     </entity>
> > >   </document>
> > > </dataConfig>
> > >
> > > And I referenced the jar files in the solr-config.xml as well as adding 
> > > the request-handler by adding the following lines:
> > >
> > > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
> > > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />
> > >
> > > <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
> > >   <lst name="defaults">
> > >     <str name="config">C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml</str>
> > >   </lst>
> > > </requestHandler>
> >

Re: data-import-handler for solr-7.5.0

Ok, so then you can switch to debug mode and keep trying to figure it
out. Also try BinFileDataSource or URLDataSource, maybe it will have
an easier way.

Or using relative path (example:
https://github.com/arafalov/solr-apachecon2018-presentation/blob/master/configsets/pets-final/pets-data-config.xml).

Regards,
   Alex.
On Tue, 2 Oct 2018 at 12:46, Martin Frank Hansen (MHQ)  wrote:
>
> Thanks for the info, the UI looks interesting... It does read the data-config 
> correctly, so the problem is probably in this file.
>
> Martin Frank Hansen, Senior Data Analytiker
>
> Data, IM & Analytics
>
>
>
> Lautrupparken 40-42, DK-2750 Ballerup
> E-mail m...@kmd.dk  Web www.kmd.dk
> Mobil +4525571418
>
> -----Original Message-----
> From: Alexandre Rafalovitch 
> Sent: 2 October 2018 18:18
> To: solr-user 
> Subject: Re: data-import-handler for solr-7.5.0
>
> Admin UI for DIH will show you the config file read. So, if nothing is there, 
> the path is most likely the issue
>
> You can also provide or update the configuration right in UI if you enable 
> debug.
>
> Finally, the config file is reread on every invocation, so you don't need to 
> restart the core after changing it.
>
> Hope this helps,
>Alex.
> On Tue, 2 Oct 2018 at 11:45, Jan Høydahl  wrote:
> >
> > > url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
> >
> > Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?
> >
> > --
> > Jan Høydahl, search solution architect Cominvent AS -
> > www.cominvent.com
> >
> > > 2. okt. 2018 kl. 17:15 skrev Martin Frank Hansen (MHQ) :
> > >
> > > Hi,
> > >
> > > I am having some problems getting the data-import-handler in Solr to 
> > > work. I have tried a lot of things but I simply get no response from 
> > > Solr, not even an error.
> > >
> > > When calling the API:
> > > http://localhost:8983/solr/nh/dataimport?command=full-import
> > > {
> > >  "responseHeader":{
> > >"status":0,
> > >"QTime":38},
> > >  "initArgs":[
> > >"defaults",[
> > >  "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
> > >  "command":"full-import",
> > >  "status":"idle",
> > >  "importResponse":"",
> > >  "statusMessages":{}}
> > >
> > > The data looks like this:
> > >
> > > 
> > >  
> > > 2165432
> > > 5  
> > >
> > >  
> > > 28548113
> > > 89   
> > >
> > >
> > > The data-config file looks like this:
> > >
> > > <dataConfig>
> > >
> > >   <document>
> > >     <entity
> > >        name="xml"
> > >        pk="id"
> > >        processor="XPathEntityProcessor"
> > >        stream="true"
> > >        forEach="/journal/doc"
> > >        url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
> > >        transformer="RegexTransformer,TemplateTransformer"
> > >        >
> > >
> > >
> > >     </entity>
> > >   </document>
> > > </dataConfig>
> > >
> > > And I referenced the jar files in the solr-config.xml as well as adding 
> > > the request-handler by adding the following lines:
> > >
> > > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
> > > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />
> > >
> > > <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
> > >   <lst name="defaults">
> > >     <str name="config">C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml</str>
> > >   </lst>
> > > </requestHandler>
> > >
> > > I am running a core residing in the folder 
> > > “C:/Users/z6mhq/Desktop/nh/nh/conf” while the Solr installation is in 
> > > “C:/Users/z6mhq/Documents/solr-7.5.0”.
> > >
> > > I really hope that someone can spot my mistake…
> > >
> > > Thanks in advance.
> > >
> > > Martin Frank Hansen
> > >
> > >

SV: data-import-handler for solr-7.5.0

Thanks for the info, the UI looks interesting... It does read the data-config 
correctly, so the problem is probably in this file.

Martin Frank Hansen, Senior Data Analytiker

Data, IM & Analytics



Lautrupparken 40-42, DK-2750 Ballerup
E-mail m...@kmd.dk  Web www.kmd.dk
Mobil +4525571418

-----Original Message-----
From: Alexandre Rafalovitch 
Sent: 2 October 2018 18:18
To: solr-user 
Subject: Re: data-import-handler for solr-7.5.0

Admin UI for DIH will show you the config file read. So, if nothing is there, 
the path is most likely the issue

You can also provide or update the configuration right in UI if you enable 
debug.

Finally, the config file is reread on every invocation, so you don't need to 
restart the core after changing it.

Hope this helps,
   Alex.
On Tue, 2 Oct 2018 at 11:45, Jan Høydahl  wrote:
>
> > url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
>
> Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?
>
> --
> Jan Høydahl, search solution architect Cominvent AS -
> www.cominvent.com
>
> > 2. okt. 2018 kl. 17:15 skrev Martin Frank Hansen (MHQ) :
> >
> > Hi,
> >
> > I am having some problems getting the data-import-handler in Solr to work. 
> > I have tried a lot of things but I simply get no response from Solr, not 
> > even an error.
> >
> > When calling the API:
> > http://localhost:8983/solr/nh/dataimport?command=full-import
> > {
> >  "responseHeader":{
> >"status":0,
> >"QTime":38},
> >  "initArgs":[
> >"defaults",[
> >  "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
> >  "command":"full-import",
> >  "status":"idle",
> >  "importResponse":"",
> >  "statusMessages":{}}
> >
> > The data looks like this:
> >
> > 
> >  
> > 2165432
> > 5  
> >
> >  
> > 28548113
> > 89   
> >
> >
> > The data-config file looks like this:
> >
> > <dataConfig>
> >
> >   <document>
> >     <entity
> >        name="xml"
> >        pk="id"
> >        processor="XPathEntityProcessor"
> >        stream="true"
> >        forEach="/journal/doc"
> >        url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
> >        transformer="RegexTransformer,TemplateTransformer"
> >        >
> >
> >
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > And I referenced the jar files in the solr-config.xml as well as adding the 
> > request-handler by adding the following lines:
> >
> > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
> > <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />
> >
> > <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
> >   <lst name="defaults">
> >     <str name="config">C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml</str>
> >   </lst>
> > </requestHandler>
> >
> > I am running a core residing in the folder 
> > “C:/Users/z6mhq/Desktop/nh/nh/conf” while the Solr installation is in 
> > “C:/Users/z6mhq/Documents/solr-7.5.0”.
> >
> > I really hope that someone can spot my mistake…
> >
> > Thanks in advance.
> >
> > Martin Frank Hansen
> >
> >
>


SV: data-import-handler for solr-7.5.0

Unfortunately, still no luck.

{
  "responseHeader":{
"status":0,
"QTime":8},
  "initArgs":[
"defaults",[
  "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
  "command":"full-import",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{
"Total Requests made to DataSource":"0",
"Total Rows Fetched":"0",
"Total Documents Processed":"0",
"Total Documents Skipped":"0",
"Full Dump Started":"2018-10-02 16:15:21",
"":"Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
"Committed":"2018-10-02 16:15:22",
"Time taken":"0:0:0.136"}}

Seems like it is not even trying to read the data.

Martin Frank Hansen

-----Original Message-----
From: Jan Høydahl 
Sent: 2 October 2018 17:46
To: solr-user@lucene.apache.org
Subject: Re: data-import-handler for solr-7.5.0

> url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"

Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 2. okt. 2018 kl. 17:15 skrev Martin Frank Hansen (MHQ) :
>
> Hi,
>
> I am having some problems getting the data-import-handler in Solr to work. I 
> have tried a lot of things but I simply get no response from Solr, not even 
> an error.
>
> When calling the API:
> http://localhost:8983/solr/nh/dataimport?command=full-import
> {
>  "responseHeader":{
>"status":0,
>"QTime":38},
>  "initArgs":[
>"defaults",[
>  "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
>  "command":"full-import",
>  "status":"idle",
>  "importResponse":"",
>  "statusMessages":{}}
>
> The data looks like this:
>
> 
>  
> 2165432
> 5  
>
>  
> 28548113
> 89   
>
>
> The data-config file looks like this:
>
> <dataConfig>
>
>   <document>
>     <entity
>        name="xml"
>        pk="id"
>        processor="XPathEntityProcessor"
>        stream="true"
>        forEach="/journal/doc"
>        url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
>        transformer="RegexTransformer,TemplateTransformer"
>        >
>
>
>     </entity>
>   </document>
> </dataConfig>
>
> And I referenced the jar files in the solr-config.xml as well as adding the 
> request-handler by adding the following lines:
>
> <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />
>
> <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml</str>
>   </lst>
> </requestHandler>
>
> I am running a core residing in the folder 
> “C:/Users/z6mhq/Desktop/nh/nh/conf” while the Solr installation is in 
> “C:/Users/z6mhq/Documents/solr-7.5.0”.
>
> I really hope that someone can spot my mistake…
>
> Thanks in advance.
>
> Martin Frank Hansen
>
>
> Beskyttelse af dine personlige oplysninger er vigtig for os. Her finder du 
> KMD’s Privatlivspolitik<http://www.kmd.dk/Privatlivspolitik>, der fortæller, 
> hvordan vi behandler oplysninger om dig.
>
> Protection of your personal data is important to us. Here you can read KMD’s 
> Privacy Policy<http://www.kmd.net/Privacy-Policy> outlining how we process 
> your personal data.
>
> Vi gør opmærksom på, at denne e-mail kan indeholde fortrolig information. 
> Hvis du ved en fejltagelse modtager e-mailen, beder vi dig venligst informere 
> afsender om fejlen ved at bruge svarfunktionen. Samtidig beder vi dig slette 
> e-mailen i dit system uden at videresende eller kopiere den. Selvom e-mailen 
> og ethvert vedhæftet bilag efter vores overbevisning er fri for virus og 
> andre fejl, som kan påvirke computeren eller it-systemet, hvori den modtages 
> og læses, åbnes den på modtagerens eget ansvar. Vi påtager os ikke noget 
> ansvar for tab og skade, som er opstået i forbindelse med at modtage og bruge 
> e-mailen.
>
> Please note that this message may contain confidential information. If you 
> have received this message by mistake, please inform the sender of the 
> mistake by sending a reply, then delete the message from your system without 
> making, distributing or retaining any copies of it. Although we believe that 
> the message and any attachments are free from viruses and other errors that 
> might affect the computer or it-system where it is received and read, the 
> recipient opens the message at his or her own risk. We assume no 
> responsibility for any loss or damage arising from the receipt or use of this 
> message.



Re: data-import-handler for solr-7.5.0

The DIH Admin UI will show you the config file as it was read. So, if nothing
is there, the path is most likely the issue.

You can also provide or update the configuration right in the UI if you
enable debug.

Finally, the config file is reread on every invocation, so you don't
need to restart the core after changing it.
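
For example, a debug run can be triggered straight from the command line (a
sketch, assuming the core name nh and the /dataimport handler path used in
this thread; debug and verbose are standard DIH request parameters):

curl "http://localhost:8983/solr/nh/dataimport?command=full-import&debug=true&verbose=true"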

Hope this helps,
   Alex.
On Tue, 2 Oct 2018 at 11:45, Jan Høydahl  wrote:
>
> > url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
>
> Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>


Re: data-import-handler for solr-7.5.0

> url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"

Have you tried url="C:\\Users\\z6mhq/Desktop\\data_import\\nh_test.xml" ?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




data-import-handler for solr-7.5.0

Hi,

I am having some problems getting the data-import-handler in Solr to work. I 
have tried a lot of things but I simply get no response from Solr, not even an 
error.

When calling the API: 
http://localhost:8983/solr/nh/dataimport?command=full-import
{
  "responseHeader":{
"status":0,
"QTime":38},
  "initArgs":[
"defaults",[
  "config","C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml"]],
  "command":"full-import",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{}}

The data looks like this:

<journal>
  <doc>
    2165432
    5
  </doc>
  <doc>
    28548113
    89
  </doc>
</journal>
<!-- the element tags inside each <doc> were stripped by the mail archive -->

The data-config file looks like this:

<dataConfig>
  <dataSource/>  <!-- dataSource attributes stripped by the mail archive -->
  <document>
    <entity
        name="xml"
        pk="id"
        processor="XPathEntityProcessor"
        stream="true"
        forEach="/journal/doc"
        url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml"
        transformer="RegexTransformer,TemplateTransformer">
      <!-- field mappings stripped by the mail archive -->
    </entity>
  </document>
</dataConfig>

And I referenced the jar files in solrconfig.xml as well as adding the
request handler by adding the following lines:

<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">C:/Users/z6mhq/Desktop/nh/nh/conf/data-config.xml</str>
  </lst>
</requestHandler>

I am running a core residing in the folder “C:/Users/z6mhq/Desktop/nh/nh/conf” 
while the Solr installation is in “C:/Users/z6mhq/Documents/solr-7.5.0”.

I really hope that someone can spot my mistake…

Thanks in advance.

Martin Frank Hansen




Re: Data Import Handler with Solr Source behind Load Balancer

Hi Thomas,
Is this SolrCloud or Solr master-slave? Do you update the source index while
the import is running? Did you check whether all your instances behind the LB
are in sync, if you are using master-slave?
My guess would be that DIH is using cursors to read data from the other Solr.
If you are using multiple Solr instances behind the LB, there might be some
diffs in the index that result in different documents being returned for the
same cursor mark. Are numDocs and maxDoc the same on the new instance after
the import?
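
For example, numDocs and maxDoc can be read per core from the Luke request
handler (a sketch; the core name here is a placeholder):

curl "http://host:8983/solr/collection_shard1_replica1/admin/luke?numTerms=0&wt=json"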

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/






Data Import Handler with Solr Source behind Load Balancer

We have a Solr v7 instance sourcing data from a Data Import Handler with a Solr
data source running Solr v4. When it hits a single server in that instance
directly, all documents are read and written correctly to the v7. When we hit
the load balancer DNS entry, the resulting data import handler JSON states that
it read all the documents and skipped none, and all looks fine, but the result
set is missing ~20% of the documents in the v7 core. This has happened multiple
times in multiple environments.

Any thoughts on whether this might be a bug in the underlying DIH code? I'll 
also pass it along to the server admins on our side for input.


Re: Data Import from Command Line

Thank you both for the responses. I was able to get the import working
through telnet, and I'll see if I can get the post utility working as that
seems like a better option.

Thanks,
Adam



Re: Data Import from Command Line

The Admin UI just hits Solr at a particular URL with specific parameters.
You could totally call it from the command line, but it _would_ need
to be an HTTP client of some sort. You could encode all of the
parameters into the DIH (or a new) handler; it is all defined in
solrconfig.xml (/dataimport is the default one).

If you don't have curl, maybe you have wget? Or lynx? Or, just for
giggles, you could telnet into Solr's port (8983 by default) and manually
type the required command
(http://blog.tonycode.com/tech-stuff/http-notes/making-http-requests-via-telnet/):
GET /dataimport?param=value HTTP/1.0
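
A minimal sketch of that telnet session against Solr itself (the core name nh
and port 8983 are assumptions borrowed from elsewhere in this digest; press
Enter twice after the GET line to send the request):

telnet localhost 8983
GET /solr/nh/dataimport?command=full-import HTTP/1.0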

Regards,
   Alex.
P.s. And yes, maybe bin/post could be used as well. Or the previous
direct java invocation of the posttool jar. May need to massage the
parameters a bit though.



Re: Data Import from Command Line

Adam,

On 8/20/18 1:45 PM, Adam Blank wrote:
> I'm running Solr 5.5.0 on AIX, and I'm wondering if there's a way
> to import the index from the command line instead of using the
> admin console?  I don't have the ability to use a HTTP client such
> as cURL to connect to the console.

I'm not sure when it was added, but there is a program called "post"
which comes with later versions of Solr that can be used to load data
into an index.

-chris


Data Import from Command Line

Hi,

I'm running Solr 5.5.0 on AIX, and I'm wondering if there's a way to import
the index from the command line instead of using the admin console?  I
don't have the ability to use a HTTP client such as cURL to connect to the
console.

Thank you,
Adam


Re: Child=true does not work for data import handler

But in my case I see the output below:

<!-- element names reconstructed; the mail archive stripped the XML tags -->
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="indent">on</str>
      <str name="wt">xml</str>
      <str name="_">1533734431931</str>
    </lst>
  </lst>
  <result name="response" numFound="6" start="0">
    <doc>
      <str name="dept">IT</str>
      <str name="id">1</str>
      <str name="childpk">1</str>
      <long name="_version_">1608130338704326656</long>
    </doc>
    <doc>
      <str name="dept">Data</str>
      <str name="id">1</str>
      <str name="childpk">2</str>
      <long name="_version_">1608130338704326656</long>
    </doc>
    <doc>
      <str name="name">omkar</str>
      <str name="id">1</str>
      <long name="_version_">1608130338704326656</long>
    </doc>
    <doc>
      <str name="dept">ITI</str>
      <str name="id">2</str>
      <str name="childpk">3</str>
      <long name="_version_">1608130338712715264</long>
    </doc>
    <doc>
      <str name="dept">Entry</str>
      <str name="id">2</str>
      <str name="childpk">4</str>
      <long name="_version_">1608130338712715264</long>
    </doc>
    <doc>
      <str name="name">ashwin</str>
      <str name="id">2</str>
      <long name="_version_">1608130338712715264</long>
    </doc>
  </result>
</response>



Re: Child=true does not work for data import handler

This is how nested docs look: they are document blocks with the parent at
the end. Block Join queries work on these blocks.
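
For example, with the fields used in this thread, a sketch of such a query
(it assumes only parent docs carry a "name" field, so that field can serve
as the parent filter):

q={!parent which="name:*"}dept:IT
fl=*,[child parentFilter="name:*"]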



-- 
Sincerely yours
Mikhail Khludnev


Re: Child=true does not work for data import handler

Thanks a lot Mikhail. But as per the documentation below, nested document
ingestion is possible. Is this a limitation of DIH?

https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments


Also, can a block join query be used to get the expected relationship for the
data I have ingested using DIH?





Re: Child=true does not work for data import handler

It never works the way you expect. You need to search for parents and then
hook up [child]. I see some improvements are coming, but for now that is how it is.



-- 
Sincerely yours
Mikhail Khludnev


Re: Child=true does not work for data import handler

Thanks Mikhail, verbose did help. The _root_ field was missing in the schema,
and I also made some changes in the child entity: I created id as an alias for
emp_id (in the child query), which is the id column of the parent table.

<entity name="parent" query="SELECT id,name FROM emp">
  <!-- parent field mappings stripped by the mail archive -->
  <entity child="true" name="child"
          query="SELECT dept,emp_id as id FROM emp_details where emp_id='${parent.id}'">
    <field column="dept" name="dept" />
  </entity>
</entity>

The data seems to be returned correctly, as below, but it shows child documents
and parent documents as individual documents. I was expecting 2 documents, each
with 2 child documents.
Any inputs will be helpful.


 "response":{"numFound":6,"start":0,"docs":[
  {
"dept":"IT",
"id":"1",
"_version_":1608073809653399552},
  {
"dept":"Data",
"id":"1",
"_version_":1608073809653399552},
  {
"name":"omkar",
"id":"1",
"_version_":1608073809653399552},
  {
"dept":"ITI",
"id":"2",
"_version_":1608073809667031040},
  {
"dept":"Entry",
"id":"2",
"_version_":1608073809667031040},
  {
"name":"ashwin",
"id":"2",
"_version_":1608073809667031040}]
  }}





Re: Child=true does not work for data import handler

DIH has debug and verbose modes. Have you tried using them?



-- 
Sincerely yours
Mikhail Khludnev


Re: Child=true does not work for data import handler

Thanks Mikhail, I tried changing the config but that did not help:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test"
              user="root"
              password=""
              session.group_concat_max_len='7' />
  <document>
    <entity name="parent" transformer="RegexTransformer"
            query="SELECT id,name FROM emp">
      <!-- parent field mappings stripped by the mail archive -->
      <entity> <!-- child entity attributes (incl. child="true" per this thread) stripped by the mail archive -->
        <field column="dept" name="dept" />
        <field column="childpk" name="childpk" />
      </entity>
    </entity>
  </document>
</dataConfig>




Re: Child=true does not work for data import handler

Hi, Omkar.

Could it happen that child docs as well as parents are implicitly assigned the
same "id" field values and removed due to a uniqueKey collision?



-- 
Sincerely yours
Mikhail Khludnev


Child=true does not work for data import handler

I am using a db-data-config similar to the one below for indexing this
parent-child data. Solr version 6.6.2.

SELECT   id as emp_id,   name FROM emp;
+++
| emp_id | name   |
+++
|  1 | omkar  |
|  2 | ashwin |
+++
2 rows in set (0.00 sec)

select  * from emp_details ;
+--++---+
| id   | emp_id | dept  |
+--++---+
|1 |  1 | IT|
|2 |  1 | Data  |
|3 |  2 | ITI   |
|4 |  2 | Entry |
+--++---+
4 rows in set (0.00 sec)

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test"
              user="root"
              password=""
              session.group_concat_max_len='7' />
  <document>
    <entity name="parent" transformer="RegexTransformer"
            query="SELECT id, name FROM emp">
      <!-- parent field mappings stripped by the mail archive -->
      <entity> <!-- child entity attributes and query stripped by the mail archive -->
        <field column="dept" name="dept" />
      </entity>
    </entity>
  </document>
</dataConfig>



{
  "responseHeader":{
"status":0,
"QTime":0,
"params":{
  "q":"*:*",
  "indent":"on",
  "wt":"json",
  "_":"1533325469162"}},
  "response":{"numFound":2,"start":0,"docs":[
  {
"name":"omkar",
"id":"1",
"dept":"IT",
"_version_":1607809693975052288},
  {
"name":"ashwin",
"id":"2",
"dept":"ITI",
"_version_":1607809693978198016}]
  }}


I am expecting multiple child documents, so I added child=true to the child
entity.

But the output of indexing is as below, and it does not process any documents:

Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Requests: 3 , Fetched: 6 , Skipped: 0 , Processed: 0
Started: less than a minute ago

Can you help me figure out whether there is an issue with the db or the solr config?






How to use tika-OCR in data import handler?

Hi,

I am trying to use Tika OCR (Tesseract) in the data import handler
and found that processing English documents works quite well.

But I am struggling to process other languages such as
Japanese, Chinese, etc.

So, I want to know how to switch Tesseract OCR's processing
language via the data import handler config or the tikaConfig param.

Any points would be appreciated.

Thanks,
Yasufumi
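
(One possible direction, sketched with hedges: recent Tika releases allow
per-parser parameters in a tika-config.xml, which TikaEntityProcessor can be
pointed at via its tikaConfig attribute. The parameter name below is an
assumption to verify against the Tika version bundled with your Solr:

<properties>
  <parsers>
    <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
      <params>
        <param name="language" type="string">jpn</param>
      </params>
    </parser>
  </parsers>
</properties>
)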


Re: How to know the name(url) of documents that data import handler skipped

Hi, Rahul.

Thank you for your reply.
I already tried that, and I could see which files were read (via
FileDataSource) and which files were added (via UpdateLog).
So, by checking both, I could determine the bad files.
But I want to know the bad files directly.

Thanks,
Yasufumi



Re: How to know the name(url) of documents that data import handler skipped

Have you tried changing the log level
https://lucene.apache.org/solr/guide/7_2/configuring-logging.html


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation


How to know the name(url) of documents that data import handler skipped

Hi,

I am trying to index files into Solr 7.2 using the data import handler with
the onError=skip option.
But I am struggling to determine the skipped documents, as the logs do not
tell which file was bad.
So, how can I know those files?

Thanks,
Yasufumi


RE: SolrCloud DIH (Data Import Handler) MySQL 404

I have added debug and I get this error:

null:java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:429)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
        at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:183)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:517)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1629)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.server.Server.handle(Server.java:530)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:347)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:256)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
        at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:382)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
        at java.lang.Thread.run(Thread.java:748)

What MySQL JDBC connector version do I need?






RE: SolrCloud DIH (Data Import Handler) MySQL 404

Hello,

Where do I add that? In the Solr start command?

I have added -verbose:class in the /etc/default/solr.in.sh file, but the logs
are the same.

Thanks,




Re: SolrCloud DIH (Data Import Handler) MySQL 404

Can you share more log lines around this odd NPE?
It might be necessary to restart the JVM with -verbose:class and look through
its output to find why it can't load this class.
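
A sketch of where the flag can go, assuming the standard solr.in.sh mechanism
(append it to SOLR_OPTS so it reaches the JVM):

SOLR_OPTS="$SOLR_OPTS -verbose:class"

Note that -verbose:class writes to the JVM's stdout (the console log), not to
solr.log.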



-- 
Sincerely yours
Mikhail Khludnev


RE: SolrCloud DIH (Data Import Handler) MySQL 404

Hello Shawn,

I have installed SolrCloud 7.3 on another server and the problem does not appear.
Should I create a Jira ticket?

But I have another problem:

Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to PropertyWriter implementation:ZKPropertiesWriter
        at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:330)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
        at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
        at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:326)
        ... 4 more

I am looking into how to solve the problem.

Regards,









Re: SolrCloud DIH (Data Import Handler) MySQL 404


On 4/24/2018 2:03 AM, msaunier wrote:

If I access to the interface, I have a null pointer exception:

null:java.lang.NullPointerException
at 
org.apache.solr.handler.RequestHandlerBase.getVersion(RequestHandlerBase.java:233)


The line of code where this exception occurred uses fundamental Java 
methods. Based on the error, either the getClass method common to all 
java objects, or the getPackage method on the class, is returning null.  
That shouldn't be possible.  This has me wondering whether there is 
something broken in your particular Solr installation -- corrupt jars, 
or something like that.  Or maybe something broken in your Java.


Thanks,
Shawn



RE: SolrCloud DIH (Data Import Handler) MySQL 404

I have modified the DIH definition to simplify it, but I get the same errors:

## indexation_events.xml

(the simplified XML was stripped by the mail archive)

##

Maxence,






RE: SolrCloud DIH (Data Import Handler) MySQL 404

If I access the interface, I get a null pointer exception:

null:java.lang.NullPointerException
at 
org.apache.solr.handler.RequestHandlerBase.getVersion(RequestHandlerBase.java:233)
at 
org.apache.solr.handler.admin.SolrInfoMBeanHandler.addMBean(SolrInfoMBeanHandler.java:187)
at 
org.apache.solr.handler.admin.SolrInfoMBeanHandler.getMBeanInfo(SolrInfoMBeanHandler.java:163)
at 
org.apache.solr.handler.admin.SolrInfoMBeanHandler.handleRequestBody(SolrInfoMBeanHandler.java:80)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)






RE: SolrCloud DIH (Data Import Handler) MySQL 404

Hello Shawn,
Thanks for your answers. 

#
So, the indexation_events.xml file is:

(the XML was stripped by the mail archive)

#
And the config file is the configoverlay.json; it's in the cloud:

{
  "updateProcessor":{},

  "runtimeLib":{
"mysql-connector-java":{
  "name":"mysql-connector-java",
  "version":1},

"data-import-handler":{
  "name":"data-import-handler",
  "version":1}},

  "requestHandler":{"/test_dih":{
  "name":"/test_dih",
  "class":"org.apache.solr.handler.dataimport.DataImportHandler",
  "runtimeLib":true,
  "version":1,
  "defaults":{"config":"DIH/indexation_events.xml"}}}
}
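
(For reference, a sketch of the Config API call that produces such a
runtimeLib entry in the overlay, using the add-runtimelib command of the
Solr 6.6 Config API:

curl -X POST -H 'Content-type:application/json' \
  "http://srv-formation-solr:8983/solr/arguments_test/config" \
  -d '{"add-runtimelib": {"name": "data-import-handler", "version": 1}}'
)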

I will go and look at the solr.log.

Thanks,
Maxence







Re: SolrCloud DIH (Data Import Handler) MySQL 404


On 4/23/2018 8:30 AM, msaunier wrote:

I have added debug:

curl "http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-import&commit=true&debug=true"

<lst name="responseHeader"><int name="status">500</int><int name="QTime">588</int></lst>
... <bool name="runtimeLib">true</bool><int name="version">1</int><lst name="defaults"><str name="config">DIH/indexation_events.xml</str></lst>

This is looking like a really nasty error that I cannot understand, 
possibly caused by an error in configuration.


Can you share your dataimport handler config (will likely be in 
solrconfig.xml) and the contents of DIH/indexation_events.xml?  There is 
probably a database password in that file, you'll want to redact that.


You should look at solr.log and see if there are other errors happening 
that didn't make it into the response.


Thanks,
Shawn



Re: SolrCloud DIH (Data Import Handler) MySQL 404


On 4/23/2018 6:12 AM, msaunier wrote:

I have a problem with DIH in SolrCloud. I don't understand why, so I need
your help.

Solr 6.6 in Cloud.

##
COMMAND:

curl http://srv-formation-solr:8983/solr/test_dih?command=full-import

RESULT:

Error 404 Not Found
HTTP ERROR 404
Problem accessing /solr/test_dih. Reason:
    Not Found

This looks like an incomplete URL.

What exactly is test_dih?  If it is the name of your collection, then
you are missing the handler, which is usually "/dataimport". If
"/test_dih" is the name of your handler, then you are missing the name
of the core or the collection.


With SolrCloud, it's actually better to direct your request to a 
specific core for DIH, something like collection_shard1_replica1.  If 
you direct it to the collection you never know which core will actually 
end up with the request, and will have a hard time getting the status of 
the import if the status request ends up on a different core than the 
full-import command.


A correct full URL should look something like this:

http://host:port/solr/test_shard1_replica2/dataimport?command=full-import
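
The import status can then be polled against the same core (same pattern,
a sketch):

http://host:port/solr/test_shard1_replica2/dataimport?command=status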

Looking at later messages, you may have figured this out at least 
partially.  The exception in your second message looks really odd.  (and 
I really have no idea what you are talking about with an overlay)


Thanks,
Shawn



RE: SolrCloud DIH (Data Import Handler) MySQL 404

curl "http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-import&commit=true&debug=true&command=reload-config"

<response>
  <lst name="responseHeader"><int name="status">500</int><int name="QTime">647</int></lst>
  <lst name="error">
    <str name="msg">java.util.Arrays$ArrayList cannot be cast to java.lang.String</str>
    <str name="trace">java.lang.ClassCastException: java.util.Arrays$ArrayList cannot be cast to java.lang.String
        at org.apache.solr.handler.dataimport.RequestInfo.&lt;init&gt;(RequestInfo.java:52)
        at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:128)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:534)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:748)</str>
    <int name="code">500</int>
  </lst>
</response>



-----Original Message-----
From: msaunier [mailto:msaun...@citya.com]
Sent: Monday, 23 April 2018 14:47
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud DIH (Data Import Handler) MySQL 404

I have corrected the URL to: curl
http://srv-formation-solr:8983/solr/arguments_test/test_dih?command=full-import

And changed the overlay config from
"/configs/arguments_test/DIH/indexation_events.xml" to
"DIH/indexation_events.xml".

But I have a new error:

Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to PropertyWriter implementation:ZKPropertiesWriter
at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:330)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:935)
at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:326)
... 4 more

Regards,





-----Original Message-----
From: msaunier [mailto:msaun...@citya.com]
Sent: Monday, 23 April 2018 14:12
To: solr-user@lucene.apache.org
Subject: SolrCloud DIH (Data Import Handler) MySQL 404

Hello,

 

I have a problem with DIH in SolrCloud. I don't understand why, so I need
your help.

 

Solr 6.6 in Cloud.

 

##

COMMAND:

curl http://srv-formation-solr:8983/solr/test_dih?command=full-import

 

RESULT:

Error 404 Not Found
HTTP ERROR 404
Problem accessing /solr/test_dih. Reason:
    Not Found

 

 

##

CONFIG:

1.  I have created the .system collection with the command

2.  I have posted the DataImportHandler jar file and the MySQL connector jar
to the blob store

3.  I have added the data-import-handler and mysql-connector-java runtimeLibs
to the configoverlay.json file with the API

4.  I have created the DIH folder in ZooKeeper with the zkcli.sh script

5.  I have pushed the DIH .xml configuration file with zkcli

 

CONFIGOVERLAY CONTENT :

{
  "runtimeLib":{
    "mysql-connector-java":{
      "name":"mysql-connector-java",
      "version":1},
    "data-import-handler":{
      "name":"data-import-handler",
      "version":1}},
  "requestHandler":{"/test_dih":{
    "name":"/test_dih",
    "class":"org.apache.solr.handler.dataimport.DataImportHandler",
    "runtimeLib":true,
    "version":1,
    "defaults":{"config":"/configs/arguments_test/DIH/indexation_events.xml"}}}
}

 

 

Thanks for your help






SolrCloud DIH (Data Import Handler) MySQL 404

Hello,

 

I have a problem with DIH in SolrCloud. I don't understand why, so I need
your help.

 

Solr 6.6 in Cloud.

 

##

COMMAND:

curl http://srv-formation-solr:8983/solr/test_dih?command=full-import

 

RESULT:

Error 404 Not Found
HTTP ERROR 404
Problem accessing /solr/test_dih. Reason:
    Not Found

 

 

##

CONFIG:

1.  I have created the .system collection with the command

2.  I have posted the DataImportHandler jar file and the MySQL connector jar
to the blob store

3.  I have added the data-import-handler and mysql-connector-java runtimeLibs
to the configoverlay.json file with the API

4.  I have created the DIH folder in ZooKeeper with the zkcli.sh script

5.  I have pushed the DIH .xml configuration file with zkcli

 

CONFIGOVERLAY CONTENT :

{
  "runtimeLib":{
    "mysql-connector-java":{
      "name":"mysql-connector-java",
      "version":1},
    "data-import-handler":{
      "name":"data-import-handler",
      "version":1}},
  "requestHandler":{"/test_dih":{
    "name":"/test_dih",
    "class":"org.apache.solr.handler.dataimport.DataImportHandler",
    "runtimeLib":true,
    "version":1,
    "defaults":{"config":"/configs/arguments_test/DIH/indexation_events.xml"}}}
}

 

 

Thanks for your help



Re: Data import batch mode for delta


On 4/16/2018 7:32 PM, gadelkareem wrote:

I cannot complain cuz it actually worked well for me so far but..

I still do not understand if Solr already paginates the results from the
full import, why not do the same for the delta. It is almost the same query:
`select id from t where t.lastmod > ${solrTime}`
`select * from t where id IN ${dataimporter.ids} limit 1000 offset 0`
and so on..


Solr does not paginate SQL queries made by the dataimport handler 
(DIH).  It sends the query exactly as it is configured in the DIH config.


Thanks,
Shawn



Re: Data import batch mode for delta

Thanks Shawn.

I cannot complain, because it has actually worked well for me so far, but...

I still do not understand: if Solr already paginates the results of the
full import, why not do the same for the delta? It is almost the same query:
`select id from t where t.lastmod > ${solrTime}`
`select * from t where id IN ${dataimporter.ids} limit 1000 offset 0`
and so on...





Re: Data import batch mode for delta


On 4/5/2018 7:31 PM, gadelkareem wrote:

Why the deltaImportQuery uses "where id='${dataimporter.id}'" instead of
something like where id IN ('${dataimporter.id})'


Because there's only one value for that property.

If the deltaQuery returns a million rows, then deltaImportQuery is going 
to be executed a million times.  Once for each row returned by the 
deltaQuery.
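
To make the mechanics concrete, a minimal DIH entity using the delta pair
might look like this (a sketch with illustrative table and column names;
${dih.delta.id} is the variable DIH substitutes for each id returned by
deltaQuery):

<entity name="t" pk="id"
        query="select * from t"
        deltaQuery="select id from t where lastmod > '${dih.last_index_time}'"
        deltaImportQuery="select * from t where id='${dih.delta.id}'"/>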


That IS as inefficient as it sounds.  Think of the dataimport handler as 
a stop-gap solution -- to help you get started with loading data from a 
database, until you can write a proper application to do your indexing.


Thanks,
Shawn



Data import batch mode for delta

Why does the deltaImportQuery use "where id='${dataimporter.id}'" instead of
something like "where id IN ('${dataimporter.id}')"?





RE: data import class not found

I just tried putting the solr-dataimporthandler-6.6.0.jar in server/solr/lib 
and I got past the problem.  I still don't understand why it was not found in /dist.

-Original Message-
From: Steve Pruitt [mailto:bpru...@opentext.com] 
Sent: Thursday, August 31, 2017 3:05 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] - data import class not found

I still can't understand how Solr establishes the classpath.

I have a custom entity processor that subclasses EntityProcessorBase.  When I 
execute the /dataimport call I get

java.lang.NoClassDefFoundError: 
org/apache/solr/handler/dataimport/EntityProcessorBase

no matter what I put in solrconfig.xml to locate the solr-dataimporthandler 
jar.

I have tried:

from the existing libs in solrconfig.xml 

from the Ref Guide


try anything


But, I always get the class-not-found error.  The DataImportHandler class is 
found when Solr starts; since EntityProcessorBase is in the same jar, why is it 
not found?

I have not tried putting it in the core's lib directory, thinking the above should 
work.  Of course, the 3rd choice is only an experiment.


Thanks.

-S


data import class not found

I still can't understand how Solr establishes the classpath.

I have a custom entity processor that subclasses EntityProcessorBase.  When I 
execute the /dataimport call I get

java.lang.NoClassDefFoundError: 
org/apache/solr/handler/dataimport/EntityProcessorBase

no matter what I put in solrconfig.xml to locate the solr-dataimporthandler 
jar.

I have tried:

from the existing libs in solrconfig.xml


from the Ref Guide


try anything
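
For reference, the stock dist-directory directive from the example
solrconfig.xml looks like this (a general illustration, not necessarily the
exact lines tried above):

<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />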


But, I always get the class-not-found error.  The DataImportHandler class is 
found when Solr starts; since EntityProcessorBase is in the same jar, why is it 
not found?

I have not tried putting it in the core's lib directory, thinking the above should 
work.  Of course, the 3rd choice is only an experiment.


Thanks.

-S


SOLR 4.10 Data import error

Hi,


I am getting the following error when I index data using the DataImporter.


I am using a FileDataSource in the data-config file;

here is the config file:

[data-config.xml contents stripped by the list software; not recoverable]

The error is:


ERROR org.apache.solr.handler.dataimport.DocBuilder: Exception while 
processing: f document : 
null:org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
 (resolved to: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:286)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:502)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Could not 
find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
 (resolved to: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
at 
org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:127)
at 
org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:86)
at 
org.apache.solr.handler.dataimport.FileDataSource.getData(FileDataSource.java:48)
at 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
... 11 more
Caused by: java.io.FileNotFoundException: Could not find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
 (resolved to: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
at 
org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:123)
... 14 more
ERROR org.apache.solr.handler.dataimport.DataImporter: Full Import 
failed:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
 (resolved to: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
 (resolved to: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_3ALianhuanhua-?_3A1.xml
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find file: 
/opt/xml_content/prrla/dc/moana/lianhuanhua_storybook/oai_

Re: Solr Search Problem with Multiple Data-Import Handler

I suspect Erick's right that clean=true is the problem. That's the default
in the DIH interface.


I find that when using DIH, it's best to set preImportDeleteQuery for every
entity. This safely scopes the clean variable to just that entity.
It doesn't look like the docs have examples of using preImportDeleteQuery,
so I put one here:
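
A minimal sketch of the idea (the entity, field, and query names are
illustrative; it assumes each entity tags its documents with a source_s
field):

<entity name="products" pk="id"
        preImportDeleteQuery="source_s:products"
        query="select id, name, 'products' as source_s from products"/>

With that in place, a clean full-import of this entity only deletes documents
matching source_s:products instead of *:*.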




On Wed, Jun 21, 2017 at 7:48 PM Erick Erickson 
wrote:

> First place I'd look is whether the jobs have clean=true set. If so the
> first thing DIH does is delete all documents.
>
> Best,
> Erick
>
> On Wed, Jun 21, 2017 at 3:52 PM, Pandey Brahmdev 
> wrote:
>
> > Hi,
> > I have setup Apache Solr 6.6.0 on Windows 10, 64-bit.
> >
> > I have created a simple core & configured DataImport Handlers.
> > I have configured 2 dataImport handlers in the Solr-config.xml file.
> >
> > First for to connect to DB & have data from DB Tables.
> > And Second for to have data from all pdf files using TikaEntityProcessor.
> >
> > Now the problem is there is no error in the console or anywhere but
> > whenever I want to search using "Query" tab it gives me the result of
> Data
> > Import.
> >
> > So let's say if I last Imported data for Tables then it gives me to
> result
> > from the table and if I imported PDF Files then it searches inside PDF
> > Files.
> >
> > But now when I again want to search for DB Tables values then It doesn't
> > give me the result instead I again need to Import Data for
> > DataImportHandler for File & vice-versa.
> >
> > Can you please help me out here?
> > Very sorry if I am doing anything wrong as I have started using Apache
> Solr
> > only 2 days back.
> >
> > Thanks & Regards,
> > Brahmdev Pandey
> > +46 767086309 <+46%2076%20708%2063%2009>
> >
>


Re: Solr Search Problem with Multiple Data-Import Handler

First place I'd look is whether the jobs have clean=true set. If so the
first thing DIH does is delete all documents.
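
A quick way to test that theory is to pass clean=false explicitly on the
import (an illustration; substitute your own core and handler names):

curl "http://localhost:8983/solr/yourcore/dataimport?command=full-import&clean=false"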

Best,
Erick

On Wed, Jun 21, 2017 at 3:52 PM, Pandey Brahmdev 
wrote:

> Hi,
> I have setup Apache Solr 6.6.0 on Windows 10, 64-bit.
>
> I have created a simple core & configured DataImport Handlers.
> I have configured 2 dataImport handlers in the Solr-config.xml file.
>
> First for to connect to DB & have data from DB Tables.
> And Second for to have data from all pdf files using TikaEntityProcessor.
>
> Now the problem is there is no error in the console or anywhere but
> whenever I want to search using "Query" tab it gives me the result of Data
> Import.
>
> So let's say if I last Imported data for Tables then it gives me to result
> from the table and if I imported PDF Files then it searches inside PDF
> Files.
>
> But now when I again want to search for DB Tables values then It doesn't
> give me the result instead I again need to Import Data for
> DataImportHandler for File & vice-versa.
>
> Can you please help me out here?
> Very sorry if I am doing anything wrong as I have started using Apache Solr
> only 2 days back.
>
> Thanks & Regards,
> Brahmdev Pandey
> +46 767086309
>


Solr Search Problem with Multiple Data-Import Handler

Hi,
I have set up Apache Solr 6.6.0 on Windows 10, 64-bit.

I have created a simple core & configured DataImport handlers.
I have configured 2 DataImport handlers in the solrconfig.xml file.

The first connects to the DB & indexes data from DB tables.
The second indexes data from all PDF files using TikaEntityProcessor.

Now the problem is there is no error in the console or anywhere, but
whenever I search using the "Query" tab it only gives me results from the
last data import.

So let's say if I last imported data for tables, then it gives me results
from the tables, and if I imported PDF files, then it searches inside PDF
files.

But when I then want to search for DB table values again, it doesn't
give me the results; instead I have to run the data import for that
handler again, & vice versa.

Can you please help me out here?
Very sorry if I am doing anything wrong as I have started using Apache Solr
only 2 days back.

Thanks & Regards,
Brahmdev Pandey
+46 767086309


Data import handler and no status in web-ui

Hi,

I use DIH in SolrCloud mode (implicit routing) in Solr 6.5.1.
When I start the import it works fine and I see the progress in the logfile.
However, when I click the "Refresh Status" button in the web UI while the
import is running, I only see "No information available (idle)".
So I have to look in the logfile to see when the import has finished.

In the old Solr, non-cloud and non-partitioned, there was an hourglass while the 
import was running.

Any idea?

Best regards
Thomas


RE: Using the Data Import Handler with SQLite

Hi Zac,
  I think you have added the entity closing tag twice; that might be
causing the issue. It has been a long time, so I am not sure whether you are
still working on it.





Re: Version conflict during data import from another Solr instance into clean Solr

Hi, I ran into the same problem. Chris' first solution worked for us; the
second solution on its own doesn't work, however, as the conflict error arises
before the update processors' code is even reached. Creating an
alias for the _version_ field in the dataconfig file, together with an
update processor that removes the temporary field (and possibly other
unwanted fields), seemed to work great for us.
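
A sketch of that combination (the field and chain names here are illustrative,
not from the original post):

<!-- data-config.xml: alias the source's _version_ into a temporary field -->
<field column="_version_" name="old_version_tmp" />

<!-- solrconfig.xml: drop the temporary field before the document is indexed -->
<updateRequestProcessorChain name="drop-old-version">
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldName">old_version_tmp</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>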





Re: Data Import

If Solr is down, then adding through SolrJ would fail as well. Kafka's new
API has some great features for this sort of thing. The new client API is
designed to be run in a long-running loop where you poll for new messages
with a certain amount of defined timeout (ex: consumer.poll(1000) for 1s)
So if Solr becomes unstable or goes down, it's easy to have the consumer
just stop and either wait until Solr comes back up or save the data to
disk/commit the Kafka offsets to ZK and stop running.
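
In code, the loop described in that paragraph looks roughly like this (a
sketch under stated assumptions, not anyone's actual consumer; the topic,
group, core, and field names are made up):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class KafkaToSolr {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "solr-indexer");
    props.put("enable.auto.commit", "false"); // we commit offsets ourselves
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
         SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      consumer.subscribe(Collections.singletonList("updates"));
      while (true) {
        // long-running poll loop with a 1s timeout, as described above
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> r : records) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", r.key());
          doc.addField("body_txt", r.value());
          solr.add(doc);
        }
        if (!records.isEmpty()) {
          solr.commit();
          consumer.commitSync(); // advance offsets only after Solr accepted the batch
        }
      }
    }
  }
}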

On Fri, Mar 17, 2017 at 1:24 PM, OTH  wrote:

> Are Kafka and SQS interchangeable?  (The latter does not seem to be free.)
>
> @Wunder:
> I'm assuming, that updating to Solr would fail if Solr is unavailable not
> just if posting via say a DB trigger, but probably also if trying to post
> through SolrJ?  (Which is what I'm using for now.)  So, even if using
> SolrJ, it would be a good idea to use a queuing software?
>
> Thanks
>
> On Fri, Mar 17, 2017 at 10:12 PM, vishal jain  wrote:
>
> > Streaming the data through kafka would be a good option if near real time
> > data indexing is the key requirement.
> > In our application the RDBMS data is populated by an ETL job periodically
> > so we don't need real time data indexing for now.
> >
> > Cheers,
> > Vishal
> >
> > On Fri, Mar 17, 2017 at 10:30 PM, Erick Erickson <
> erickerick...@gmail.com>
> > wrote:
> >
> > > Or set a trigger on your RDBMS's main table to put the relevant
> > > information in a different table (call it EVENTS) and have your SolrJ
> > > consult the EVENTS table periodically. Essentially you're using the
> > > EVENTS table as a queue where the trigger is the producer and the
> > > SolrJ program is the consumer.
> > >
> > > It's a polling solution though, so not event-driven. There's no
> > > mechanism that I know of have, say, your RDBMS push an event to DIH
> > > for instance.
> > >
> > > Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> > > for this kind of problem..
> > >
> > > Best,
> > > Erick
> > >
> > > On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
> > >  wrote:
> > > > One assumes by hooking into the same code that updates RDBMS, as
> > > > opposed to be reverse engineering the changes from looking at the DB
> > > > content. This would be especially the case for Delete changes.
> > > >
> > > > Regards,
> > > >Alex.
> > > > 
> > > > http://www.solr-start.com/ - Resources for Solr users, new and
> > > experienced
> > > >
> > > >
> > > > On 17 March 2017 at 11:37, OTH  wrote:
> > > >>>
> > > >>> Also, solrj is good when you want your RDBMS updates make
> immediately
> > > >>> available in solr.
> > > >>
> > > >> How can SolrJ be used to make RDBMS updates immediately available?
> > > >> Thanks
> > > >>
> > > >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> > > sujaybawas...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Vishal,
> > > >>>
> > > >>> As per my experience DIH is the best for RDBMS to solr index. DIH
> > with
> > > >>> caching has best performance. DIH nested entities allow you to
> define
> > > >>> simple queries.
> > > >>> Also, solrj is good when you want your RDBMS updates make
> immediately
> > > >>> available in solr. DIH full import can be used for index all data
> > first
> > > >>> time or restore index in case index is corrupted.
> > > >>>
> > > >>> Thanks,
> > > >>> Sujay
> > > >>>
> > > >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> > > wrote:
> > > >>>
> > > >>> > Hi,
> > > >>> >
> > > >>> >
> > > >>> > I am new to Solr and am trying to move data from my RDBMS to
> Solr.
> > I
> > > know
> > > >>> > the available options are:
> > > >>> > 1) Post Tool
> > > >>> > 2) DIH
> > > >>> > 3) SolrJ (as ours is a J2EE application).
> > > >>> >
> > > >>> > I want to know what is the recommended way for Data import in
> > > production
> > > >>> > environment.
> > > >>> > Will sending data via SolrJ in batches be faster than posting a
> csv
> > > using
> > > >>> > POST tool?
> > > >>> >
> > > >>> >
> > > >>> > Thanks,
> > > >>> > Vishal
> > > >>> >
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Thanks,
> > > >>> Sujay P Bawaskar
> > > >>> M:+91-77091 53669
> > > >>>
> > >
> >
>


Re: Data Import

Are Kafka and SQS interchangeable?  (The latter does not seem to be free.)

@Wunder:
I'm assuming that updates to Solr would fail if Solr is unavailable, not
just when posting via, say, a DB trigger, but probably also when trying to post
through SolrJ?  (Which is what I'm using for now.)  So, even if using
SolrJ, it would be a good idea to use queuing software?

Thanks

On Fri, Mar 17, 2017 at 10:12 PM, vishal jain  wrote:

> Streaming the data through kafka would be a good option if near real time
> data indexing is the key requirement.
> In our application the RDBMS data is populated by an ETL job periodically
> so we don't need real time data indexing for now.
>
> Cheers,
> Vishal
>
> On Fri, Mar 17, 2017 at 10:30 PM, Erick Erickson 
> wrote:
>
> > Or set a trigger on your RDBMS's main table to put the relevant
> > information in a different table (call it EVENTS) and have your SolrJ
> > consult the EVENTS table periodically. Essentially you're using the
> > EVENTS table as a queue where the trigger is the producer and the
> > SolrJ program is the consumer.
> >
> > It's a polling solution though, so not event-driven. There's no
> > mechanism that I know of have, say, your RDBMS push an event to DIH
> > for instance.
> >
> > Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> > for this kind of problem..
> >
> > Best,
> > Erick
> >
> > On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
> >  wrote:
> > > One assumes by hooking into the same code that updates RDBMS, as
> > > opposed to be reverse engineering the changes from looking at the DB
> > > content. This would be especially the case for Delete changes.
> > >
> > > Regards,
> > >Alex.
> > > 
> > > http://www.solr-start.com/ - Resources for Solr users, new and
> > experienced
> > >
> > >
> > > On 17 March 2017 at 11:37, OTH  wrote:
> > >>>
> > >>> Also, solrj is good when you want your RDBMS updates make immediately
> > >>> available in solr.
> > >>
> > >> How can SolrJ be used to make RDBMS updates immediately available?
> > >> Thanks
> > >>
> > >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> > sujaybawas...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi Vishal,
> > >>>
> > >>> As per my experience DIH is the best for RDBMS to solr index. DIH
> with
> > >>> caching has best performance. DIH nested entities allow you to define
> > >>> simple queries.
> > >>> Also, solrj is good when you want your RDBMS updates make immediately
> > >>> available in solr. DIH full import can be used for index all data
> first
> > >>> time or restore index in case index is corrupted.
> > >>>
> > >>> Thanks,
> > >>> Sujay
> > >>>
> > >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> > wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> >
> > >>> > I am new to Solr and am trying to move data from my RDBMS to Solr.
> I
> > know
> > >>> > the available options are:
> > >>> > 1) Post Tool
> > >>> > 2) DIH
> > >>> > 3) SolrJ (as ours is a J2EE application).
> > >>> >
> > >>> > I want to know what is the recommended way for Data import in
> > production
> > >>> > environment.
> > >>> > Will sending data via SolrJ in batches be faster than posting a csv
> > using
> > >>> > POST tool?
> > >>> >
> > >>> >
> > >>> > Thanks,
> > >>> > Vishal
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Thanks,
> > >>> Sujay P Bawaskar
> > >>> M:+91-77091 53669
> > >>>
> >
>


RE: Data Import

No, I use the free version. I have the driver from someone else; I can share it
if you want to use Cassandra.
They modified it for me, since the free JDBC driver I found times out
when a document is greater than 16 MB.

Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
daphne@cevalogistics.com



-Original Message-
From: vishal jain [mailto:jain02...@gmail.com]
Sent: Friday, March 17, 2017 12:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Data Import

Hi Daphne,

Are you using DSE?


Thanks & Regards,
Vishal

On Fri, Mar 17, 2017 at 7:40 PM, Liu, Daphne 
wrote:

> I just want to share my recent project. I have successfully sent all
> our EDI documents to Cassandra 3.7 clusters using Solr 6.3 Data Import
> JDBC Cassandra connector indexing our documents.
> Since Cassandra is so fast for writing, compression rate is around 13%
> and all my documents can be keep in my Cassandra clusters' memory, we
> are very happy with the result.
>
>
> Kind regards,
>
> Daphne Liu
> BI Architect - Matrix SCM
>
> CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL
> 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 /
> daphne@cevalogistics.com
>
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, March 17, 2017 9:54 AM
> To: solr-user 
> Subject: Re: Data Import
>
> I feel DIH is much better for prototyping, even though people do use
> it in production. If you do want to use DIH, you may benefit from
> reviewing the DIH-DB example I am currently rewriting in
> https://issues.apache.org/jira/browse/SOLR-10312 (may need to change
> luceneMatchVersion in solrconfig.xml first).
>
> CSV, etc, could be useful if you want to keep history of past imports,
> again useful during development, as you evolve schema.
>
> SolrJ may actually be easiest/best for production since you already
> have Java stack.
>
> The choice is yours in the end.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
>
>
> On 17 March 2017 at 08:56, Shawn Heisey  wrote:
> > On 3/17/2017 3:04 AM, vishal jain wrote:
> >> I am new to Solr and am trying to move data from my RDBMS to Solr.
> >> I
> know the available options are:
> >> 1) Post Tool
> >> 2) DIH
> >> 3) SolrJ (as ours is a J2EE application).
> >>
> >> I want to know what is the recommended way for Data import in
> >> production environment. Will sending data via SolrJ in batches be
> faster than posting a csv using POST tool?
> >
> > I've heard that CSV import runs EXTREMELY fast, but I have never
> > tested it.  The same threading problem that I discuss below would
> > apply to indexing this way.
> >
> > DIH is extremely powerful, but it has one glaring problem:  It's
> > single-threaded, which means that only one stream of data is going
> > into Solr, and each batch of documents to be inserted must wait for
> > the previous one to finish inserting before it can start.  I do not
> > know if DIH batches documents or sends them in one at a time.  If
> > you have a manually sharded index, you can run DIH on each shard in
> > parallel, but each one will be single-threaded.  That single thread
> > is pretty efficient, but it's still only one thread.
> >
> > Sending multiple index updates to Solr in parallel (multi-threading)
> > is how you radically speed up the Solr part of indexing.  This is
> > usually done with a custom indexing program, which might be written
> > with SolrJ or even in a completely different language.
> >
> > One thing to keep in mind with ANY indexing method:  Once the
> > situation is examined closely, most people find that it's not Solr
> > that makes their indexing slow.  The bottleneck is usually the
> > source system -- how quickly the data can be retrieved.  It usually
> > takes a lot longer to obtain the data than it does for Solr to index it.
> >
> > Thanks,
> > Shawn
> >

Re: Data Import

Streaming the data through Kafka would be a good option if near-real-time
data indexing is the key requirement.
In our application the RDBMS data is populated by an ETL job periodically,
so we don't need real-time data indexing for now.

Cheers,
Vishal

On Fri, Mar 17, 2017 at 10:30 PM, Erick Erickson 
wrote:

> Or set a trigger on your RDBMS's main table to put the relevant
> information in a different table (call it EVENTS) and have your SolrJ
> consult the EVENTS table periodically. Essentially you're using the
> EVENTS table as a queue where the trigger is the producer and the
> SolrJ program is the consumer.
>
> It's a polling solution though, so not event-driven. There's no
> mechanism that I know of have, say, your RDBMS push an event to DIH
> for instance.
>
> Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> for this kind of problem..
>
> Best,
> Erick
>
> On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
>  wrote:
> > One assumes by hooking into the same code that updates RDBMS, as
> > opposed to be reverse engineering the changes from looking at the DB
> > content. This would be especially the case for Delete changes.
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> > On 17 March 2017 at 11:37, OTH  wrote:
> >>>
> >>> Also, solrj is good when you want your RDBMS updates make immediately
> >>> available in solr.
> >>
> >> How can SolrJ be used to make RDBMS updates immediately available?
> >> Thanks
> >>
> >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> sujaybawas...@gmail.com>
> >> wrote:
> >>
> >>> Hi Vishal,
> >>>
> >>> As per my experience DIH is the best for RDBMS to solr index. DIH with
> >>> caching has best performance. DIH nested entities allow you to define
> >>> simple queries.
> >>> Also, solrj is good when you want your RDBMS updates make immediately
> >>> available in solr. DIH full import can be used for index all data first
> >>> time or restore index in case index is corrupted.
> >>>
> >>> Thanks,
> >>> Sujay
> >>>
> >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> >
> >>> > I am new to Solr and am trying to move data from my RDBMS to Solr. I
> know
> >>> > the available options are:
> >>> > 1) Post Tool
> >>> > 2) DIH
> >>> > 3) SolrJ (as ours is a J2EE application).
> >>> >
> >>> > I want to know what is the recommended way for Data import in
> production
> >>> > environment.
> >>> > Will sending data via SolrJ in batches be faster than posting a csv
> using
> >>> > POST tool?
> >>> >
> >>> >
> >>> > Thanks,
> >>> > Vishal
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Sujay P Bawaskar
> >>> M:+91-77091 53669
> >>>
>


Re: Data Import

That fails if Solr is not available.

To avoid dropping updates, you need some kind of persistent queue. We use 
Amazon SQS for our incremental updates.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Mar 17, 2017, at 10:09 AM, OTH  wrote:
> 
> Could the database trigger not just post the change to solr?
> 
> On Fri, Mar 17, 2017 at 10:00 PM, Erick Erickson 
> wrote:
> 
>> Or set a trigger on your RDBMS's main table to put the relevant
>> information in a different table (call it EVENTS) and have your SolrJ
>> consult the EVENTS table periodically. Essentially you're using the
>> EVENTS table as a queue where the trigger is the producer and the
>> SolrJ program is the consumer.
>> 
>> It's a polling solution though, so not event-driven. There's no
>> mechanism that I know of have, say, your RDBMS push an event to DIH
>> for instance.
>> 
>> Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
>> for this kind of problem..
>> 
>> Best,
>> Erick
>> 
>> On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
>>  wrote:
>>> One assumes by hooking into the same code that updates RDBMS, as
>>> opposed to be reverse engineering the changes from looking at the DB
>>> content. This would be especially the case for Delete changes.
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and
>> experienced
>>> 
>>> 
>>> On 17 March 2017 at 11:37, OTH  wrote:
>>>>> 
>>>>> Also, solrj is good when you want your RDBMS updates make immediately
>>>>> available in solr.
>>>> 
>>>> How can SolrJ be used to make RDBMS updates immediately available?
>>>> Thanks
>>>> 
>>>> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
>> sujaybawas...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi Vishal,
>>>>> 
>>>>> As per my experience DIH is the best for RDBMS to solr index. DIH with
>>>>> caching has best performance. DIH nested entities allow you to define
>>>>> simple queries.
>>>>> Also, solrj is good when you want your RDBMS updates make immediately
>>>>> available in solr. DIH full import can be used for index all data first
>>>>> time or restore index in case index is corrupted.
>>>>> 
>>>>> Thanks,
>>>>> Sujay
>>>>> 
>>>>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> 
>>>>>> I am new to Solr and am trying to move data from my RDBMS to Solr. I
>> know
>>>>>> the available options are:
>>>>>> 1) Post Tool
>>>>>> 2) DIH
>>>>>> 3) SolrJ (as ours is a J2EE application).
>>>>>> 
>>>>>> I want to know what is the recommended way for Data import in
>> production
>>>>>> environment.
>>>>>> Will sending data via SolrJ in batches be faster than posting a csv
>> using
>>>>>> POST tool?
>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Vishal
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Thanks,
>>>>> Sujay P Bawaskar
>>>>> M:+91-77091 53669
>>>>> 
>> 



Re: Data Import

Hi Daphne,

Are you using DSE?


Thanks & Regards,
Vishal

On Fri, Mar 17, 2017 at 7:40 PM, Liu, Daphne 
wrote:

> I just want to share my recent project. I have successfully sent all our
> EDI documents to Cassandra 3.7 clusters using Solr 6.3 Data Import JDBC
> Cassandra connector indexing our documents.
> Since Cassandra is so fast for writing, compression rate is around 13% and
> all my documents can be keep in my Cassandra clusters' memory, we are very
> happy with the result.
>
>
> Kind regards,
>
> Daphne Liu
> BI Architect - Matrix SCM
>
> CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL
> 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 /
> daphne@cevalogistics.com
>
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, March 17, 2017 9:54 AM
> To: solr-user 
> Subject: Re: Data Import
>
> I feel DIH is much better for prototyping, even though people do use it in
> production. If you do want to use DIH, you may benefit from reviewing the
> DIH-DB example I am currently rewriting in
> https://issues.apache.org/jira/browse/SOLR-10312 (may need to change
> luceneMatchVersion in solrconfig.xml first).
>
> CSV, etc, could be useful if you want to keep history of past imports,
> again useful during development, as you evolve schema.
>
> SolrJ may actually be easiest/best for production since you already have
> Java stack.
>
> The choice is yours in the end.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 17 March 2017 at 08:56, Shawn Heisey  wrote:
> > On 3/17/2017 3:04 AM, vishal jain wrote:
> >> I am new to Solr and am trying to move data from my RDBMS to Solr. I
> know the available options are:
> >> 1) Post Tool
> >> 2) DIH
> >> 3) SolrJ (as ours is a J2EE application).
> >>
> >> I want to know what is the recommended way for Data import in
> >> production environment. Will sending data via SolrJ in batches be
> faster than posting a csv using POST tool?
> >
> > I've heard that CSV import runs EXTREMELY fast, but I have never
> > tested it.  The same threading problem that I discuss below would
> > apply to indexing this way.
> >
> > DIH is extremely powerful, but it has one glaring problem:  It's
> > single-threaded, which means that only one stream of data is going
> > into Solr, and each batch of documents to be inserted must wait for
> > the previous one to finish inserting before it can start.  I do not
> > know if DIH batches documents or sends them in one at a time.  If you
> > have a manually sharded index, you can run DIH on each shard in
> > parallel, but each one will be single-threaded.  That single thread is
> > pretty efficient, but it's still only one thread.
> >
> > Sending multiple index updates to Solr in parallel (multi-threading)
> > is how you radically speed up the Solr part of indexing.  This is
> > usually done with a custom indexing program, which might be written
> > with SolrJ or even in a completely different language.
> >
> > One thing to keep in mind with ANY indexing method:  Once the
> > situation is examined closely, most people find that it's not Solr
> > that makes their indexing slow.  The bottleneck is usually the source
> > system -- how quickly the data can be retrieved.  It usually takes a
> > lot longer to obtain the data than it does for Solr to index it.
> >
> > Thanks,
> > Shawn
> >


Re: Data Import

Could the database trigger not just post the change to Solr?

On Fri, Mar 17, 2017 at 10:00 PM, Erick Erickson 
wrote:

> Or set a trigger on your RDBMS's main table to put the relevant
> information in a different table (call it EVENTS) and have your SolrJ
> consult the EVENTS table periodically. Essentially you're using the
> EVENTS table as a queue where the trigger is the producer and the
> SolrJ program is the consumer.
>
> It's a polling solution though, so not event-driven. There's no
> mechanism that I know of have, say, your RDBMS push an event to DIH
> for instance.
>
> Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
> for this kind of problem..
>
> Best,
> Erick
>
> On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
>  wrote:
> > One assumes by hooking into the same code that updates RDBMS, as
> > opposed to be reverse engineering the changes from looking at the DB
> > content. This would be especially the case for Delete changes.
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
> >
> > On 17 March 2017 at 11:37, OTH  wrote:
> >>>
> >>> Also, solrj is good when you want your RDBMS updates make immediately
> >>> available in solr.
> >>
> >> How can SolrJ be used to make RDBMS updates immediately available?
> >> Thanks
> >>
> >> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar <
> sujaybawas...@gmail.com>
> >> wrote:
> >>
> >>> Hi Vishal,
> >>>
> >>> As per my experience DIH is the best for RDBMS to solr index. DIH with
> >>> caching has best performance. DIH nested entities allow you to define
> >>> simple queries.
> >>> Also, solrj is good when you want your RDBMS updates make immediately
> >>> available in solr. DIH full import can be used for index all data first
> >>> time or restore index in case index is corrupted.
> >>>
> >>> Thanks,
> >>> Sujay
> >>>
> >>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> >
> >>> > I am new to Solr and am trying to move data from my RDBMS to Solr. I
> know
> >>> > the available options are:
> >>> > 1) Post Tool
> >>> > 2) DIH
> >>> > 3) SolrJ (as ours is a J2EE application).
> >>> >
> >>> > I want to know what is the recommended way for Data import in
> production
> >>> > environment.
> >>> > Will sending data via SolrJ in batches be faster than posting a csv
> using
> >>> > POST tool?
> >>> >
> >>> >
> >>> > Thanks,
> >>> > Vishal
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Sujay P Bawaskar
> >>> M:+91-77091 53669
> >>>
>


Re: Data Import

Or set a trigger on your RDBMS's main table to put the relevant
information in a different table (call it EVENTS) and have your SolrJ
consult the EVENTS table periodically. Essentially you're using the
EVENTS table as a queue where the trigger is the producer and the
SolrJ program is the consumer.
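
A bare-bones sketch of that consumer (a polling loop under stated assumptions;
the JDBC URL, table, column, and core names are invented for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class EventsPoller {
  public static void main(String[] args) throws Exception {
    try (Connection db = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
         SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      while (true) {
        long lastSeen = 0; // highest EVENTS row consumed in this pass
        try (Statement st = db.createStatement();
             ResultSet rs = st.executeQuery("select id, doc_id, payload from EVENTS order by id")) {
          while (rs.next()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rs.getString("doc_id"));
            doc.addField("payload_txt", rs.getString("payload"));
            solr.add(doc);
            lastSeen = rs.getLong("id");
          }
        }
        if (lastSeen > 0) {
          solr.commit();
          try (Statement st = db.createStatement()) {
            // dequeue the rows we just indexed
            st.executeUpdate("delete from EVENTS where id <= " + lastSeen);
          }
        }
        Thread.sleep(5000); // poll every 5 seconds
      }
    }
  }
}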

It's a polling solution though, so not event-driven. There's no
mechanism that I know of to have, say, your RDBMS push an event to DIH,
for instance.

Hmmm, I do wonder if anyone's done anything with queueing (e.g. Kafka)
for this kind of problem..

Best,
Erick

On Fri, Mar 17, 2017 at 8:41 AM, Alexandre Rafalovitch
 wrote:
> One assumes by hooking into the same code that updates RDBMS, as
> opposed to be reverse engineering the changes from looking at the DB
> content. This would be especially the case for Delete changes.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 17 March 2017 at 11:37, OTH  wrote:
>>>
>>> Also, solrj is good when you want your RDBMS updates make immediately
>>> available in solr.
>>
>> How can SolrJ be used to make RDBMS updates immediately available?
>> Thanks
>>
>> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar 
>> wrote:
>>
>>> Hi Vishal,
>>>
>>> As per my experience DIH is the best for RDBMS to solr index. DIH with
>>> caching has best performance. DIH nested entities allow you to define
>>> simple queries.
>>> Also, solrj is good when you want your RDBMS updates make immediately
>>> available in solr. DIH full import can be used for index all data first
>>> time or restore index in case index is corrupted.
>>>
>>> Thanks,
>>> Sujay
>>>
>>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain  wrote:
>>>
>>> > Hi,
>>> >
>>> >
>>> > I am new to Solr and am trying to move data from my RDBMS to Solr. I know
>>> > the available options are:
>>> > 1) Post Tool
>>> > 2) DIH
>>> > 3) SolrJ (as ours is a J2EE application).
>>> >
>>> > I want to know what is the recommended way for Data import in production
>>> > environment.
>>> > Will sending data via SolrJ in batches be faster than posting a csv using
>>> > POST tool?
>>> >
>>> >
>>> > Thanks,
>>> > Vishal
>>> >
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Sujay P Bawaskar
>>> M:+91-77091 53669
>>>


Re: Data Import

Thanks to all of you for the valuable inputs.
Being on a J2EE platform, I also felt that using SolrJ in a multi-threaded
environment would be the better choice to index RDBMS data into SolrCloud.
I will try a scheduler-triggered microservice to do the job using
SolrJ.

Regards,
Vishal

On Fri, Mar 17, 2017 at 9:11 PM, Alexandre Rafalovitch 
wrote:

> One assumes by hooking into the same code that updates RDBMS, as
> opposed to be reverse engineering the changes from looking at the DB
> content. This would be especially the case for Delete changes.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 17 March 2017 at 11:37, OTH  wrote:
> >>
> >> Also, solrj is good when you want your RDBMS updates make immediately
> >> available in solr.
> >
> > How can SolrJ be used to make RDBMS updates immediately available?
> > Thanks
> >
> > On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar  >
> > wrote:
> >
> >> Hi Vishal,
> >>
> >> As per my experience DIH is the best for RDBMS to solr index. DIH with
> >> caching has best performance. DIH nested entities allow you to define
> >> simple queries.
> >> Also, solrj is good when you want your RDBMS updates make immediately
> >> available in solr. DIH full import can be used for index all data first
> >> time or restore index in case index is corrupted.
> >>
> >> Thanks,
> >> Sujay
> >>
> >> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain 
> wrote:
> >>
> >> > Hi,
> >> >
> >> >
> >> > I am new to Solr and am trying to move data from my RDBMS to Solr. I
> know
> >> > the available options are:
> >> > 1) Post Tool
> >> > 2) DIH
> >> > 3) SolrJ (as ours is a J2EE application).
> >> >
> >> > I want to know what is the recommended way for Data import in
> production
> >> > environment.
> >> > Will sending data via SolrJ in batches be faster than posting a csv
> using
> >> > POST tool?
> >> >
> >> >
> >> > Thanks,
> >> > Vishal
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks,
> >> Sujay P Bawaskar
> >> M:+91-77091 53669
> >>
>


Re: Data Import

One assumes by hooking into the same code that updates the RDBMS, as
opposed to reverse engineering the changes from looking at the DB
content. This would especially be the case for Delete changes.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 17 March 2017 at 11:37, OTH  wrote:
>>
>> Also, solrj is good when you want your RDBMS updates make immediately
>> available in solr.
>
> How can SolrJ be used to make RDBMS updates immediately available?
> Thanks
>
> On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar 
> wrote:
>
>> Hi Vishal,
>>
>> As per my experience DIH is the best for RDBMS to solr index. DIH with
>> caching has best performance. DIH nested entities allow you to define
>> simple queries.
>> Also, solrj is good when you want your RDBMS updates make immediately
>> available in solr. DIH full import can be used for index all data first
>> time or restore index in case index is corrupted.
>>
>> Thanks,
>> Sujay
>>
>> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain  wrote:
>>
>> > Hi,
>> >
>> >
>> > I am new to Solr and am trying to move data from my RDBMS to Solr. I know
>> > the available options are:
>> > 1) Post Tool
>> > 2) DIH
>> > 3) SolrJ (as ours is a J2EE application).
>> >
>> > I want to know what is the recommended way for Data import in production
>> > environment.
>> > Will sending data via SolrJ in batches be faster than posting a csv using
>> > POST tool?
>> >
>> >
>> > Thanks,
>> > Vishal
>> >
>>
>>
>>
>> --
>> Thanks,
>> Sujay P Bawaskar
>> M:+91-77091 53669
>>


Re: Data Import

>
> Also, solrj is good when you want your RDBMS updates make immediately
> available in solr.

How can SolrJ be used to make RDBMS updates immediately available?
Thanks

On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar 
wrote:

> Hi Vishal,
>
> As per my experience DIH is the best for RDBMS to solr index. DIH with
> caching has best performance. DIH nested entities allow you to define
> simple queries.
> Also, solrj is good when you want your RDBMS updates make immediately
> available in solr. DIH full import can be used for index all data first
> time or restore index in case index is corrupted.
>
> Thanks,
> Sujay
>
> On Fri, Mar 17, 2017 at 2:34 PM, vishal jain  wrote:
>
> > Hi,
> >
> >
> > I am new to Solr and am trying to move data from my RDBMS to Solr. I know
> > the available options are:
> > 1) Post Tool
> > 2) DIH
> > 3) SolrJ (as ours is a J2EE application).
> >
> > I want to know what is the recommended way for Data import in production
> > environment.
> > Will sending data via SolrJ in batches be faster than posting a csv using
> > POST tool?
> >
> >
> > Thanks,
> > Vishal
> >
>
>
>
> --
> Thanks,
> Sujay P Bawaskar
> M:+91-77091 53669
>


RE: Data Import

I just want to share my recent project. I have successfully sent all our EDI 
documents to Cassandra 3.7 clusters, using the Solr 6.3 Data Import handler with a 
JDBC Cassandra connector to index our documents.
Since Cassandra is so fast at writing, the compression rate is around 13%, and all 
my documents can be kept in my Cassandra clusters' memory; we are very happy 
with the result.


Kind regards,

Daphne Liu
BI Architect - Matrix SCM

CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
daphne@cevalogistics.com



-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Friday, March 17, 2017 9:54 AM
To: solr-user 
Subject: Re: Data Import

I feel DIH is much better for prototyping, even though people do use it in 
production. If you do want to use DIH, you may benefit from reviewing the 
DIH-DB example I am currently rewriting in
https://issues.apache.org/jira/browse/SOLR-10312 (may need to change 
luceneMatchVersion in solrconfig.xml first).

CSV, etc, could be useful if you want to keep history of past imports, again 
useful during development, as you evolve schema.

SolrJ may actually be easiest/best for production since you already have Java 
stack.

The choice is yours in the end.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 17 March 2017 at 08:56, Shawn Heisey  wrote:
> On 3/17/2017 3:04 AM, vishal jain wrote:
>> I am new to Solr and am trying to move data from my RDBMS to Solr. I know 
>> the available options are:
>> 1) Post Tool
>> 2) DIH
>> 3) SolrJ (as ours is a J2EE application).
>>
>> I want to know what is the recommended way for Data import in
>> production environment. Will sending data via SolrJ in batches be faster 
>> than posting a csv using POST tool?
>
> I've heard that CSV import runs EXTREMELY fast, but I have never
> tested it.  The same threading problem that I discuss below would
> apply to indexing this way.
>
> DIH is extremely powerful, but it has one glaring problem:  It's
> single-threaded, which means that only one stream of data is going
> into Solr, and each batch of documents to be inserted must wait for
> the previous one to finish inserting before it can start.  I do not
> know if DIH batches documents or sends them in one at a time.  If you
> have a manually sharded index, you can run DIH on each shard in
> parallel, but each one will be single-threaded.  That single thread is
> pretty efficient, but it's still only one thread.
>
> Sending multiple index updates to Solr in parallel (multi-threading)
> is how you radically speed up the Solr part of indexing.  This is
> usually done with a custom indexing program, which might be written
> with SolrJ or even in a completely different language.
>
> One thing to keep in mind with ANY indexing method:  Once the
> situation is examined closely, most people find that it's not Solr
> that makes their indexing slow.  The bottleneck is usually the source
> system -- how quickly the data can be retrieved.  It usually takes a
> lot longer to obtain the data than it does for Solr to index it.
>
> Thanks,
> Shawn
>


Re: Data Import

I feel DIH is much better suited to prototyping, even though people do use
it in production. If you do want to use DIH, you may benefit from
reviewing the DIH-DB example I am currently rewriting in
https://issues.apache.org/jira/browse/SOLR-10312 (you may need to change
luceneMatchVersion in solrconfig.xml first).

CSV and similar formats can be useful if you want to keep a history of past
imports; that is again useful during development, as you evolve the schema.

SolrJ may actually be the easiest and best option for production, since you
already have a Java stack.

The choice is yours in the end.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 17 March 2017 at 08:56, Shawn Heisey  wrote:
> On 3/17/2017 3:04 AM, vishal jain wrote:
>> I am new to Solr and am trying to move data from my RDBMS to Solr. I know 
>> the available options are:
>> 1) Post Tool
>> 2) DIH
>> 3) SolrJ (as ours is a J2EE application).
>>
>> I want to know what is the recommended way for Data import in production
>> environment. Will sending data via SolrJ in batches be faster than posting a 
>> csv using POST tool?
>
> I've heard that CSV import runs EXTREMELY fast, but I have never tested
> it.  The same threading problem that I discuss below would apply to
> indexing this way.
>
> DIH is extremely powerful, but it has one glaring problem:  It's
> single-threaded, which means that only one stream of data is going into
> Solr, and each batch of documents to be inserted must wait for the
> previous one to finish inserting before it can start.  I do not know if
> DIH batches documents or sends them one at a time.  If you have a
> manually sharded index, you can run DIH on each shard in parallel, but
> each one will be single-threaded.  That single thread is pretty
> efficient, but it's still only one thread.
>
> Sending multiple index updates to Solr in parallel (multi-threading) is
> how you radically speed up the Solr part of indexing.  This is usually
> done with a custom indexing program, which might be written with SolrJ
> or even in a completely different language.
>
> One thing to keep in mind with ANY indexing method:  Once the situation
> is examined closely, most people find that it's not Solr that makes
> their indexing slow.  The bottleneck is usually the source system -- how
> quickly the data can be retrieved.  It usually takes a lot longer to
> obtain the data than it does for Solr to index it.
>
> Thanks,
> Shawn
>


Re: Data Import

On 3/17/2017 3:04 AM, vishal jain wrote:
> I am new to Solr and am trying to move data from my RDBMS to Solr. I know the 
> available options are:
> 1) Post Tool
> 2) DIH
> 3) SolrJ (as ours is a J2EE application).
>
> I want to know what is the recommended way for Data import in production
> environment. Will sending data via SolrJ in batches be faster than posting a 
> csv using POST tool?

I've heard that CSV import runs EXTREMELY fast, but I have never tested
it.  The same threading problem that I discuss below would apply to
indexing this way.
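
(For illustration, a CSV file can be posted with the post tool that ships with
Solr, assuming a Solr 5+ layout; the core name and file path are placeholders:
bin/post -c mycore /path/to/data.csv)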

DIH is extremely powerful, but it has one glaring problem:  It's
single-threaded, which means that only one stream of data is going into
Solr, and each batch of documents to be inserted must wait for the
previous one to finish inserting before it can start.  I do not know if
DIH batches documents or sends them one at a time.  If you have a
manually sharded index, you can run DIH on each shard in parallel, but
each one will be single-threaded.  That single thread is pretty
efficient, but it's still only one thread.

Sending multiple index updates to Solr in parallel (multi-threading) is
how you radically speed up the Solr part of indexing.  This is usually
done with a custom indexing program, which might be written with SolrJ
or even in a completely different language.

One thing to keep in mind with ANY indexing method:  Once the situation
is examined closely, most people find that it's not Solr that makes
their indexing slow.  The bottleneck is usually the source system -- how
quickly the data can be retrieved.  It usually takes a lot longer to
obtain the data than it does for Solr to index it.

Thanks,
Shawn
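
For illustration, here is a minimal sketch of the multi-threaded SolrJ approach
Shawn describes, assuming SolrJ 6.x; the URL, core name, field names, thread
count, batch size, and id ranges are all placeholders to adapt:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
    private static final int THREADS = 4;        // parallel update streams into Solr
    private static final int BATCH_SIZE = 500;   // documents per update request
    private static final int DOCS_PER_THREAD = 10000;

    public static void main(String[] args) throws Exception {
        // HttpSolrClient is thread-safe; share one instance across all threads.
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);

        for (int t = 0; t < THREADS; t++) {
            final int start = t * DOCS_PER_THREAD; // each thread owns an id range
            pool.submit(() -> {
                List<SolrInputDocument> batch = new ArrayList<>();
                for (int i = start; i < start + DOCS_PER_THREAD; i++) {
                    // In a real indexer the rows would come from JDBC paging, one
                    // page per thread; synthetic documents keep the sketch runnable.
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", Integer.toString(i));
                    doc.addField("name", "document " + i);
                    batch.add(doc);
                    if (batch.size() == BATCH_SIZE) {
                        client.add(batch); // one request per batch, not per document
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    client.add(batch);
                }
                return null; // Callable, so checked SolrJ exceptions propagate
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        client.commit(); // commit once at the end instead of per batch
        client.close();
    }
}

Each thread sends its own stream of batched updates, which is exactly the
parallelism that a single DIH run cannot provide on its own.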



Re: Data Import

Hi Vishal,

In my experience, DIH is the best option for indexing from an RDBMS into Solr.
DIH with caching gives the best performance, and DIH nested entities allow you
to define simple queries.
SolrJ is good when you want your RDBMS updates made available in Solr
immediately. A DIH full import can be used to index all the data the first
time, or to restore the index in case it gets corrupted.
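
For illustration, a minimal data-config.xml sketch of the cached nested-entity
setup described above; the driver, connection details, and table and column
names are all made-up placeholders:

<dataConfig>
  <!-- JDBC connection; driver and URL are placeholders -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="user" password="pass"/>
  <document>
    <!-- Parent entity: one Solr document per item row -->
    <entity name="item" query="SELECT id, name FROM item">
      <!-- Child entity, held in an in-memory cache and joined by key
           instead of issuing one query per parent row -->
      <entity name="feature"
              query="SELECT item_id, description FROM feature"
              processor="SqlEntityProcessor"
              cacheImpl="SortedMapBackedCache"
              cacheKey="item_id"
              cacheLookup="item.id"/>
    </entity>
  </document>
</dataConfig>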

Thanks,
Sujay

On Fri, Mar 17, 2017 at 2:34 PM, vishal jain  wrote:

> Hi,
>
>
> I am new to Solr and am trying to move data from my RDBMS to Solr. I know
> the available options are:
> 1) Post Tool
> 2) DIH
> 3) SolrJ (as ours is a J2EE application).
>
> I want to know what is the recommended way for Data import in production
> environment.
> Will sending data via SolrJ in batches be faster than posting a csv using
> POST tool?
>
>
> Thanks,
> Vishal
>



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669


Data Import

Hi,


I am new to Solr and am trying to move data from my RDBMS to Solr. I know
the available options are:
1) Post Tool
2) DIH
3) SolrJ (as ours is a J2EE application).

I want to know what is the recommended way for Data import in production
environment.
Will sending data via SolrJ in batches be faster than posting a csv using
POST tool?


Thanks,
Vishal



Re: Data Import Handler on 6.4.1

Also, upgrade to 6.4.2. There are serious performance problems in 6.4.0 and 
6.4.1.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Mar 15, 2017, at 12:05 PM, Liu, Daphne  
> wrote:
> 
> For Solr 6.3, I had to move mine to 
> ../solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib, if you are using Jetty.
> 
> Kind regards,
> 
> Daphne Liu
> BI Architect - Matrix SCM
> 
> CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 
> USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / 
> daphne@cevalogistics.com
> 
> 
> -Original Message-
> From: Michael Tobias [mailto:mtob...@btinternet.com]
> Sent: Wednesday, March 15, 2017 2:36 PM
> To: solr-user@lucene.apache.org
> Subject: Data Import Handler on 6.4.1
> 
> I am sure I am missing something simple but
> 
> I am running Solr 4.8.1 and trialling 6.4.1 on another computer.
> 
> I have had to manually modify the automatic 6.4.1 schema config as we use a 
> set of specialised field types.  They work fine.
> 
> I am now trying to populate my core with data and having problems.
> 
> Exactly what names/paths should I be using in the solrconfig.xml file to get 
> this working - I don’t recall doing ANYTHING for 4.8.1
> 
>   <lib dir="..." regex=".*\.jar" />
>   <lib dir="..." regex="solr-dataimporthandler-.*\.jar" /> ?
> 
> And where do I put the mysql-connector-java-5.1.29-bin.jar file and how do I 
> reference it to get it loaded?
> 
>
> ??
> 
> And then later in the solrconfig.xml I have:
> 
>  <requestHandler name="/dataimport"
>      class="org.apache.solr.handler.dataimport.DataImportHandler">
>    <lst name="defaults">
>      <str name="config">db-data-config.xml</str>
>    </lst>
>  </requestHandler>
> 
> 
> Any help much appreciated.
> 
> Regards
> 
> Michael
> 
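
For reference, a typical shape for those two directives in solrconfig.xml; the
dir values below are illustrative and depend on where the jars actually live
(jars dropped into the webapp's WEB-INF/lib, as noted above, are loaded without
any <lib> directive at all):

<lib dir="${solr.install.dir:../../../..}/dist/"
     regex="solr-dataimporthandler-.*\.jar" />
<!-- JDBC driver jar; point dir at the directory holding the connector -->
<lib dir="/opt/solr/lib/" regex="mysql-connector-java-.*\.jar" />
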
> 
> -Original Message-
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: 15 March 2017 17:47
> To: solr-user@lucene.apache.org
> Subject: Re: Get handler not working
> 
> from your previous email:
> "There is no "id"
> field defined in the schema."
> 
> you need an id field to use the get handler
> 
> On Wed, Mar 15, 2017 at 1:45 PM, Chris Ulicny  wrote:
> 
>> I thought that "id" and "ids" were fixed parameters for the get
>> handler, but I never remember, so I've already tried both. Each time
>> it comes back with the same response of no document.
>> 
>> On Wed, Mar 15, 2017 at 1:31 PM Alexandre Rafalovitch
>> 
>> wrote:
>> 
>>> Actually.
>>> 
>>> I think Real Time Get handler has "id" as a magical parameter, not
>>> as a field name. It maps to the real id field via the uniqueKey
>>> definition:
>>> https://cwiki.apache.org/confluence/display/solr/RealTime+Get
>>> 
>>> So, if you have not, could you try the way you originally wrote it.
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> http://www.solr-start.com/ - Resources for Solr users, new and
>> experienced
>>> 
>>> 
>>> On 15 March 2017 at 13:22, Chris Ulicny  wrote:
>>>> Sorry, that is a typo. The get is using the iqdocid field. There
>>>> is no
>>> "id"
>>>> field defined in the schema.
>>>> 
>>>> solr/TestCollection/get?iqdocid=2957-TV-201604141900
>>>> 
>>>> solr/TestCollection/select?q=*:*&fq=iqdocid:2957-TV-201604141900
>>>> 
>>>> On Wed, Mar 15, 2017 at 1:15 PM Erick Erickson <
>> erickerick...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Is this a typo or are you trying to use get with an "id" field
>>>>> and your filter query uses "iqdocid"?
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>> On Wed, Mar 15, 2017 at 8:31 AM, Chris Ulicny 
>> wrote:
>>>>>> Yes, we're using a fixed schema with the iqdocid field set as
>>>>>> the
>>>>> uniqueKey.
>>>>>> 
>>>>>> On Wed, Mar 15, 2017 at 11:28 AM Alexandre Rafalovitch <
>>>>> arafa...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> What is your uniqueKey? Is it iqdocid?
>>>>>>> 
>>>>>>> Regards,
>>>>>>>   Alex.
>>>>>>> 
>>>>>>> http://www.solr-start.com/ - Resources for Solr users, new and
>>>>> experienced

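
For illustration of the point Alexandre makes above: the real-time get handler
is addressed through the fixed parameter names id and ids, which Solr maps to
the uniqueKey field (iqdocid here), so the following forms work regardless of
what the uniqueKey field is called (the second document id below is made up
for the example):

solr/TestCollection/get?id=2957-TV-201604141900
solr/TestCollection/get?ids=2957-TV-201604141900,2957-TV-201604142000

A field-named parameter such as iqdocid= is only meaningful inside a query
(q or fq), not as a /get parameter.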