Re: Fw: TolerantUpdateProcessorFactory not functioning

2020-06-10 Thread Hup Chen

There was another error, which I think should count as an indexing error.
The listprice field below is a pdouble field, but the update process did not
ignore the error when it was sent bad data.

Response: {
  "responseHeader":{
"status":400,
"QTime":133551},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","java.lang.NumberFormatException"],
"msg":"ERROR: [doc=978194537913] Error adding field 
'listprice'='106Chapter' msg=For input string: \"106Chapter\"",
"code":400}}
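
One way to keep a value like '106Chapter' from reaching Solr at all is to screen numeric fields on the client before posting. The following is only a minimal Python sketch, assuming the documents are available as plain dicts. The helper name split_by_numeric_field and the regular expression are illustrative assumptions; the pattern may be stricter or looser than what Solr actually accepts for a pdouble field.

```python
import re

# Assumed pattern for a plain decimal number. This is a hypothetical rule
# for illustration, not Solr's own parsing logic.
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def split_by_numeric_field(docs, field):
    """Return (good, bad): docs whose `field` looks numeric, and the rest.

    A missing field is treated as acceptable here (an assumption).
    """
    good, bad = [], []
    for doc in docs:
        value = doc.get(field)
        if value is None or _DECIMAL.fullmatch(str(value).strip()):
            good.append(doc)
        else:
            bad.append(doc)
    return good, bad


docs = [
    {"id": "978194537913", "listprice": "106Chapter"},
    {"id": "0371558727", "listprice": "19.95"},
]
good, bad = split_by_numeric_field(docs, "listprice")
print([d["id"] for d in good])  # ['0371558727']
print([d["id"] for d in bad])   # ['978194537913']
```

The rejected documents can be logged and fixed separately while the rest are sent on to Solr.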



From: Shawn Heisey 
Sent: Tuesday, June 9, 2020 3:19 PM
To: solr-user@lucene.apache.org 
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning

On 6/9/2020 12:44 AM, Hup Chen wrote:
> Thanks for your reply. This is one of the examples where it fails.  POSTing
> with charset=utf-8 or another charset didn't help with the CTRL-CHAR "^" error
> found in the title field.  I hope Solr can simply skip this record and go
> on to index the rest of the data.
>
> 
> 
>   9780373773244
>   9780373773244
> Missing: Innocent By Association^Zachary's Law (Hqn 
> Romance) 
>   Lisa_Jackson 
> 
> 
>
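> curl "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100" \
>   -H 'Content-Type: text/xml; charset=utf-8' -d @data
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>
> <lst name="responseHeader">
>   <arr name="errors"/>
>   <int name="maxErrors">100</int>
>   <int name="status">400</int>
>   <int name="QTime">0</int>
> </lst>
> <lst name="error">
>   <lst name="metadata">
>     <str name="error-class">org.apache.solr.common.SolrException</str>
>     <str name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
>   </lst>
>   <str name="msg">Illegal character ((CTRL-CHAR, code 26))
>  at [row,col {unknown-source}]: [1,225]</str>
>   <int name="code">400</int>
> </lst>
> </response>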

I tried your example XML as it is shown in your original message, saved
to a file named "foo.xml", and didn't have any trouble.  I wasn't even
using the tolerant update processor.   I just fired up the techproducts
example on a solr-8.3.0 download I already had, added a field named
"isbn13" (string type) so the schema was compatible, and tried the
following command:

curl "http://localhost:8983/solr/techproducts/update" \
  -H 'Content-Type: text/xml; charset=utf-8' -d @foo.xml

I then tried it again with the ^Z (which is two characters) replaced by
an actual Ctrl-Z character.  When I did that, I got exactly the same
error you did.

A Ctrl-Z character (ascii code 26) is *NOT* a valid character for XML,
which is why you're getting the error.

The tolerant update processor can't ignore errors in the actual format
of the input ... it only ignores errors during *indexing*.  This error
occurred during the input parsing, not during indexing, so the update
processor could not ignore it.
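
Since this kind of failure happens before the update chain ever sees the data, it may help to scan the input for characters that XML 1.0 forbids before posting it. This is a minimal Python sketch of such a check, not anything Solr provides; the function name find_illegal_xml_chars is made up for illustration, and the character class follows the "Char" production of the XML 1.0 specification.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return a list of (offset, code point) for characters XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _ILLEGAL_XML.finditer(text)]


# The title from the failing record, with a real Ctrl-Z (code 26) in it.
sample = "Innocent By Association\x1aZachary's Law"
print(find_illegal_xml_chars(sample))  # [(23, 26)]
```

An empty result means the text contains no forbidden characters; it does not prove that the document as a whole is well-formed XML.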

Thanks,
Shawn


Re: How to determine why solr stops running?

2020-06-10 Thread Hup Chen
I would check "dmesg" first to look for any hardware error messages.
Then I would use some system admin tools to monitor that server,
for instance top, vmstat, lsof, or iostat, or simply install a
free monitoring tool on the system, such as monit, monitorix, or nagios.
Good luck!


From: Ryan W 
Sent: Thursday, June 11, 2020 2:13 AM
To: solr-user@lucene.apache.org 
Subject: Re: How to determine why solr stops running?

Hi all,

People keep suggesting I check the logs for errors.  What do those errors
look like?  Does anyone have examples of the text of a Solr oom error?  Or
the text of any other errors I should be looking for the next time solr
fails?  Are there phrases I should grep for in the logs?  Should I be
looking in the Solr logs for an OOM error, or in the Apache logs?
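
As a rough illustration of the kind of search being asked about, here is a small Python sketch that scans log lines for out-of-memory markers. "java.lang.OutOfMemoryError" is the JVM's exception class name; the other two strings are assumed wordings of Linux kernel OOM-killer messages and may differ between kernel versions. Which log files to feed it (Solr's own logs, the output of dmesg, or others) depends on the installation and is not settled by this sketch.

```python
# Substrings to look for. The first is the JVM exception class name; the
# other two are assumptions about how the Linux kernel words its
# OOM-killer messages.
MARKERS = (
    "java.lang.OutOfMemoryError",
    "oom-killer",
    "Out of memory",
)


def find_oom_lines(lines):
    """Return (line number, text) pairs for lines containing any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


# Made-up sample input, only to show the call.
sample = [
    "some unrelated log line\n",
    "java.lang.OutOfMemoryError: Java heap space\n",
]
print(find_oom_lines(sample))
# [(2, 'java.lang.OutOfMemoryError: Java heap space')]
```

An open file object can be passed directly as `lines`, since iterating over it yields one line at a time.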

There is nothing failing on the server except for solr -- at least not that
I can see.  There is no apparent problem with the hardware or anything else
on the server.  The OS is Red Hat Enterprise Linux. The server has 16 GB of
RAM and hosts one website that does not get a huge amount of traffic.

When the start command is given to solr, does it first check to see if solr
is running, or does it always start solr whether it is already running or
not?

Many thanks!
Ryan


On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson 
wrote:

> To add to what Dave said, if you have a particular machine that’s prone to
> suddenly stopping, that’s usually a red flag that you should seriously
> think about hardware issues.
>
> If the problem strikes different machines, then I agree with Shawn that
> the first thing I’d be suspicious of is OOM errors.
>
> FWIW,
> Erick
>
> > On Jun 9, 2020, at 6:05 AM, Dave  wrote:
> >
> > I’ll add that whenever I’ve had a solr instance shut down, for me it’s
> been a hardware failure. Either the ram or the disk got a “glitch” and both
> of these are relatively fragile and wear and tear type parts of the
> machine, and should be expected to fail and be replaced from time to time.
> Solr is pretty aggressive with its logging so there are a lot of writes
> always happening and of course reads, if the disk has any issues or the
> memory it can lock it up and bring her down, more so if you have any
> spellcheck dictionaries or suggesters being built on start up.
> >
> > Just my experience with this, could be wrong (most likely wrong) but we
> always have extra drives and memory around the server room for this
> reason.  At least once or twice a year we will have a disk failure in the
> raid and need to swap in a new one.
> >
> > Good luck though. Also, Solr should be logging its failures, so it would
> be good to look there too.
> >
> >> On Jun 9, 2020, at 2:35 AM, Shawn Heisey  wrote:
> >>
> >> On 5/14/2020 7:22 AM, Ryan W wrote:
> >>> I manage a site where solr has stopped running a couple times in the
> past
> >>> week. The server hasn't been rebooted, so that's not the reason.  What
> else
> >>> causes solr to stop running?  How can I investigate why this is
> happening?
> >>
> >> Any situation where Solr stops running and nobody requested the stop is
> a result of a serious problem that must be thoroughly investigated.  I
> think it's a bad idea for Solr to automatically restart when it stops
> unexpectedly.  Chances are that whatever caused the crash is going to
> simply make the crash happen again until the problem is solved.
> Automatically restarting could hide problems from the system administrator.
> >>
> >> The only way a Solr auto-restart would be acceptable to me is if it
> sends a high priority alert to the sysadmin EVERY time it executes an
> auto-restart.  It really is that bad of a problem.
> >>
> >> The causes of Solr crashes (that I can think of) include the following.
> I believe I have listed these four options from most likely to least likely:
> >>
> >> * Java OutOfMemoryError exceptions.  On non-windows systems, the
> "bin/solr" script starts Solr with an option that results in Solr's death
> anytime one of these exceptions occurs.  We do this because program
> operation is indeterminate and completely unpredictable when OOME occurs,
> so it's far safer to stop running.  That exception can be caused by several
> things, some of which actually do not involve memory at all.  If you're
> running on Windows via the bin\solr.cmd command, then this will not happen
> ... but OOME could still cause a crash, because as I already mentioned,
> program operation is unpredictable when OOME occurs.
> >>
> >> * The OS kills Solr because system memory is completely exhausted and
> Solr is the process using the most memory.  Linux calls this the
> "oom-killer" ... I am pretty sure something like it exists on most
> operating systems.
> >>
> >> * Corruption somewhere in the system.  Could be in Java, the OS, Solr,
> or data used by any of those.
> >>
> >> * A very serious bug in Solr's code that we haven't discovered yet.
> >>
> >> I included that last one simply for completeness.  A bug tha

Re: Fw: TolerantUpdateProcessorFactory not functioning

2020-06-09 Thread Hup Chen
Oh, I get it: that's not an indexing error!
It seems I need to remove all the characters in [\x00-\x1F] (except \x09
TAB, \x0A LF, \x0D CR) first.

Thanks a lot!
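
A minimal Python sketch of that cleanup, assuming the data can be read into a string before it is posted. The function name strip_control_chars is hypothetical, and it only covers the range named above, not every character XML 1.0 rejects.

```python
# Map every control character below 0x20 to None (delete it), except
# TAB (0x09), LF (0x0A) and CR (0x0D), which XML 1.0 allows.
_DROP = {cp: None for cp in range(0x20) if cp not in (0x09, 0x0A, 0x0D)}


def strip_control_chars(text):
    """Remove the control characters in [\\x00-\\x1F] that XML 1.0 forbids."""
    return text.translate(_DROP)


dirty = "Innocent By Association\x1aZachary's Law\n"
print(repr(strip_control_chars(dirty)))
# "Innocent By AssociationZachary's Law\n"
```

Deleting the character joins the neighbouring words; mapping the same code points to a space instead of None would keep them apart, if that suits the data better.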







Re: Fw: TolerantUpdateProcessorFactory not functioning

2020-06-08 Thread Hup Chen
Thanks for your reply. This is one of the examples where it fails.  POSTing with
charset=utf-8 or another charset didn't help with the CTRL-CHAR "^" error found in
the title field.  I hope Solr can simply skip this record and go on to index
the rest of the data.



 9780373773244
 9780373773244
Missing: Innocent By Association^Zachary's Law (Hqn 
Romance) 
 Lisa_Jackson 





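curl "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100" \
  -H 'Content-Type: text/xml; charset=utf-8' -d @data

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <arr name="errors"/>
  <int name="maxErrors">100</int>
  <int name="status">400</int>
  <int name="QTime">0</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
  </lst>
  <str name="msg">Illegal character ((CTRL-CHAR, code 26))
 at [row,col {unknown-source}]: [1,225]</str>
  <int name="code">400</int>
</lst>
</response>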


From: Thomas Corthals 
Sent: Tuesday, June 9, 2020 2:12 PM
To: solr-user@lucene.apache.org 
Subject: Re: Fw: TolerantUpdateProcessorFactory not functioning

If your XML or JSON can't be parsed, your content never makes it to the
update chain.

It looks like you're trying to index non-UTF-8 data. You can set the
encoding of your XML in the Content-Type header of your POST request.

-H 'Content-Type: text/xml; charset=GB18030'

JSON only allows UTF-8, UTF-16 or UTF-32.
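
If the data is in some other encoding and needs to go in as JSON, one option is to convert it to UTF-8 before posting. A minimal Python sketch, assuming the source encoding is actually known (GB18030 here is just the example from above, and the helper name to_utf8 is made up):

```python
def to_utf8(raw, source_encoding):
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Two Chinese characters, written as escapes so the example is ASCII-only.
original = "\u4e2d\u6587"
gb_bytes = original.encode("gb18030")
utf8_bytes = to_utf8(gb_bytes, "gb18030")
print(utf8_bytes == original.encode("utf-8"))  # True
```

If the declared encoding is wrong, decode() will either raise UnicodeDecodeError or silently produce the wrong characters, so it is worth spot-checking a few records.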

Best,

Thomas

Op di 9 jun. 2020 07:11 schreef Hup Chen :

> Any ideas?
> I still can't get TolerantUpdateProcessorFactory working. Solr
> exits at any error without any tolerance. Any suggestions would be
> appreciated.
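> curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" \
>   -d @data.xml
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>
> <lst name="responseHeader">
>   <arr name="errors"/>
>   <int name="maxErrors">100</int>
>   <int name="status">400</int>
>   <int name="QTime">1</int>
> </lst>
> <lst name="error">
>   <lst name="metadata">
>     <str name="error-class">org.apache.solr.common.SolrException</str>
>     <str name="root-error-class">com.ctc.wstx.exc.WstxEOFException</str>
>   </lst>
>   <str name="msg">Unexpected EOF; was expecting a close tag for element
> &lt;field&gt;
>  at [row,col {unknown-source}]: [1,8191]</str>
>   <int name="code">400</int>
> </lst>
> </response>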
>
>
> 
> From: Hup Chen
> Sent: Friday, May 29, 2020 7:29 PM
> To: solr-user@lucene.apache.org 
> Subject: TolerantUpdateProcessorFactory not functioning
>
> Hi,
>
> My Solr indexing did not tolerate a bad record but simply exited, even though
> I have configured TolerantUpdateProcessorFactory in solrconfig.xml.
> Please advise how I can get TolerantUpdateProcessorFactory
> working.
>
> solrconfig.xml:
>
>  
>
>  100
>
>
>  
>
> restarted solr before indexing:
> service solr stop
> service solr start
>
> curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" \
>   -d @test.json
>
> The first record is a bad record in test.json, the rest were not indexed.
>
> {
>   "responseHeader":{
> "errors":[{
> "type":"ADD",
> "id":"0007264097",
> "message":"ERROR: [doc=0007264097] Error adding field
> 'usedshipping'='' msg=empty String"}],
> "maxErrors":100,
> "status":400,
> "QTime":0},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Cannot parse provided JSON: Expected key,value separator ':':
> char=\",position=1240 AFTER='isbn\":\"4032171203\", \"sku\":\"\",
> \"title\":\"ãã³ãã¡ã¡ããã³ã \"author\"' BEFORE=':\"Sachiko
> OÃtomo\", ãã, \"ima'",
> "code":400}}
>
>


Fw: TolerantUpdateProcessorFactory not functioning

2020-06-08 Thread Hup Chen
Any ideas?
I still can't get TolerantUpdateProcessorFactory working. Solr
exits at any error without any tolerance. Any suggestions would be appreciated.
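curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" \
  -d @data.xml

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <arr name="errors"/>
  <int name="maxErrors">100</int>
  <int name="status">400</int>
  <int name="QTime">1</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">com.ctc.wstx.exc.WstxEOFException</str>
  </lst>
  <str name="msg">Unexpected EOF; was expecting a close tag for element
&lt;field&gt;
 at [row,col {unknown-source}]: [1,8191]</str>
  <int name="code">400</int>
</lst>
</response>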


________
From: Hup Chen
Sent: Friday, May 29, 2020 7:29 PM
To: solr-user@lucene.apache.org 
Subject: TolerantUpdateProcessorFactory not functioning

Hi,

My Solr indexing did not tolerate a bad record but simply exited, even though I have
configured TolerantUpdateProcessorFactory in solrconfig.xml.
Please advise how I can get TolerantUpdateProcessorFactory working.

solrconfig.xml:

 
   
 100
   
   
 

restarted solr before indexing:
service solr stop
service solr start

curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" \
  -d @test.json

The first record is a bad record in test.json, the rest were not indexed.

{
  "responseHeader":{
"errors":[{
"type":"ADD",
"id":"0007264097",
"message":"ERROR: [doc=0007264097] Error adding field 'usedshipping'='' 
msg=empty String"}],
"maxErrors":100,
"status":400,
"QTime":0},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"Cannot parse provided JSON: Expected key,value separator ':': 
char=\",position=1240 AFTER='isbn\":\"4032171203\", \"sku\":\"\", 
\"title\":\"ãã³ãã¡ã¡ããã³ã \"author\"' BEFORE=':\"Sachiko OÃtomo\", 
ãã, \"ima'",
"code":400}}



TolerantUpdateProcessorFactory not functioning

2020-05-29 Thread Hup Chen
Hi,

My Solr indexing did not tolerate a bad record but simply exited, even though I have
configured TolerantUpdateProcessorFactory in solrconfig.xml.
Please advise how I can get TolerantUpdateProcessorFactory working.

solrconfig.xml:

 
   
 100
   
   
 

restarted solr before indexing:
service solr stop
service solr start

curl "http://localhost:7070/solr/mycore/update?update.chain=tolerant-chain&maxErrors=100" \
  -d @test.json

The first record is a bad record in test.json, the rest were not indexed.

{
  "responseHeader":{
"errors":[{
"type":"ADD",
"id":"0007264097",
"message":"ERROR: [doc=0007264097] Error adding field 'usedshipping'='' 
msg=empty String"}],
"maxErrors":100,
"status":400,
"QTime":0},
  "error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"Cannot parse provided JSON: Expected key,value separator ':': 
char=\",position=1240 AFTER='isbn\":\"4032171203\", \"sku\":\"\", 
\"title\":\"ãã³ãã¡ã¡ããã³ã \"author\"' BEFORE=':\"Sachiko OÃtomo\", 
ãã, \"ima'",
"code":400}}
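
The "Cannot parse provided JSON" part of this response is a syntax problem in the file itself, so it can be caught before anything is sent to Solr. A minimal Python sketch, using Python's own json module rather than Solr's parser (the two may not agree on every edge case); the function name check_json and both sample strings are made up for illustration.

```python
import json


def check_json(text):
    """Return None if text parses as JSON, else a short note on where it fails."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Made-up samples: the second has an unescaped quote inside a string value.
good = '[{"id": "0007264097", "usedshipping": "1.99"}]'
bad = '[{"id": "4032171203", "title": "unescaped " quote"}]'
print(check_json(good))  # None
print(check_json(bad))   # a message giving the line and column of the problem
```

This only checks syntax. A value such as an empty string in a numeric field still parses as JSON and would have to be caught separately.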



Re: Solr Atomic update change value and field name

2020-05-22 Thread Hup Chen

> Try adding -format solr to your bin/post command. By default the post command 
> will treat input as arbitrary json, not solr-format json.
Yes, it works!  Thanks a lot!

From: Jan Høydahl 
Sent: Friday, May 22, 2020 4:46 AM
To: solr-user@lucene.apache.org 
Subject: Re: Solr Atomic update change value and field name

Try adding -format solr to your bin/post command. By default the post command 
will treat input as arbitrary json, not solr-format json.
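
For several million records, the atomic updates can be grouped so that each request or file carries many documents instead of one. A minimal Python sketch that builds such batches in the same shape as the test.json from the original question; the function name atomic_set_batches, the batch size, and all sample ids and prices other than 0371558727 / 19.95 are made up for illustration.

```python
import json


def atomic_set_batches(updates, field, batch_size):
    """Yield JSON strings, each an array of atomic "set" updates.

    `updates` is an iterable of (id, new_value) pairs.
    """
    batch = []
    for doc_id, value in updates:
        batch.append({"id": doc_id, field: {"set": value}})
        if len(batch) == batch_size:
            yield json.dumps(batch)
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield json.dumps(batch)


updates = [("0371558727", 19.95), ("0007264097", 4.5), ("4032171203", 12.0)]
for payload in atomic_set_batches(updates, "price", batch_size=2):
    print(payload)
```

Each payload could be written to its own file and sent with bin/post -format solr, or posted to the /update handler.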

Jan Høydahl

> 21. mai 2020 kl. 02:50 skrev Hup Chen :
>
> I am new to Solr. I tried to do an atomic update from a .json file.
> $SOLR/bin/post not only changed the field value, but also renamed the field to
> "fieldname.set"; for instance, "price" became "price.set".  Updating through the
> /update handler with curl worked well, but since I have several million
> records, I can't call curl several million times; that would be
> extremely slow.
>
> Any help will be appreciated.
>
>
># /usr/local/solr/bin/solr version
>8.5.1
>
># curl http://localhost:8983/solr/books/select?q=id%3A0371558727
>"response":{"numFound":1,"start":0,"docs":[
>  {
>"id":"0371558727",
>"price":19.0,
>"_version_":1667214802265571328}]
>}
>
># cat test.json
>[
>{"id":"0371558727",
> "price":{"set":19.95}
>}
>]
>
># /usr/local/solr/bin/post -p 8983 -c books test.json
>
># curl http://localhost:8983/solr/books/select?q=id%3A0371558727
>"response":{"numFound":1,"start":0,"docs":[
>  {
>"id":"0371558727",
>"price.set":[19.95],
>"_version_":1667214933776924672}]
>}
>
>


Solr Atomic update change value and field name

2020-05-20 Thread Hup Chen
I am new to Solr. I tried to do an atomic update from a .json file.
$SOLR/bin/post not only changed the field value, but also renamed the field to
"fieldname.set"; for instance, "price" became "price.set".  Updating through the
/update handler with curl worked well, but since I have several million records,
I can't call curl several million times; that would be extremely
slow.

Any help will be appreciated.


# /usr/local/solr/bin/solr version
8.5.1

# curl http://localhost:8983/solr/books/select?q=id%3A0371558727
"response":{"numFound":1,"start":0,"docs":[
  {
"id":"0371558727",
"price":19.0,
"_version_":1667214802265571328}]
}

# cat test.json
[
{"id":"0371558727",
 "price":{"set":19.95}
}
]

# /usr/local/solr/bin/post -p 8983 -c books test.json

# curl http://localhost:8983/solr/books/select?q=id%3A0371558727
"response":{"numFound":1,"start":0,"docs":[
  {
"id":"0371558727",
"price.set":[19.95],
"_version_":1667214933776924672}]
}