Solr 7.7 Indexing issue

2020-09-30 Thread Manisha Rahatadkar
Hello all

We are using Apache Solr 7.7 on the Windows platform. The data is synced to Solr 
using Solr.Net commits, in batches. The documents are very large (~0.5 GB on 
average) and Solr indexing is taking a long time. The total document size is 
~200 GB. Because the Solr commit is done as part of an API call, the API calls 
are failing when document indexing has not completed.

  1.  What is your advice on syncing such a large volume of data to Solr?
  2.  Because of the search requirements, almost 8 fields are defined as 
text fields.
  3.  Currently SOLR_JAVA_MEM is set to 2 GB. Is that enough for such a large 
volume of data? ( IF "%SOLR_JAVA_MEM%"=="" set SOLR_JAVA_MEM=-Xms2g -Xmx2g ) 
(See the sketch after this list.)
  4.  How should Solr be set up in production on Windows? Currently it is set 
up as a standalone engine and the client has been asked to back up the drive. 
Is there a better way to do this? How should we set up for disaster recovery?
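For reference, the heap on Windows is normally raised in bin\solr.in.cmd rather 
than on the command line. A minimal sketch, assuming the stock solr.in.cmd that 
ships with Solr 7.7 (the 8 GB figure is purely illustrative, not a sizing 
recommendation; 2 GB is almost certainly too small for ~200 GB of documents):

    REM In <solr install>\bin\solr.in.cmd
    set SOLR_JAVA_MEM=-Xms8g -Xmx8g

    REM Equivalent one-off form when starting Solr by hand:
    REM bin\solr.cmd start -m 8g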

Thanks in advance.

Regards
Manisha Rahatadkar




Re: Solr 8.5.2 indexing issue

2020-07-02 Thread gnandre
It seems that the issue is not with the reference_url field itself. There is
a copyField which has the reference_url field as its source and another
field called url_path as its destination.
This destination field url_path has the following field type definition.

[fieldType definition stripped by the mailing list archive; a sketch follows
the next paragraph]
If I remove SynonymGraphFilterFactory and FlattenGraphFilterFactory from the
above field type definition then it works; otherwise it throws the same
error (IndexOutOfBoundsException).
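The stripped definition is not recoverable from the archive, but a minimal
sketch of a field type that pairs the two filters named above at index time
looks roughly like this (the type name, tokenizer, and synonyms.txt file are
illustrative assumptions, not the poster's actual schema):

    <fieldType name="text_url_path" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- hypothetical synonyms file; expand="true" can emit multi-token synonyms as a graph -->
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <!-- needed after a graph filter at index time; the indexer cannot consume a token graph -->
        <filter class="solr.FlattenGraphFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>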

On Sun, Jun 28, 2020 at 9:06 AM Erick Erickson 
wrote:

> How are you sending this to Solr? I just tried 8.5, submitting that doc
> through the admin UI and it works fine.
> I defined “asset_id” as the same type as your reference_url field.
>
> And does the log on the Solr node that tries to index this give any more
> info?
>
> Best,
> Erick
>
> > On Jun 27, 2020, at 10:45 PM, gnandre  wrote:
> >
> > {
> >"asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",
> >
> >
> "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"}
>
>


Re: Solr 8.5.2 indexing issue

2020-06-28 Thread Erick Erickson
How are you sending this to Solr? I just tried 8.5, submitting that doc through 
the admin UI and it works fine. 
I defined “asset_id” as the same type as your reference_url field.

And does the log on the Solr node that tries to index this give any more info?

Best,
Erick

> On Jun 27, 2020, at 10:45 PM, gnandre  wrote:
> 
> {
>"asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",
> 
> "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"}



Solr 8.5.2 indexing issue

2020-06-27 Thread gnandre
Hi,

I have the following document which fails to get indexed.

{
"asset_id":"add-ons:576deefef7453a9189aa039b66500eb2",

"reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"}

I am not sure what is so special about the content in the reference_url
field.

The reference_url field is defined as follows in the schema:

[field definition stripped by the mailing list archive]

It throws the following error.

Status:

{"data":{"responseHeader":{"status":400,"QTime":18},
  "error":{"metadata":["error-class","org.apache.solr.common.SolrException",
                       "root-error-class","java.lang.IndexOutOfBoundsException"],
           "msg":"Exception writing document id add-ons:576deefef7453a9189aa039b66500eb2 to the index; possible analysis error.",
           "code":400}},
 "status":400,
 "config":{"method":"POST","transformRequest":[null],"transformResponse":[null],
   "jsonpCallbackParam":"callback",
   "headers":{"Content-type":"application/json",
              "Accept":"application/json, text/plain, */*",
              "X-Requested-With":"XMLHttpRequest"},
   "data":"[{\n  \"asset_id\":\"add-ons:576deefef7453a9189aa039b66500eb2\",\n  \"reference_url\":\"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html\"}]",
   "url":"add-ons/update",
   "params":{"wt":"json","_":1593304427428,"commitWithin":1000,"overwrite":true},
   "timeout":1},
 "statusText":"Bad Request","xhrStatus":"complete"}

[The "resource" field of the response, which echoed the POSTed payload
character by character, is omitted here; it carries no additional information.]


Re: Migration: SOLR8-Java8 -> SOLR8-JAVA11 indexing issue.

2019-10-24 Thread anup.junagade
Thanks Shawn for checking.

As advised we will execute the indexing with the new settings as mentioned
and will update the results.

Here are the links to the missing attachments:

Attachment 1: OpenJDK 11 vs OpenJDK 8 key metrics
Attachment 2: OpenJDK 11 vs OpenJDK 8 waiting QTP Threads
Attachment 3: OpenJDK 11 Thread dump

[The attachment links themselves were stripped by the mailing list archive.]





Re: Migration: SOLR8-Java8 -> SOLR8-JAVA11 indexing issue.

2019-10-24 Thread Shawn Heisey

On 10/24/2019 11:50 AM, Junagade, Anup wrote:

   *   Attachment 1: OpenJDK 11 vs OpenJDK 8 key metrics
   *   Attachment 2:  OpenJDK 11 vs OpenJDK 8 waiting QTP Threads
   *   Attachment 3: OpenJDK 11 Thread dump


There are no attachments.  Apache mailing lists swallow almost all 
attachments.  You will need to use a file sharing website to 
successfully get files to us.



Heap allocated: 32 GB


If you set your heap to 31GB, you'll actually have more memory available 
to Java than with a heap size of 32GB.  This is because at 32GB the JVM can 
no longer use compressed object pointers, so full 64-bit pointers are 
required.  Solr has a tendency to create a very large number of small 
objects, so the pointer size increase ends up using a lot of memory.
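A quick way to check whether a given heap size still gets compressed object
pointers is to ask the JVM directly (standard JDK tooling, no Solr involved):

    java -Xms31g -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
    # UseCompressedOops is reported true at 31g, false at 32g and above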




   [quoted solrconfig.xml merge settings; the element names were stripped by
   the archive, leaving only the values]
   100
   150



These numbers are huge.  Without setting maxMergeAtOnceExplicit, you're 
not getting the full benefit of increasing these settings beyond their 
defaults of 10.  Set maxMergeAtOnce and segmentsPerTier to the same 
number and then use three times that number for maxMergeAtOnceExplicit. 
The Explicit setting is not mentioned in the Solr documentation.  Numbers as 
big as you have chosen will result in Solr keeping a LOT of files open, 
because the index will end up with a large number of segments.  The OS 
will definitely need to have its "max open files" limit increased.
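A sketch of that advice in solrconfig.xml, using the TieredMergePolicyFactory
syntax from Solr 8 (the values are illustrative; the advice above prescribes a
ratio, not these exact numbers):

    <indexConfig>
      <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
        <!-- keep these two equal -->
        <int name="maxMergeAtOnce">30</int>
        <int name="segmentsPerTier">30</int>
        <!-- and set the explicit (forced-merge) width to three times that -->
        <int name="maxMergeAtOnceExplicit">90</int>
      </mergePolicyFactory>
    </indexConfig>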



GC and ZK Settings

-DzkClientTimeout=30
-DzkHost= ,,
-XX:+PrintGCDetails
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:ConcGCThreads=8
-XX:InitiatingHeapOccupancyPercent=70
-XX:MaxGCPauseMillis=200
-XX:ParallelGCThreads=32
-XX:PermSize=512m
-Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
-Xms32g -Xmx32g
-Xss256k
-verbose:gc


It looks like you have used your own GC settings instead of those that 
Solr comes with.  Your settings are missing one of the most important 
parameters for good GC performance.  You should let Solr's start script 
handle GC tuning and GC logging without interference.
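For reference, the stock start script takes its GC flags from the GC_TUNE
variable in solr.in.sh, so "no interference" amounts to leaving that variable
alone and setting only the heap (a sketch assuming the standard solr.in.sh;
the 31g follows the compressed-oops note above):

    # in solr.in.sh -- leave GC_TUNE unset/commented to keep Solr's own GC defaults
    #GC_TUNE="..."
    SOLR_HEAP="31g"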


Thanks,
Shawn


Solr 8.1.1 Indexing issue while migrating Java8 -> Java11

2019-10-24 Thread anup.junagade
 
We are trying to migrate our SOLR 8.1.1 cluster from OpenJDK Java 8 to
OpenJDK Java 11 and are facing issues with indexing. While our indexing
happens flawlessly on Java 8, it crawls, or maybe I should say stalls,
with Java 11.
Any pointers/help is appreciated.
 
*Symptoms*
 
With OpenJDK 11 and SOLR 8.1.1 we see that for the first 30 minutes, response
times for updates are similar to our current implementation (OpenJDK 8 and SOLR
8.1.1). It should be noted that no read queries are being executed at
the time of indexing.
On the OpenJDK 11 implementation, the qtp active threads continuously
increase into the thousands, while on the OpenJDK 8 implementation they stop
after going up to approximately 150.
On the OpenJDK 11 implementation, the number of classes loaded starts at a
very high number and stays there, as opposed to the OpenJDK 8 implementation,
where the number of classes loaded is small to begin with and remains under
control. I believe the qtp threads in the wait state mentioned above are
causing this symptom.
Attachment 1: OpenJDK 11 vs OpenJDK 8 key metrics
  
Attachment 2: OpenJDK 11 vs OpenJDK 8 waiting QTP Threads
  
Attachment 3: OpenJDK 11 Thread dump
  
 
 
*Following are the key configuration of our application.*
 
Index Size: 8 GB/shard
Total no of Documents in Solr cluster: 70 Million
Average Document size: 15 KB
JSON Payload for each update contains: 50 docs
Average Time Taken to post 50 Docs: 300 milliseconds
Average Rate at which documents are being posted to SOLR: 7500 requests per
second
No of shards in the Cluster: 10 (No Replicas)
CPUs: 32
Memory: 128 GB
Heap allocated: 32 GB
SOLR Client: 8.1.1
ZK Ensemble: 3
 

[solrconfig.xml excerpts; element names stripped by the archive, leaving only
the values -- the 100/150 pair is the merge-policy settings identified as
maxMergeAtOnce and segmentsPerTier in Shawn Heisey's reply]
  100
  150

48

18
false

 
GC and ZK Settings
 
-DzkClientTimeout=30
-DzkHost= ,,
-XX:+PrintGCDetails
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:ConcGCThreads=8
-XX:InitiatingHeapOccupancyPercent=70
-XX:MaxGCPauseMillis=200
-XX:ParallelGCThreads=32
-XX:PermSize=512m
-Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
-Xms32g -Xmx32g
-Xss256k
-verbose:gc
 

 





Re: Migration: SOLR8-Java8 -> SOLR8-JAVA11 indexing issue.

2019-10-24 Thread Junagade, Anup
We are trying to migrate our SOLR 8.1.1 cluster from OpenJDK Java 8 to OpenJDK 
Java 11 and are facing issues with indexing. While our indexing happens 
flawlessly on Java 8, it crawls, or maybe I should say stalls, with Java 11.
Any pointers/help is appreciated.

Symptoms


  *   With OpenJDK 11 and SOLR 8.1.1 we see that for the first 30 minutes, 
response times for updates are similar to our current implementation (OpenJDK 8 
and SOLR 8.1.1). It should be noted that no read queries are being executed 
at the time of indexing.
  *   On the OpenJDK 11 implementation, the qtp active threads continuously 
increase into the thousands, while on the OpenJDK 8 implementation they stop 
after going up to approximately 150.
  *   On the OpenJDK 11 implementation, the number of classes loaded starts at a 
very high number and stays there, as opposed to the OpenJDK 8 implementation, 
where the number of classes loaded is small to begin with and remains under 
control. I believe the qtp threads in the wait state mentioned above are causing 
this symptom.
  *   Attachment 1: OpenJDK 11 vs OpenJDK 8 key metrics
  *   Attachment 2:  OpenJDK 11 vs OpenJDK 8 waiting QTP Threads
  *   Attachment 3: OpenJDK 11 Thread dump


Following are the key metrics/configuration of our application.

Index Size: 8 GB/shard
Total no of Documents in Solr cluster: 70 Million
Average Document size: 15 KB
JSON Payload for each update contains: 50 docs
Average Time Taken to post 50 Docs: 300 milliseconds
Average Rate at which documents are being posted to SOLR: 7500 requests per 
second
No of shards in the Cluster: 10 (No Replicas)
CPUs: 32
Memory: 128 GB
Heap allocated: 32 GB
SOLR Client: 8.1.1
ZK Ensemble: 3


[solrconfig.xml excerpts; element names stripped by the archive, leaving only
the values -- the 100/150 pair is the merge-policy settings identified as
maxMergeAtOnce and segmentsPerTier in Shawn Heisey's reply]
  100
  150

48

18
false


GC and ZK Settings

-DzkClientTimeout=30
-DzkHost= ,,
-XX:+PrintGCDetails
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:ConcGCThreads=8
-XX:InitiatingHeapOccupancyPercent=70
-XX:MaxGCPauseMillis=200
-XX:ParallelGCThreads=32
-XX:PermSize=512m
-Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
-Xms32g -Xmx32g
-Xss256k
-verbose:gc

Thanks,
Anup



Re: Regarding pdf indexing issue

2018-07-11 Thread Terry Steichen
Walter,

Well said.  (And I love the hamburger conversion analogy - very apt.)

The only thing I will add is that when you have a collection of similar
rich-text documents, you may be able to construct queries that respect
internal structures within the documents.  If all or most of your documents
have a unique line like "subject:", you may be able to be selective.

Also, if your documents are organized on disk in some categorical way,
you can include a reference to that categorical information in your query
(via a pattern on the id field, e.g. id:*pattern*).
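A hypothetical example of that kind of query, assuming ids that embed the
on-disk path and a collection named docs (both names are made up here):

    curl "http://localhost:8983/solr/docs/select?q=subject:budget&fq=id:*reports*"

The fq clause restricts matches to documents whose id contains the category
string, without affecting relevance scoring.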

Finally, there *might* be useful information in the metadata that you
can use in refining your searches.

Terry


On 07/11/2018 11:42 AM, Walter Underwood wrote:
> PDF is not a structured document format. It is a printer control format.
>
> PDF does not have a paragraph marker. Instead, it says to move
> to this spot on the page, choose this font, and print this letter. For a
> paragraph, it moves farther. For the next letter in a word, it moves a 
> little bit. Extracting paragraphs from that is a difficult pattern recognition
> problem.
>
> I worked with a PDF of a two-column magazine article that printed
> the first line of column 1, then the first line of column 2, then the 
> second line of column 1, and so on. If a line ended with a hyphenated
> word, too bad.
>
> Extracting structure from a PDF document is somewhere between 
> very hard and impossible. Someone I worked with said that getting
> structured text from PDF was like turning hamburger back into a cow.
>
> Since Acrobat 5, there is “tagged PDF”. I’m not sure how widely that
> is used. It appears to be an accessibility feature, so it still might not
> be useful for search.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>> On Jul 11, 2018, at 8:07 AM, Erick Erickson  wrote:
>>
>> Solr will not do this automatically, the Extracting Request Handler
>> simply indexes the entire contents of the doc without regard to things
>> like paragraphs etc. Ditto with HTML. This is actually a task that
>> requires getting into Tika and using all the bells and whistles there.
>>
>> I'd recommend two things:
>>
>> 1> Take the PDF parsing offline, i.e. in a separate client. There are
>> many reasons for this, in particular you can attempt to do what you're
>> asking. See: https://lucidworks.com/2012/02/14/indexing-with-solrj/
>>
>> 2> Talk to the Tika folks about the best ways to make Tika return the
>> information such that you can index them and get what you'd like.
>>
>> Best,
>> Erick
>>
>> On Wed, Jul 11, 2018 at 6:35 AM, Rahul Prasad Dwivedi
>>  wrote:
>>> Hello Team,
>>>
>>> I am using Solr for indexing and searching PDF documents.
>>>
>>> I have gone through the documentation on your website and installed Solr,
>>> but I am unable to index and search the documents.
>>>
>>> For example: Suppose we have a PDF file which has a number of paragraphs,
>>> each with a separate heading.
>>>
>>> So if I search for a heading in the indexed PDF, the result should contain
>>> the paragraph to which that heading belongs.
>>>
>>> I am unable to perform this task.
>>>
>>> I have run the command below to upload the PDF:
>>>
>>> *bin/post -c gettingstarted pdf-sample.pdf*
>>>
>>> and for searching I am running the command
>>>
>>> *curl http://localhost:8983/solr/gettingstarted/select?q='*
>>>
>>> Please suggest anything and let me know if I am missing anything
>>>
>>> Thanks,
>>>
>>> Rahul
>



Re: Regarding pdf indexing issue

2018-07-11 Thread Shamik Sinha
You may try the tesseract tool to check data extraction from the PDF or
images, and then go forward accordingly. As far as I understand, the PDF here
is an image and not data. A searchable PDF actually overlays the selectable
text as hidden text over the PDF image. These PDFs can be indexed and their
text extracted. This mostly works for English and other Latin-derived scripts;
you may face problems extracting/indexing text in any other language.
Handwritten text converted to PDF is next to impossible to index/extract.
Apache Tika may be the solution you are looking for.
On Wed 11 Jul, 2018, 9:12 PM Walter Underwood, 
wrote:

> PDF is not a structured document format. It is a printer control format.
>
> PDF does not have a paragraph marker. Instead, it says to move
> to this spot on the page, choose this font, and print this letter. For a
> paragraph, it moves farther. For the next letter in a word, it moves a
> little bit. Extracting paragraphs from that is a difficult pattern
> recognition
> problem.
>
> I worked with a PDF of a two-column magazine article that printed
> the first line of column 1, then the first line of column 2, then the
> second line of column 1, and so on. If a line ended with a hyphenated
> word, too bad.
>
> Extracting structure from a PDF document is somewhere between
> very hard and impossible. Someone I worked with said that getting
> structured text from PDF was like turning hamburger back into a cow.
>
> Since Acrobat 5, there is “tagged PDF”. I’m not sure how widely that
> is used. It appears to be an accessibility feature, so it still might not
> be useful for search.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jul 11, 2018, at 8:07 AM, Erick Erickson 
> wrote:
> >
> > Solr will not do this automatically, the Extracting Request Handler
> > simply indexes the entire contents of the doc without regard to things
> > like paragraphs etc. Ditto with HTML. This is actually a task that
> > requires getting into Tika and using all the bells and whistles there.
> >
> > I'd recommend two things:
> >
> > 1> Take the PDF parsing offline, i.e. in a separate client. There are
> > many reasons for this, in particular you can attempt to do what you're
> > asking. See: https://lucidworks.com/2012/02/14/indexing-with-solrj/
> >
> > 2> Talk to the Tika folks about the best ways to make Tika return the
> > information such that you can index them and get what you'd like.
> >
> > Best,
> > Erick
> >
> > On Wed, Jul 11, 2018 at 6:35 AM, Rahul Prasad Dwivedi
> >  wrote:
> >> Hello Team,
> >>
> >> I am using Solr for indexing and searching PDF documents.
> >>
> >> I have gone through the documentation on your website and installed Solr,
> >> but I am unable to index and search the documents.
> >>
> >> For example: Suppose we have a PDF file which has a number of paragraphs,
> >> each with a separate heading.
> >>
> >> So if I search for a heading in the indexed PDF, the result should contain
> >> the paragraph to which that heading belongs.
> >>
> >> I am unable to perform this task.
> >>
> >> I have run the command below to upload the PDF:
> >>
> >> *bin/post -c gettingstarted pdf-sample.pdf*
> >>
> >> and for searching I am running the command
> >>
> >> *curl http://localhost:8983/solr/gettingstarted/select?q='*
> >>
> >> Please suggest anything and let me know if I am missing anything
> >>
> >> Thanks,
> >>
> >> Rahul
>
>


Re: Regarding pdf indexing issue

2018-07-11 Thread Walter Underwood
PDF is not a structured document format. It is a printer control format.

PDF does not have a paragraph marker. Instead, it says to move
to this spot on the page, choose this font, and print this letter. For a
paragraph, it moves farther. For the next letter in a word, it moves a 
little bit. Extracting paragraphs from that is a difficult pattern recognition
problem.

I worked with a PDF of a two-column magazine article that printed
the first line of column 1, then the first line of column 2, then the 
second line of column 1, and so on. If a line ended with a hyphenated
word, too bad.

Extracting structure from a PDF document is somewhere between 
very hard and impossible. Someone I worked with said that getting
structured text from PDF was like turning hamburger back into a cow.

Since Acrobat 5, there is “tagged PDF”. I’m not sure how widely that
is used. It appears to be an accessibility feature, so it still might not
be useful for search.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 11, 2018, at 8:07 AM, Erick Erickson  wrote:
> 
> Solr will not do this automatically, the Extracting Request Handler
> simply indexes the entire contents of the doc without regard to things
> like paragraphs etc. Ditto with HTML. This is actually a task that
> requires getting into Tika and using all the bells and whistles there.
> 
> I'd recommend two things:
> 
> 1> Take the PDF parsing offline, i.e. in a separate client. There are
> many reasons for this, in particular you can attempt to do what you're
> asking. See: https://lucidworks.com/2012/02/14/indexing-with-solrj/
> 
> 2> Talk to the Tika folks about the best ways to make Tika return the
> information such that you can index them and get what you'd like.
> 
> Best,
> Erick
> 
> On Wed, Jul 11, 2018 at 6:35 AM, Rahul Prasad Dwivedi
>  wrote:
>> Hello Team,
>> 
>> I am using Solr for indexing and searching PDF documents.
>> 
>> I have gone through the documentation on your website and installed Solr,
>> but I am unable to index and search the documents.
>> 
>> For example: Suppose we have a PDF file which has a number of paragraphs,
>> each with a separate heading.
>> 
>> So if I search for a heading in the indexed PDF, the result should contain
>> the paragraph to which that heading belongs.
>> 
>> I am unable to perform this task.
>> 
>> I have run the command below to upload the PDF:
>> 
>> *bin/post -c gettingstarted pdf-sample.pdf*
>> 
>> and for searching I am running the command
>> 
>> *curl http://localhost:8983/solr/gettingstarted/select?q='*
>> 
>> Please suggest anything and let me know if I am missing anything
>> 
>> Thanks,
>> 
>> Rahul



Re: Regarding pdf indexing issue

2018-07-11 Thread Erick Erickson
Solr will not do this automatically, the Extracting Request Handler
simply indexes the entire contents of the doc without regard to things
like paragraphs etc. Ditto with HTML. This is actually a task that
requires getting into Tika and using all the bells and whistles there.

I'd recommend two things:

1> Take the PDF parsing offline, i.e. in a separate client. There are
many reasons for this, in particular you can attempt to do what you're
asking. See: https://lucidworks.com/2012/02/14/indexing-with-solrj/

2> Talk to the Tika folks about the best ways to make Tika return the
information such that you can index them and get what you'd like.
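A bare-bones sketch of that offline approach, combining Tika and SolrJ (the
collection and field names are illustrative and must match your schema; error
handling is omitted):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.BodyContentHandler;

    public class PdfIndexer {
      public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder(
                 "http://localhost:8983/solr/gettingstarted").build();
             InputStream in = Files.newInputStream(Paths.get("pdf-sample.pdf"))) {
          // Parse the PDF in the client, not inside Solr
          BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
          Metadata metadata = new Metadata();
          new AutoDetectParser().parse(in, handler, metadata);

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "pdf-sample.pdf");
          doc.addField("title", metadata.get("title")); // whatever Tika extracted, may be null
          doc.addField("content", handler.toString());  // the full body text
          solr.add(doc);
          solr.commit();
        }
      }
    }

Splitting the extracted body into per-paragraph documents would happen between
the Tika parse and the SolrJ add, which is exactly the flexibility point 1>
is about.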

Best,
Erick

On Wed, Jul 11, 2018 at 6:35 AM, Rahul Prasad Dwivedi
 wrote:
> Hello Team,
>
> I am using Solr for indexing and searching PDF documents.
>
> I have gone through the documentation on your website and installed Solr,
> but I am unable to index and search the documents.
>
> For example: Suppose we have a PDF file which has a number of paragraphs,
> each with a separate heading.
>
> So if I search for a heading in the indexed PDF, the result should contain
> the paragraph to which that heading belongs.
>
> I am unable to perform this task.
>
> I have run the command below to upload the PDF:
>
> *bin/post -c gettingstarted pdf-sample.pdf*
>
> and for searching I am running the command
>
> *curl http://localhost:8983/solr/gettingstarted/select?q='*
> 
> Please suggest anything and let me know if I am missing anything
>
> Thanks,
>
> Rahul


Regarding pdf indexing issue

2018-07-11 Thread Rahul Prasad Dwivedi
Hello Team,

I am using Solr for indexing and searching PDF documents.

I have gone through the documentation on your website and installed Solr, but
I am unable to index and search the documents.

For example: Suppose we have a PDF file which has a number of paragraphs, each
with a separate heading.

So if I search for a heading in the indexed PDF, the result should contain the
paragraph to which that heading belongs.

I am unable to perform this task.

I have run the command below to upload the PDF:

*bin/post -c gettingstarted pdf-sample.pdf*

and for searching I am running the command

*curl http://localhost:8983/solr/gettingstarted/select?q='*


Re: Indexing issue - index get deleted

2015-06-11 Thread Alessandro Benedetti
Hi Chris,
Amazing analysis!
I actually did not investigate the log, because I was first trying to get
more information from the user:

  We are running full import and delta import crons.

  Full index: once a day

  Delta index: every 10 mins

  Last night my index automatically got deleted (numdocs=0).

  Attaching logs for review.

Reading the user's initial mail more carefully, he does a full import as well
(and at this point, cleans the index).
Not sure there is any practical reason to do that; the user will clarify
that for us.

So after the clean happened, something prevented the full import from
proceeding, and we got the weird behaviour seen in the logs.

Really curious to understand this better :)


2015-06-11 1:36 GMT+01:00 Chris Hostetter hossman_luc...@fucit.org:


 : The guy was using delta import anyway, so maybe the problem is
 : different and not related to the clean.

 that's not what the logs say.

 Here's what i see...

 Log begins with server startup @ Jun 10, 2015 11:14:56 AM

 The DeletionPolicy for the shopclue_prod core is initialized at Jun
 10, 2015 11:15:04 AM and we see a few interesting things here we note
 for the future as we keep reading...

 1) There is currently commits:num=1 commits on disk
 2) the current index dir in use is index.20150311161021822
 3) the current segment / generation are segFN=segments_1a,generation=46

 Immediately after this, we see some searcher warming using a searcher with
 this same segments file, and then this searcher is registered (Jun 10,
 2015 11:15:05 AM) and the core is registered.

 Next we see some replication polling, and we see what look like some
 simple monitoring requests for q=* which return hits=85898 being
 repeated over and over.

 At Jun 10, 2015 11:16:30 AM we see some requests for /dataimport that
 look like they are coming from the UI, and then at Jun 10, 2015 11:17:01
 AM we see a request for a full import started.

 We have no idea what the data import configuration file looks like, so we
 have no idea if clean=false is being used or not.  It's certainly not
 specified in the URL.

 We see some more monitoring URLs returning hits=85898 and some more
 /replication status calls, and then @ Jun 10, 2015 11:18:02 AM we see the
 first commit executed since the server started up.

 There's no indication that this commit came from an external request (eg
 /update) so it probably was made by some internal request.  One
 possibility is that it came from DIH finishing -- but I doubt it, I'm
 fairly sure that would have involved more logging than this.  A more
 probable scenario is that it came from an autoCommit setting -- the fact
 that it is almost exactly 60 seconds after DIH started -- and almost
 exactly 60 seconds after DIH may have done a deleteAll query due to
 clean=true -- makes it seem very likely that this was a 1 minute
 autoCommit.

 (But since we don't have either the data import config or the
 solrconfig.xml, we have no way of knowing -- it's all just guesswork.)

 Very importantly, note that this commit is not opening a new searcher...

 Jun 10, 2015 11:18:02 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start
 commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

 Here are some other interesting things to note from the logging
 that comes from the DeletionPolicy when this commit happens...

 1) it now notes that there are commits:num=2 on disk
 2) the current index dir hasn't changed (index.20150311161021822) so
 some weird replication command didn't swap the world out from under us
 3) the newest segment/generation are segFN=segments_1b,generation=47
 4) the newest commit has no other files in it besides the segments file.

 This means, without a doubt, there are no documents in this commit's view
 of the index.  They have all been deleted by something.


 At this point the *old* searcher (for commit generation 46) is still in
 use however -- nothing has done an openSearcher=true.

 We see more /dataimport status requests, and other requests that appear to
 come from the Solr UI, and more monitoring queries that still return
 hits=85898 because the same searcher is in use.

 At Jun 10, 2015 11:27:04 AM we see another commit happen -- again, no
 indication that this came from an outside /update request, so it might be
 from DIH, or it might be from an autoCommit setting.  The fact that it is
 nearly exactly 10 minutes after DIH started (and probably did a clean=true
 deleteAll query) makes it seem extremely likely this is an autoSoftCommit
 setting kicking in.

 Very importantly, note that this softCommit *does* open a new searcher...

 Jun 10, 2015 11:27:04 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start

 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}


 In less than a second, this new searcher is warmed up and the next time we
 see a q=* monitoring query get 

Re: Indexing issue - index get deleted

2015-06-11 Thread Midas A
Thanks for replying.

Please find the data-config below.

[data-config.xml stripped by the mailing list archive]

On Thu, Jun 11, 2015 at 6:06 AM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : The guy was using delta import anyway, so maybe the problem is
 : different and not related to the clean.

 that's not what the logs say.

 Here's what i see...

 Log begins with server startup @ Jun 10, 2015 11:14:56 AM

 The DeletionPolicy for the shopclue_prod core is initialized at Jun
 10, 2015 11:15:04 AM and we see a few interesting things here we note
 for the future as we keep reading...

 1) There is currently commits:num=1 commits on disk
 2) the current index dir in use is index.20150311161021822
 3) the current segment / generation are segFN=segments_1a,generation=46

 Immediately after this, we see some searcher warming using a searcher with
 this same segments file, and then this searcher is registered (Jun 10,
 2015 11:15:05 AM) and the core is registered.

 Next we see some replication polling, and we see what look like some
 simple monitoring requests for q=* which return hits=85898 being
 repeated over and over.

 At Jun 10, 2015 11:16:30 AM we see some requests for /dataimport that
 look like they are coming from the UI, and then at Jun 10, 2015 11:17:01
 AM we see a request for a full import started.

 We have no idea what the data import configuration file looks like, so we
 have no idea if clean=false is being used or not.  It's certainly not
 specified in the URL.

 We see some more monitoring URLs returning hits=85898 and some more
 /replication status calls, and then @ Jun 10, 2015 11:18:02 AM we see the
 first commit executed since the server started up.

 There's no indication that this commit came from an external request (eg
 /update) so it probably was made by some internal request.  One
 possibility is that it came from DIH finishing -- but I doubt it, I'm
 fairly sure that would have involved more logging than this.  A more
 probable scenario is that it came from an autoCommit setting -- the fact
 that it is almost exactly 60 seconds after DIH started -- and almost
 exactly 60 seconds after DIH may have done a deleteAll query due to
 clean=true -- makes it seem very likely that this was a 1 minute
 autoCommit.

 (But since we don't have either the data import config or the
 solrconfig.xml, we have no way of knowing -- it's all just guesswork.)

 Very importantly, note that this commit is not opening a new searcher...

 Jun 10, 2015 11:18:02 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start
 commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

 Here are some other interesting things to note from the logging
 that comes from the DeletionPolicy when this commit happens...

 1) it now notes that there are commits:num=2 on disk
 2) the current index dir hasn't changed (index.20150311161021822) so
 some weird replication command didn't swap the world out from under us
 3) the newest segment/generation are segFN=segments_1b,generation=47
 4) the newest commit has no other files in it besides the segments file.

 This means, without a doubt, there are no documents in this commit's view
 of the index.  They have all been deleted by something.


 At this point the *old* searcher (for commit generation 46) is still in
 use however -- nothing has done an openSearcher=true.

 We see more /dataimport status requests, and other requests that appear to
 come from the Solr UI, and more monitoring queries that still return
 hits=85898 because the same searcher is in use.

 At Jun 10, 2015 11:27:04 AM we see another commit happen -- again, no
 indication that this came from an outside /update request, so it might be
 from DIH, or it might be from an autoCommit setting.  The fact that it is
 nearly exactly 10 minutes after DIH started (and probably did a clean=true
 deleteAll query) makes it seem extremely likely this is an autoSoftCommit
 setting kicking in.

 Very importantly, note that this softCommit *does* open a new searcher...

 Jun 10, 2015 11:27:04 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start

 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}


 In less than a second, this new searcher is warmed up and the next time we
 see a q=* monitoring query get logged, it returns hits=0.

 Note that at no point in the logs, after the DataImporter is started, do
 we see it log anything other than that it has initiated the request to
 MySQL -- we do see some logs starting ~ Jun 10, 2015 11:41:19 AM
 indicating that someone was using the Web UI to look at the dataimport
 handler's status report.  It would be really nice to know what that person
 saw at that point -- because my guess is DIH was still running and was
 stalled waiting for MySQL, and hadn't even started adding docs to Solr (if
 it had, I'm certain there would have been some log of it).

 So instead, the combination of a 

Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Let me answer inline, to get more info:

2015-06-10 10:59 GMT+01:00 Midas A test.mi...@gmail.com:

 Hi Alessandro,

 Please find the answers inline and help me out to figure out this problem.

 1) Solr version : *4.2.1*
 2) Solr architecture :* Master -slave/ Replication with requestHandler*



Where did the issue happen?
Have you read this?
The SQL Entity Processor

The SqlEntityProcessor is the default processor. The associated data source
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-JdbcDataSource
should be a JDBC URL.

The entity attributes specific to this processor are shown in the table
below.

Attribute         | Use
------------------+-------------------------------------------------------------
query             | Required. The SQL query used to select rows.
deltaQuery        | SQL query used if the operation is delta-import. This query
                  | selects the primary keys of the rows which will be part of
                  | the delta-update. The pks will be available to the
                  | deltaImportQuery through the variable
                  | ${dataimporter.delta.column-name}.
parentDeltaQuery  | SQL query used if the operation is delta-import.
deletedPkQuery    | SQL query used if the operation is delta-import.
deltaImportQuery  | SQL query used if the operation is delta-import. If this is
                  | not present, DIH tries to construct the import query (after
                  | identifying the delta) by modifying the 'query' (this is
                  | error prone). There is a namespace
                  | ${dataimporter.delta.column-name} which can be used in this
                  | query. For example:
                  | select * from tbl where id=${dataimporter.delta.id}

This is from the official Solr wiki.
You should make sure you adhere to the proper configuration; a sketch follows.
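For illustration, a minimal data-config.xml entity wired for delta imports
using the attributes from the table above (the table name, columns, and
connection details are hypothetical):

    <dataConfig>
      <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/shop" user="solr" password="***"/>
      <document>
        <entity name="item" pk="id"
                query="select id, name from item"
                deltaQuery="select id from item
                            where last_modified &gt; '${dataimporter.last_index_time}'"
                deltaImportQuery="select id, name from item
                                  where id='${dataimporter.delta.id}'"/>
      </document>
    </dataConfig>

The deltaQuery finds the changed primary keys; the deltaImportQuery then
fetches each changed row by key.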

 3) Kind of data source indexed : *Mysql *

What about your delta query? That one is responsible for the delta
indexing.

 4) What happened to the datasource ? any change in there ? : *No change *

Nothing relevant happened there? Any deletion or weird update to the
database?

 5) Was the index actually deleted ? All docs deleted ? Index file segments
 deleted ? Index corrupted ? : *all docs deleted , segment files  are there.
 index file is also there .*

So a deletion + commit happened, but still no merge purging the deleted
content from the index?


 6) What about system resources ?
 * JVM: 30 GB*
 * RAM: 48 GB*

 *CPU : 8 core*


Heh, I am not interested in your current resources as such; I have no
indication of the size of your data. My question was more about checking
whether the system was healthy from the system-resource point of view.

Cheers


 On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:

  Let me try to help you. First of all, I would like to encourage people to
  post more information about their scenario than "this is my log, index
  deleted, help me" :)
 
  This kind of Info can be really useful :
 
  1) Solr version
  2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
  Sharding ? Manual Replication ? where the problem happened ? )
  3) Kind of data source indexed
  4) What happened to the datasource ? any change in there ?
  5) Was the index actually deleted ? All docs deleted ? Index file
 segments
  deleted ? Index corrupted ?
  6) What about system resources ?
 
  These questions are only a few examples of what everyone should always post
  along with their mysterious problem!
 
  Hope this helps,
 
  Cheers
 
 
  2015-06-10 9:15 GMT+01:00 Midas A test.mi...@gmail.com:
 
  
   We are running full import and delta import crons .
  
   Full index once a day
  
   delta index : every 10 mins
  
  
   last night my index automatically deleted(numdocs=0).
  
   attaching logs for review .
  
   please suggest to resolve the issue.
  
  
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Wow, Upaya, I didn't know that clean defaulted to true for delta import
as well!
I did know it was the default for full import, but I agree with you that
a default of true for delta import is very dangerous!

But assuming the user was using delta import so far, if it cleaned every
time, how was it possible to have a coherent index?

Using a delta import with clean=true should produce an inconsistent index
with only a subset (the latest modified documents) of the entire data set!

Cheers

2015-06-10 11:46 GMT+01:00 Upayavira u...@odoko.co.uk:

 Note the clean= parameter to the DIH. It defaults to true, and it will wipe
 your index before the import runs. Perhaps it succeeded at wiping, but then
 failed to connect to your database. Hence an empty index?

 clean=true is, IMO, a very dangerous default option.

 Upayavira

 On Wed, Jun 10, 2015, at 10:59 AM, Midas A wrote:
  Hi Alessandro,
 
  Please find the answers inline and help me out to figure out this
  problem.
 
  1) Solr version : *4.2.1*
  2) Solr architecture :* Master -slave/ Replication with requestHandler*
 
  3) Kind of data source indexed : *Mysql *
  4) What happened to the datasource ? any change in there ? : *No change *
  5) Was the index actually deleted ? All docs deleted ? Index file
  segments
  deleted ? Index corrupted ? : *all docs deleted , segment files  are
  there.
  index file is also there .*
  6) What about system resources ?
  * JVM: 30 GB*
  * RAM: 48 GB*
 
  *CPU : 8 core*
 
 
  On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti 
  benedetti.ale...@gmail.com wrote:
 
   Let me try to help you. First of all, I would like to encourage people to
   post more information about their scenario than "this is my log, index
   deleted, help me" :)
  
   This kind of Info can be really useful :
  
   1) Solr version
   2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
   Sharding ? Manual Replication ? where the problem happened ? )
   3) Kind of data source indexed
   4) What happened to the datasource ? any change in there ?
   5) Was the index actually deleted ? All docs deleted ? Index file
 segments
   deleted ? Index corrupted ?
   6) What about system resources ?
  
   These questions are only a few examples of what everyone should always post
   along with their mysterious problem!
  
   Hope this helps,
  
   Cheers
  
  
   2015-06-10 9:15 GMT+01:00 Midas A test.mi...@gmail.com:
  
   
We are running full import and delta import crons .
   
Full index once a day
   
delta index : every 10 mins
   
   
last night my index automatically deleted(numdocs=0).
   
attaching logs for review .
   
please suggest to resolve the issue.
   
   
  
  
   --
   --
  
   Benedetti Alessandro
   Visiting card : http://about.me/alessandro_benedetti
  
   Tyger, tyger burning bright
   In the forests of the night,
   What immortal hand or eye
   Could frame thy fearful symmetry?
  
   William Blake - Songs of Experience -1794 England
  




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Indexing issue - index get deleted

2015-06-10 Thread Midas A
Hi Alessandro,

Please find the answers inline, and help me figure out this problem.

1) Solr version : *4.2.1*
2) Solr architecture :* Master -slave/ Replication with requestHandler*

3) Kind of data source indexed : *Mysql *
4) What happened to the datasource ? any change in there ? : *No change *
5) Was the index actually deleted ? All docs deleted ? Index file segments
deleted ? Index corrupted ? : *all docs deleted , segment files  are there.
index file is also there .*
6) What about system resources ?
* JVM: 30 GB*
* RAM: 48 GB*

*CPU : 8 core*


On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 Let me try to help you. First of all, I would like to encourage people to
 post more information about their scenario than "this is my log, index
 deleted, help me" :)

 This kind of Info can be really useful :

 1) Solr version
 2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
 Sharding ? Manual Replication ? where the problem happened ? )
 3) Kind of data source indexed
 4) What happened to the datasource ? any change in there ?
 5) Was the index actually deleted ? All docs deleted ? Index file segments
 deleted ? Index corrupted ?
 6) What about system resources ?

 These questions are only a few examples of what everyone should always post
 along with their mysterious problem!

 Hope this helps,

 Cheers


 2015-06-10 9:15 GMT+01:00 Midas A test.mi...@gmail.com:

 
  We are running full import and delta import crons .
 
  Full index once a day
 
  delta index : every 10 mins
 
 
  last night my index automatically deleted(numdocs=0).
 
  attaching logs for review .
 
  please suggest to resolve the issue.
 
 


 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England



Re: Indexing issue - index get deleted

2015-06-10 Thread Upayavira
Note the clean= parameter to the DIH. It defaults to true, and it will wipe
your index before the import runs. Perhaps it succeeded at wiping, but then
failed to connect to your database. Hence an empty index?

clean=true is, IMO, a very dangerous default option.
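For reference, the parameter can be passed explicitly on the import URL; the
core name here is the one that appears in the logs discussed in this thread:

    curl "http://localhost:8983/solr/shopclue_prod/dataimport?command=full-import&clean=false"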

Upayavira

On Wed, Jun 10, 2015, at 10:59 AM, Midas A wrote:
 Hi Alessandro,
 
 Please find the answers inline and help me out to figure out this
 problem.
 
 1) Solr version : *4.2.1*
 2) Solr architecture :* Master -slave/ Replication with requestHandler*
 
 3) Kind of data source indexed : *Mysql *
 4) What happened to the datasource ? any change in there ? : *No change *
 5) Was the index actually deleted ? All docs deleted ? Index file
 segments
 deleted ? Index corrupted ? : *all docs deleted , segment files  are
 there.
 index file is also there .*
 6) What about system resources ?
 * JVM: 30 GB*
 * RAM: 48 GB*
 
 *CPU : 8 core*
 
 
 On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:
 
  Let me try to help you. First of all, I would like to encourage people to
  post more information about their scenario than "this is my log, index
  deleted, help me" :)
 
  This kind of Info can be really useful :
 
  1) Solr version
  2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
  Sharding ? Manual Replication ? where the problem happened ? )
  3) Kind of data source indexed
  4) What happened to the datasource ? any change in there ?
  5) Was the index actually deleted ? All docs deleted ? Index file segments
  deleted ? Index corrupted ?
  6) What about system resources ?
 
  These questions are only a few examples of what everyone should always post
  along with their mysterious problem!
 
  Hope this helps,
 
  Cheers
 
 
  2015-06-10 9:15 GMT+01:00 Midas A test.mi...@gmail.com:
 
  
   We are running full import and delta import crons .
  
    Full index once a day
  
   delta index : every 10 mins
  
  
   last night my index automatically deleted(numdocs=0).
  
   attaching logs for review .
  
   please suggest to resolve the issue.
  
  
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 


Re: Indexing issue - index get deleted

2015-06-10 Thread Upayavira
I was only speaking about full import regarding the default of
clean=true. However, looking at the source code, it doesn't seem to
differentiate between a full and a delta import in relation to the
default of clean=true, which would be pretty crappy. However, I'd need
to try it.

Upayavira

On Wed, Jun 10, 2015, at 11:57 AM, Alessandro Benedetti wrote:
 Wow, Upaya, I didn't know that clean defaulted to true for delta import
 as well!
 I did know it was the default for full import, but I agree with you that
 a default of true for delta import is very dangerous!
 
 But assuming the user was using delta import so far, if it cleaned every
 time, how was it possible to have a coherent index?
 
 Using a delta import with clean=true should produce an inconsistent index
 with only a subset (the latest modified documents) of the entire data set!
 
 Cheers
 
 2015-06-10 11:46 GMT+01:00 Upayavira u...@odoko.co.uk:
 
  Note the clean= parameter to the DIH. It defaults to true, and it will wipe
  your index before the import runs. Perhaps it succeeded at wiping, but then
  failed to connect to your database. Hence an empty index?
 
  clean=true is, IMO, a very dangerous default option.
 
  Upayavira
 
  On Wed, Jun 10, 2015, at 10:59 AM, Midas A wrote:
   Hi Alessandro,
  
   Please find the answers inline and help me out to figure out this
   problem.
  
   1) Solr version : *4.2.1*
   2) Solr architecture :* Master -slave/ Replication with requestHandler*
  
   3) Kind of data source indexed : *Mysql *
   4) What happened to the datasource ? any change in there ? : *No change *
   5) Was the index actually deleted ? All docs deleted ? Index file
   segments
   deleted ? Index corrupted ? : *all docs deleted , segment files  are
   there.
   index file is also there .*
   6) What about system resources ?
   * JVM: 30 GB*
   * RAM: 48 GB*
  
   *CPU : 8 core*
  
  
   On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti 
   benedetti.ale...@gmail.com wrote:
  
Let me try to help you. First of all, I would like to encourage people to
post more information about their scenario than "this is my log, index
deleted, help me" :)
   
This kind of Info can be really useful :
   
1) Solr version
2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
Sharding ? Manual Replication ? where the problem happened ? )
3) Kind of data source indexed
4) What happened to the datasource ? any change in there ?
5) Was the index actually deleted ? All docs deleted ? Index file
  segments
deleted ? Index corrupted ?
6) What about system resources ?
   
These questions are only a few examples of what everyone should always post
along with their mysterious problem!
   
Hope this helps,
   
Cheers
   
   
2015-06-10 9:15 GMT+01:00 Midas A test.mi...@gmail.com:
   

 We are running full import and delta import crons .

 Full index once a day

 delta index : every 10 mins


 last night my index automatically deleted(numdocs=0).

 attaching logs for review .

 please suggest to resolve the issue.


   
   
--
--
   
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti
   
Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?
   
William Blake - Songs of Experience -1794 England
   
 
 
 
 
 -- 
 --
 
 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti
 
 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?
 
 William Blake - Songs of Experience -1794 England


Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Let me try to help you. First of all, I would like to encourage people to
post more information about their scenario than "this is my log, index
deleted, help me" :)

This kind of Info can be really useful :

1) Solr version
2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ? Manual
Sharding ? Manual Replication ? where the problem happened ? )
3) Kind of data source indexed
4) What happened to the datasource ? any change in there ?
5) Was the index actually deleted ? All docs deleted ? Index file segments
deleted ? Index corrupted ?
6) What about system resources ?

These questions are only a few examples of what everyone should always post
along with their mysterious problem!

Hope this helps,

Cheers


2015-06-10 9:15 GMT+01:00 Midas A test.mi...@gmail.com:


 We are running full import and delta import crons .

 Full index once a day

 delta index : every 10 mins


 last night my index automatically deleted(numdocs=0).

 attaching logs for review .

 please suggest to resolve the issue.




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Indexing issue - index get deleted

2015-06-10 Thread Alessandro Benedetti
Just taking a look at the code:


if (requestParams.containsKey("clean")) {
  clean = StrUtils.parseBool((String) requestParams.get("clean"), true);
} else if (DataImporter.DELTA_IMPORT_CMD.equals(command) ||
    DataImporter.IMPORT_CMD.equals(command)) {
  clean = false;
} else {
  clean = debug ? false : true;
}


Which makes sense, as I would be surprised to see a delta import with
cleaning by default.

The guy was using delta import anyway, so maybe the problem is
different and not related to the clean.

But he definitely needs to give us more information.

Cheers


2015-06-10 12:11 GMT+01:00 Upayavira u...@odoko.co.uk:

 I was only speaking about full import regarding the default of
 clean=true. However, looking at the source code, it doesn't seem to
 differentiate between a full and a delta import in relation to the
 default of clean=true, which would be pretty crappy. However, I'd need
 to try it.

 Upayavira

 On Wed, Jun 10, 2015, at 11:57 AM, Alessandro Benedetti wrote:
  Wow, Upaya, I didn't know that clean defaulted to true for delta import
  as well!
  I did know it was the default for full import, but I agree with you that
  a default of true for delta import is very dangerous!
 
  But assuming the user was using delta import so far, if it cleaned every
  time, how was it possible to have a coherent index?
 
  Using a delta import with clean=true should produce an inconsistent index
  with only a subset (the latest modified documents) of the entire data set!
 
  Cheers
 
  2015-06-10 11:46 GMT+01:00 Upayavira u...@odoko.co.uk:
 
   Note the clean= parameter to the DIH. It defaults to true, and it will wipe
   your index before the import runs. Perhaps it succeeded at wiping, but
   failed to connect to your database. Hence an empty index?
  
   clean=true is, IMO, a very dangerous default option.
  
   Upayavira
  
   On Wed, Jun 10, 2015, at 10:59 AM, Midas A wrote:
Hi Alessandro,
   
Please find the answers inline and help me out to figure out this
problem.
   
1) Solr version : *4.2.1*
2) Solr architecture :* Master -slave/ Replication with
 requestHandler*
   
3) Kind of data source indexed : *Mysql *
4) What happened to the datasource ? any change in there ? : *No
 change *
5) Was the index actually deleted ? All docs deleted ? Index file
segments
deleted ? Index corrupted ? : *all docs deleted , segment files  are
there.
index file is also there .*
6) What about system resources ?
* JVM: 30 GB*
* RAM: 48 GB*
   
*CPU : 8 core*
   
   
On Wed, Jun 10, 2015 at 2:13 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:
   
 Let me try to help you. First of all, I would like to encourage people to
 post more information about their scenario than "this is my log, index
 deleted, help me" :)

 This kind of Info can be really useful :

 1) Solr version
 2) Solr architecture ( Solr Cloud ? Solr Cloud configuration ?
 Manual
 Sharding ? Manual Replication ? where the problem happened ? )
 3) Kind of data source indexed
 4) What happened to the datasource ? any change in there ?
 5) Was the index actually deleted ? All docs deleted ? Index file
   segments
 deleted ? Index corrupted ?
 6) What about system resources ?

 These questions are only a few examples of what everyone should always post
 along with their mysterious problem!

 Hope this helps,

 Cheers


 2015-06-10 9:15 GMT+01:00 Midas A test.mi...@gmail.com:

 
  We are running full import and delta import crons .
 
  Full index once a day
 
  delta index : every 10 mins
 
 
  last night my index automatically deleted(numdocs=0).
 
  attaching logs for review .
 
  please suggest to resolve the issue.
 
 


 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England

  
 
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Indexing issue - index get deleted

2015-06-10 Thread Chris Hostetter

: The guy was using delta import anyway, so maybe the problem is
: different and not related to the clean.

that's not what the logs say.

Here's what i see...

Log begins with server startup @ Jun 10, 2015 11:14:56 AM

The DeletionPolicy for the shopclue_prod core is initialized at Jun 
10, 2015 11:15:04 AM and we see a few interesting things here we note 
for the future as we keep reading...

1) There is currently commits:num=1 commits on disk
2) the current index dir in use is index.20150311161021822
3) the current segment  generation are segFN=segments_1a,generation=46

Immediately after this, we see some searcher warming using a searcher with 
this same segments file, and then this searcher is registered (Jun 10, 
2015 11:15:05 AM) and the core is registered.

Next we see some replication polling, and we see what look like some 
simple monitoring requests for q=* which return hits=85898 being 
repeated over and over.

At Jun 10, 2015 11:16:30 AM we see some requests for /dataimport that 
look like they are coming from the UI. and then at Jun 10, 2015 11:17:01 
AM we see a request for a full import started.

We have no idea what the data import configuration file looks like, so we
have no idea if clean=false is being used or not.  it's certainly not
specified in the URL.

We see some more monitoring URLs returning hits=85898 and some more
/replication status calls, and then @ Jun 10, 2015 11:18:02 AM we see the
first commit executed since the server started up.

there's no indication that this commit came from an external request (eg
/update) so it was probably made by some internal request.  One
possibility is that it came from DIH finishing -- but i doubt it, i'm
fairly sure that would have involved more logging than this.  A more
probable scenario is that it came from an autoCommit setting -- the fact
that it is almost exactly 60 seconds after DIH started -- and almost
exactly 60 seconds after DIH may have done a deleteAll query due to
clean=true -- makes it seem very likely that this was a 1 minute
autoCommit.

(but since we don't have either the data import config, or the
solrconfig.xml, we have no way of knowing -- it's all just guess work.)

Very importantly, note that this commit is not opening a new searcher...

Jun 10, 2015 11:18:02 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

Here are some other interesting things to note from the logging 
that comes from the DeletionPolicy when this commit happens...

1) it now notes that there are commits:num=2 on disk
2) the current index dir hasn't changed (index.20150311161021822) so 
some weird replication command didn't swap the world out from under us
3) the newest segment/generation are segFN=segments_1b,generation=47
4) the newest commit has no other files in it besides the segments file.

this means, without a doubt, there are no documents in this commit's view
of the index.  they have all been deleted by something.


At this point the *old* searcher (for commit generation 46) is still in 
use however -- nothing has done an openSearcher=true.

we see more /dataimport status requests, and other requests that appear to 
come from the Solr UI, and more monitoring queries that still return 
hits=85898 because the same searcher is in use.

At Jun 10, 2015 11:27:04 AM we see another commit happen -- again, no 
indication that this came from an outside /update request, so it might be 
from DIH, or it might be from an autoCommit setting.  the fact that it is 
nearly exactly 10 minutes after DIH started (and probably did a clean=true 
deleteAll query) makes it seem extremely likely this is an autoSoftCommit 
setting kicking in.
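
If those guesses are right, the relevant solrconfig.xml block would look
something like this sketch (the values are inferred from the log timings;
the actual config was never posted):

  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit every 60 sec -->
    <openSearcher>false</openSearcher>  <!-- flush, but keep the old searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>600000</maxTime>           <!-- soft commit (new searcher) every 10 min -->
  </autoSoftCommit>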

Very importantly, note that this softCommit *does* open a new searcher...

Jun 10, 2015 11:27:04 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}


In less than a second, this new searcher is warmed up and the next time we
see a q=* monitoring query get logged, it returns hits=0.

Note that at no point in the logs, after the DataImporter is started, do
we see it log anything other than that it has initiated the request to
MySQL -- we do see some logs starting ~ Jun 10, 2015 11:41:19 AM
indicating that someone was using the Web UI to look at the dataimport
handler's status report.  it would be really nice to know what that person
saw at that point -- because my guess is DIH was still running and was
stalled waiting for MySQL, and hadn't even started adding docs to Solr (if
it had, i'm certain there would have been some log of it).

So instead, the combination of a (probable) DIH clean=true option and a
(near certainty) autoCommit=60sec and autoSoftCommit=10min meant that a new
commit was created after the clean, and that commit was

Re: indexing issue

2015-06-04 Thread Midas A
Sorry Shawn,

a) Total docs solr is handling is 3 million.
b) index size is only 5 GB



On Thu, Jun 4, 2015 at 9:35 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 6/4/2015 7:38 AM, Midas A wrote:
  On Thu, Jun 4, 2015 at 6:48 PM, Shawn Heisey apa...@elyograg.org
 wrote:
 
  On 6/4/2015 5:15 AM, Midas A wrote:
  I have an indexing issue. While indexing, IOwait is high on the solr
  server and the load is also high.
  My first suspect here is that you don't have enough RAM for your index
  size.
 
  * How many total docs is Solr handling (all cores)?
 
    --30,00000 docs
 
  * What is the total size on disk of all your cores?
 
   --  600 GB
 
  * How much RAM does the machine have?
 
   --48 GB
 
  * What is the java max heap?
  --30 GB(jvm)

 Is that 3 million docs or 30 million docs?  The actual numbers are 3
 million, but you put a single comma in the number after the 30, so I am
 not sure which you meant.  Either way, those documents must be quite
 large, to make a 600GB index.  30 million docs in my index would only be
 about 30GB.

 With 48 GB of RAM, 30 GB allocated to Solr, and a 600GB index, you don't
 have anywhere even close to enough RAM to cache your index effectively.
 There's only 18GB of RAM left over for the OS disk cache.  That's only 3
 percent of the index data that can fit in the OS disk cache.  I would
 imagine that you're going to need to be able to fit somewhere between 25
 and 50 percent of the index into RAM, which would mean that you're going
 to want around 256GB of RAM for that index. 128GB *might* be enough.
 Alternatively, you could work on making your index smaller -- but be
 aware that to improve performance with low memory, you need to reduce
 the *indexed* part, the *stored* part makes little difference.

 Another potential problem with a 30GB heap is related to garbage
 collection tuning.  If you haven't tuned your GC at all, then
 performance will be terrible on a heap that large, especially when you
 are indexing.  The wiki page I linked on my previous reply contains a
 link to my personal page, which covers GC tuning:

 https://wiki.apache.org/solr/ShawnHeisey

 Thanks,
 Shawn




Re: indexing issue

2015-06-04 Thread Shawn Heisey
On 6/4/2015 11:12 AM, Midas A wrote:
 sorry Shawn ,

 a) Total docs solr is handling is 3 million .
 b) index size is only 5 GB

If your total index size is only 5GB, then there should be no need for a
30GB heap.  For that much index, I'd start with 4GB, and implement GC
tuning.
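
For example, a starting point might look like this (illustrative flags
only, not a tested recommendation -- tune for your own hardware):

  java -Xms4g -Xmx4g -XX:+UseConcMarkSweepGC \
       -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
       -jar start.jar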

A high iowait doesn't make any sense for that situation, but it WOULD
make sense with 600 GB of total index.

Thanks,
Shawn



Re: indexing issue

2015-06-04 Thread Midas A
Shawn,

Please find the log; give me some sense of what is happening.

On Thu, Jun 4, 2015 at 10:56 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 6/4/2015 11:12 AM, Midas A wrote:
  sorry Shawn ,
 
  a) Total docs solr is handling is 3 million .
  b) index size is only 5 GB

 If your total index size is only 5GB, then there should be no need for a
 30GB heap.  For that much index, I'd start with 4GB, and implement GC
 tuning.

 A high iowait doesn't make any sense for that situation, but it WOULD
 make sense with 600 GB of total index.

 Thanks,
 Shawn


2015-06-04 18:44:56
Full thread dump OpenJDK 64-Bit Server VM (24.45-b08 mixed mode):

qtp1122335225-81 prio=10 tid=0x2ab280f92800 nid=0x44e4 waiting on condition [0x40293000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  0x2aaab8aa0c00 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
	at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
	at java.lang.Thread.run(Thread.java:744)

qtp1122335225-80 prio=10 tid=0x2ab280f8e800 nid=0x44e3 runnable [0x43151000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:375)
	at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
	at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1035)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)

Attach Listener daemon prio=10 tid=0x139c7800 nid=0x44e2 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

qtp1122335225-77 prio=10 tid=0x2ab280224000 nid=0x3196 runnable [0x41eac000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:375)
	at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
	at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1035)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)

qtp1122335225-76 prio=10 tid=0x2ab280f7f000 nid=0x3195 runnable [0x40691000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:375)
	at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
	at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1035)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at 

Re: indexing issue

2015-06-04 Thread Midas A
we are indexing around 50000 docs per 10 min.

On Thu, Jun 4, 2015 at 11:02 PM, Midas A test.mi...@gmail.com wrote:

 Shawn,

 Please find the log; give me some sense of what is happening.

 On Thu, Jun 4, 2015 at 10:56 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 6/4/2015 11:12 AM, Midas A wrote:
  sorry Shawn ,
 
  a) Total docs solr is handling is 3 million .
  b) index size is only 5 GB

 If your total index size is only 5GB, then there should be no need for a
 30GB heap.  For that much index, I'd start with 4GB, and implement GC
 tuning.

 A high iowait doesn't make any sense for that situation, but it WOULD
 make sense with 600 GB of total index.

 Thanks,
 Shawn





Re: indexing issue

2015-06-04 Thread Shawn Heisey
On 6/4/2015 7:38 AM, Midas A wrote:
 On Thu, Jun 4, 2015 at 6:48 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 6/4/2015 5:15 AM, Midas A wrote:
 I have an indexing issue. While indexing, IOwait is high on the solr server
 and the load is also high.
 My first suspect here is that you don't have enough RAM for your index
 size.

 * How many total docs is Solr handling (all cores)?

   --30,00000 docs

 * What is the total size on disk of all your cores?

  --  600 GB

 * How much RAM does the machine have?

  --48 GB

 * What is the java max heap?
 --30 GB(jvm)

Is that 3 million docs or 30 million docs?  The actual numbers are 3
million, but you put a single comma in the number after the 30, so I am
not sure which you meant.  Either way, those documents must be quite
large, to make a 600GB index.  30 million docs in my index would only be
about 30GB.

With 48 GB of RAM, 30 GB allocated to Solr, and a 600GB index, you don't
have anywhere even close to enough RAM to cache your index effectively. 
There's only 18GB of RAM left over for the OS disk cache.  That's only 3
percent of the index data that can fit in the OS disk cache.  I would
imagine that you're going to need to be able to fit somewhere between 25
and 50 percent of the index into RAM, which would mean that you're going
to want around 256GB of RAM for that index. 128GB *might* be enough. 
Alternatively, you could work on making your index smaller -- but be
aware that to improve performance with low memory, you need to reduce
the *indexed* part, the *stored* part makes little difference.
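
As a concrete example (a hypothetical field, not from your schema), a big
body field that is only ever displayed and never searched can stop
contributing to the searchable index like this:

  <field name="body" type="text_general" indexed="false" stored="true"/>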

Another potential problem with a 30GB heap is related to garbage
collection tuning.  If you haven't tuned your GC at all, then
performance will be terrible on a heap that large, especially when you
are indexing.  The wiki page I linked on my previous reply contains a
link to my personal page, which covers GC tuning:

https://wiki.apache.org/solr/ShawnHeisey

Thanks,
Shawn



indexing issue

2015-06-04 Thread Midas A
I have an indexing issue. While indexing, IOwait is high on the solr server
and the load is also high.


Re: indexing issue

2015-06-04 Thread Toke Eskildsen
On Thu, 2015-06-04 at 16:45 +0530, Midas A wrote:
 I have an indexing issue. While indexing, IOwait is high on the solr server
 and the load is also high.

Might be because you commit too frequently. How often do you do that?

- Toke Eskildsen, State and University Library, Denmark




Re: indexing issue

2015-06-04 Thread Alessandro Benedetti
I think this mail is really poor in terms of details.
Which version of Solr are you using?
Architecture?
Load expected?
Indexing approach?
When does your problem happen?

The more detail we give, the easier it will be to provide help.

Cheers

2015-06-04 12:19 GMT+01:00 Toke Eskildsen t...@statsbiblioteket.dk:

 On Thu, 2015-06-04 at 16:45 +0530, Midas A wrote:
   I have an indexing issue. While indexing, IOwait is high on the solr server
   and the load is also high.

 Might be because you commit too frequently. How often do you do that?

 - Toke Eskildsen, State and University Library, Denmark





-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: indexing issue

2015-06-04 Thread Midas A
Thanks for replying. Below is the commit frequency:

<autoCommit>
  <maxTime>60000</maxTime> <!-- currently 1 min, old value is 15000 -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>600000</maxTime> <!-- currently 10 min, old value is 500 -->
</autoSoftCommit>


On Thu, Jun 4, 2015 at 4:49 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 On Thu, 2015-06-04 at 16:45 +0530, Midas A wrote:
   I have an indexing issue. While indexing, IOwait is high on the solr server
   and the load is also high.

 Might be because you commit too frequently. How often do you do that?

 - Toke Eskildsen, State and University Library, Denmark





Re: indexing issue

2015-06-04 Thread Midas A
Thanks Alessandro,

Please find the info inline.

Which version of Solr are you using: 4.2.1

   - Architecture: Master-slave

Load expected: currently it is 7-15, should be below 1
Indexing approach: Using DIH
When does your problem happen: we run a delta import every 10 mins and a full
index once a day .. sometimes the load goes to 7-15


On Thu, Jun 4, 2015 at 4:52 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

  I think this mail is really poor in terms of details.
  Which version of Solr are you using?
  Architecture?
  Load expected?
  Indexing approach?
  When does your problem happen?

  The more detail we give, the easier it will be to provide help.

 Cheers

 2015-06-04 12:19 GMT+01:00 Toke Eskildsen t...@statsbiblioteket.dk:

  On Thu, 2015-06-04 at 16:45 +0530, Midas A wrote:
    I have an indexing issue. While indexing, IOwait is high on the solr
    server and the load is also high.
 
  Might be because you commit too frequently. How often do you do that?
 
  - Toke Eskildsen, State and University Library, Denmark
 
 
 


 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England



Re: indexing issue

2015-06-04 Thread Alessandro Benedetti
Honestly, your auto-commit configuration seems not alarming at all!
Can you give me more details regarding:

Load expected: currently it is 7-15, should be below 1

What does this mean? Without a unit of measure I find it hard to understand
plain numbers :)
I was expecting the number of documents per unit of time you index, and an
average size of these docs.
Which kind of DIH processor? Where is your data coming from? A database?

Let's try to improve the understanding of the situation and then evaluate
an approach.

Cheers


Re: indexing issue

2015-06-04 Thread Midas A
Hi Alessandro,



On Thu, Jun 4, 2015 at 5:19 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 Honestly your auto-commit configuration seems not alarming at all!
 Can you give me more details regarding :

  Load expected: currently it is 7-15, should be below 1
  *[Abhishek] :  solr server load average.*
  What does this mean? Without a unit of measure I find it hard to understand
  plain numbers :)



  I was expecting the number of documents per unit of time you index, and an
  average size of these docs.

*   [Abhishek] :  avg size of doc : 250 kb *
<autoCommit>
  <maxTime>60000</maxTime> <!-- currently 1 min, old value is 15000 -->
  <openSearcher>false</openSearcher>
</autoCommit>
We have not specified a max docs limit.

Which kind of DIH processor ? Where is your data coming from ? A database ?
 *  [Abhishek] :  Using a MySQL database and the built-in Solr DIH (Data
 Import Handler)*
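
For reference, a minimal data-config.xml of that shape would look like the
sketch below -- the table and column names here are made up, since the real
config was never posted:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"
              user="..." password="..."/>
  <document>
    <entity name="item"
            query="SELECT id, title FROM items"
            deltaQuery="SELECT id FROM items WHERE last_modified > '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title FROM items WHERE id='${dih.delta.id}'"/>
  </document>
</dataConfig>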



 Let's try to improve the understanding of the situation and then evaluate
 an approach.

 Cheers




Re: indexing issue

2015-06-04 Thread Shawn Heisey
On 6/4/2015 5:15 AM, Midas A wrote:
 I have an indexing issue. While indexing, IOwait is high on the solr server
 and the load is also high.

My first suspect here is that you don't have enough RAM for your index size.

* How many total docs is Solr handling (all cores)?
* What is the total size on disk of all your cores?
* How much RAM does the machine have?
* What is the java max heap?

Here is some additional information on memory requirements for Solr:

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

When Alessandro asked about the load on Solr, the hope was to find out
your *rate* of indexing and querying, not the load average from the
operating system.  Indexing requires a fair amount of heap memory and
CPU resources.  If your heap is too small, then Java might have to work
extremely hard to free up memory for normal operation.

Thanks,
Shawn



Re: indexing issue

2015-06-04 Thread Midas A
Hi Shawn,

Please find comments inline.

On Thu, Jun 4, 2015 at 6:48 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 6/4/2015 5:15 AM, Midas A wrote:
   I have an indexing issue. While indexing, IOwait is high on the solr server
   and the load is also high.

 My first suspect here is that you don't have enough RAM for your index
 size.

 * How many total docs is Solr handling (all cores)?

  --30,00000 docs

 * What is the total size on disk of all your cores?

 --  600 GB

 * How much RAM does the machine have?

 --48 GB

 * What is the java max heap?
 --30 GB(jvm)
 Here is some additional information on memory requirements for Solr:

 https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

 When Alessandro asked about the load on Solr, the hope was to find out
 your *rate* of indexing and querying, not the load average from the
 operating system.  Indexing requires a fair amount of heap memory and
 CPU resources.  If your heap is too small, then Java might have to work
 extremely hard to free up memory for normal operation.

 Thanks,
 Shawn




solr parallel update and total indexing Issue

2014-04-23 Thread ~$alpha`
There is a big issue in Solr parallel update and total indexing.

Total import syntax (working):
dataimport?command=full-import&commit=true&optimize=true

Update syntax (working):
curl 'solr/update?softCommit=true' -H 'Content-type:application/json' -d
'[{"id":1870719,"column":{"set":11}}]'


Issue: If both are run in parallel, then a commit in between takes place.

Example: I have 10k docs in total in the index. I fire a Solr update to
change 1000 records, and in between I fire a total import (full indexer).
What's happening is that a commit takes place in between, i.e. until the
total indexer finishes I get only limited records (1000).

How to solve this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-parallel-update-and-total-indexing-Issue-tp4132652.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Full Indexing issue (solr-user@lucene.apache.org)

2014-04-21 Thread Candygram Mongo (Google Drive)

I've shared an item with you:

Solr Full Indexing issue
https://drive.google.com/folderview?id=0B7UpFqsS5lSjWEhxRE1NN2tMNTQ&usp=sharing&invite=CJXE8q4O

It's not an attachment -- it's stored online. To open this item, just click  
the link above.




solr parallel update and total indexing Issue

2014-04-18 Thread ~$alpha`
There is a big issue in Solr parallel update and total indexing.

Total import syntax (working):
dataimport?command=full-import&commit=true&optimize=true

Update syntax (working):
curl 'solr/update?softCommit=true' -H 'Content-type:application/json' -d
'[{"id":1870719,"column":{"set":11}}]'


Issue: If both are run in parallel, then a commit in between takes place.

Example: I have 10k docs in total in the index. I fire a Solr update to
change 1000 records, and in between I fire a total import (full indexer).
What's happening is that a commit takes place in between, i.e. until the
total indexer finishes I get only limited records (1000).

How to solve this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-parallel-update-and-total-indexing-Issue-tp4131935.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr parallel update and total indexing Issue

2014-04-18 Thread Erick Erickson
try not setting softCommit=true, that's going to take the current
state of your index and make it visible. If your DIH process has
deleted all your records, then that's the current state.

Personally I wouldn't try to mix-n-match like this, the results will
take forever to get right. If you absolutely must do something like
this, I'd use collection aliasing to rebuild my index in a different
collection then switch from the old to new one in a controlled
fashion.
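
A sketch of that flow with the Collections API (the collection and alias
names here are invented for illustration):

  # build the new index into a fresh collection, then atomically repoint the alias
  http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=live&collections=products_v2

Queries against the "live" alias then switch to products_v2 with no window
where the index looks empty.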

Best,
Erick

On Thu, Apr 17, 2014 at 11:37 PM, ~$alpha` lavesh.ra...@gmail.com wrote:
 There is a big issue in Solr parallel update and total indexing.

 Total import syntax (working):
 dataimport?command=full-import&commit=true&optimize=true

 Update syntax (working):
 curl 'solr/update?softCommit=true' -H 'Content-type:application/json' -d
 '[{"id":1870719,"column":{"set":11}}]'


 Issue: If both are run in parallel, then a commit in between takes place.

 Example: I have 10k docs in total in the index. I fire a Solr update to
 change 1000 records, and in between I fire a total import (full indexer).
 What's happening is that a commit takes place in between, i.e. until the
 total indexer finishes I get only limited records (1000).

 How to solve this?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solr-parallel-update-and-total-indexing-Issue-tp4131935.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing issue

2012-09-23 Thread Erick Erickson
That's exactly how I would expect WordDelimiterFilterFactory to
split up that input.

You really need to look at the analysis chain to understand what
happens here; simply saying "the field is text" isn't enough. What I'm
looking for is the fieldType... definition.

In Solr 3.6, for example, there's no fieldType name="text" defined,
so I have no idea what analysis chain your field is using. Unfortunately
the screenshot you provided doesn't show that info; it looks like it was cropped.
The nearest I can get is that there is a _field_ named text, but its type
is text_general, and that doesn't split up the token as you've shown, so you
must be using some other version than 3.6 or you've customized it.

BTW, clicking the verbose checkbox on the analysis page will give you the
name of the filters along with each transformation...

Best
Erick

On Fri, Sep 21, 2012 at 4:36 AM, zainu zainu...@gmail.com wrote:
 Thank you very much guys for your help.
 @Erick
 The fieldType is Text, and from analysis the following is the result.
 http://lucene.472066.n3.nabble.com/file/n4009372/Unbenannt.png

 From the image, you can see it's not tokenizing every possible segment of
 '8E0061123-8E1' but just some of them.




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/indexing-issue-tp4009122p4009372.html
 Sent from the Solr - User mailing list archive at Nabble.com.


indexing issue

2012-09-20 Thread zainu
Dear fellows,
I have a field in solr with the value '8E0061123-8E1'. Now when I search '8E*',
it does return all values starting with '8E', which is totally right, but it
returns nothing when I search '8E0*'. I guess it is not indexing '8E0' or so.
I want to search with all combinations like '8E', '8E0', '8E00', '8E006',
etc. But currently it returns results only when I type '8E' or the complete
'8E0061123-8E1'... any idea??



--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-issue-tp4009122.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing issue

2012-09-20 Thread Erick Erickson
Not enough info to go on here, what is your fieldType?

But the first place to look is admin/analysis to see how the
text is tokenized.

Best
Erick

On Thu, Sep 20, 2012 at 5:49 AM, zainu zainu...@gmail.com wrote:
 Dear fellows,
 I have a field in solr with the value '8E0061123-8E1'. Now when I search '8E*',
 it does return all values starting with '8E', which is totally right, but it
 returns nothing when I search '8E0*'. I guess it is not indexing '8E0' or so.
 I want to search with all combinations like '8E', '8E0', '8E00', '8E006',
 etc. But currently it returns results only when I type '8E' or the complete
 '8E0061123-8E1'... any idea??



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/indexing-issue-tp4009122.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: indexing issue

2012-09-20 Thread Jack Krupansky
You probably are using a text field which is tokenizing the input when 
this data should probably be a string (or text with the 
KeywordAnalyzer.)
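
A sketch of the second option (the type name here is invented): keep the
whole part number as a single lowercased token, so wildcard queries such
as 8E0* then match:

<fieldType name="part_number" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>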


-- Jack Krupansky

-Original Message- 
From: zainu

Sent: Thursday, September 20, 2012 5:49 AM
To: solr-user@lucene.apache.org
Subject: indexing issue

Dear fellows,
I have a field in solr with the value '8E0061123-8E1'. Now when I search '8E*',
it does return all values starting with '8E', which is totally right, but it
returns nothing when I search '8E0*'. I guess it is not indexing '8E0' or so.
I want to search with all combinations like '8E', '8E0', '8E00', '8E006',
etc. But currently it returns results only when I type '8E' or the complete
'8E0061123-8E1'... any idea??



--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-issue-tp4009122.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-19 Thread Mark Miller
we really need to resolve that issue soon...

On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote:

 Yury,
 
 Thank you so much! That was it. Man, I spent a good long while trouble
 shooting this. Probably would have spent quite a bit more time. I
 appreciate your help!!
 
 -Briggs
 
 On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote:
 
 On 7/18/2012 7:11 PM, Briggs Thompson wrote:
 I have realized this is not specific to SolrJ but to my instance of
 Solr. Using curl to delete by query is not working either.
 
 Can be this: https://issues.apache.org/jira/browse/SOLR-3432
 

- Mark Miller
lucidimagination.com













Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-19 Thread Briggs Thompson
This is unrelated for the most part, but the javabin update request handler
does not seem to be working properly when calling the SolrJ method
*HttpSolrServer.deleteById(List<String> ids)*. A single id gets deleted from
the index as opposed to the full list. It appears properly in the logs -
shows deletes of all ids sent, although all but one remain in the index.

I confirmed that the default update request handler deletes the list
properly, so this appears to be a problem with
the BinaryUpdateRequestHandler.

Not an issue for me, just spreading the word.
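
If anyone does hit it, a trivial workaround sketch (untested -- it just
leans on the single-id call, which works) is to delete one id at a time:

for (String id : ids) {
    solrServer.deleteById(id); // the single-id variant deletes correctly
}
solrServer.commit();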

Thanks,
Briggs

On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller markrmil...@gmail.com wrote:

 we really need to resolve that issue soon...

 On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote:

  Yury,
 
  Thank you so much! That was it. Man, I spent a good long while trouble
  shooting this. Probably would have spent quite a bit more time. I
  appreciate your help!!
 
  -Briggs
 
  On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote:
 
  On 7/18/2012 7:11 PM, Briggs Thompson wrote:
  I have realized this is not specific to SolrJ but to my instance of
  Solr. Using curl to delete by query is not working either.
 
  Can be this: https://issues.apache.org/jira/browse/SOLR-3432
 

 - Mark Miller
 lucidimagination.com














Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-19 Thread Mark Miller
https://issues.apache.org/jira/browse/SOLR-3649

On Thu, Jul 19, 2012 at 3:34 PM, Briggs Thompson 
w.briggs.thomp...@gmail.com wrote:

  This is unrelated for the most part, but the javabin update request handler
  does not seem to be working properly when calling the SolrJ method
  *HttpSolrServer.deleteById(List<String> ids)*. A single id gets deleted from
  the index as opposed to the full list. It appears properly in the logs -
  shows deletes of all ids sent, although all but one remain in the index.

 I confirmed that the default update request handler deletes the list
 properly, so this appears to be a problem with
 the BinaryUpdateRequestHandler.

 Not an issue for me, just spreading the word.

 Thanks,
 Briggs

 On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller markrmil...@gmail.com
 wrote:

  we really need to resolve that issue soon...
 
  On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote:
 
   Yury,
  
   Thank you so much! That was it. Man, I spent a good long while trouble
   shooting this. Probably would have spent quite a bit more time. I
   appreciate your help!!
  
   -Briggs
  
   On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote:
  
   On 7/18/2012 7:11 PM, Briggs Thompson wrote:
   I have realized this is not specific to SolrJ but to my instance of
   Solr. Using curl to delete by query is not working either.
  
   Can be this: https://issues.apache.org/jira/browse/SOLR-3432
  
 
  - Mark Miller
  lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 
 




-- 
- Mark

http://www.lucidimagination.com


Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-19 Thread Briggs Thompson
Thanks Mark!

On Thu, Jul 19, 2012 at 4:07 PM, Mark Miller markrmil...@gmail.com wrote:

 https://issues.apache.org/jira/browse/SOLR-3649

 On Thu, Jul 19, 2012 at 3:34 PM, Briggs Thompson 
 w.briggs.thomp...@gmail.com wrote:

   This is unrelated for the most part, but the javabin update request handler
   does not seem to be working properly when calling the SolrJ method
   *HttpSolrServer.deleteById(List<String> ids)*. A single id gets deleted
   from the index as opposed to the full list. It appears properly in the
   logs - shows deletes of all ids sent, although all but one remain in the
   index.
 
  I confirmed that the default update request handler deletes the list
  properly, so this appears to be a problem with
  the BinaryUpdateRequestHandler.
 
  Not an issue for me, just spreading the word.
 
  Thanks,
  Briggs
 
  On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller markrmil...@gmail.com
  wrote:
 
   we really need to resolve that issue soon...
  
   On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote:
  
Yury,
   
Thank you so much! That was it. Man, I spent a good long while
 trouble
shooting this. Probably would have spent quite a bit more time. I
appreciate your help!!
   
-Briggs
   
On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com
 wrote:
   
On 7/18/2012 7:11 PM, Briggs Thompson wrote:
I have realized this is not specific to SolrJ but to my instance of
Solr. Using curl to delete by query is not working either.
   
Can be this: https://issues.apache.org/jira/browse/SOLR-3432
   
  
   - Mark Miller
   lucidimagination.com
  
  
  
  
  
  
  
  
  
  
  
  
 



 --
 - Mark

 http://www.lucidimagination.com



Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Briggs Thompson
I have realized this is not specific to SolrJ but to my instance of Solr.
Using curl to delete by query is not working either.

Running
curl http://localhost:8983/solr/coupon/update -H 'Content-Type: text/xml'
--data-binary '<delete><query>*:*</query></delete>'

Yields this in the logs:
INFO: [coupon] webapp=/solr path=/update
params={stream.body=<delete><query>*:*</query></delete>}
{deleteByQuery=*:*} 0 0

But the corpus of documents in the core do not change.

My solrconfig is pretty barebones at this point, but I attached it in case
anyone sees something strange. Anyone have any idea why documents aren't
getting deleted?

Thanks in advance,
Briggs Thompson

On Wed, Jul 18, 2012 at 12:54 PM, Briggs Thompson 
w.briggs.thomp...@gmail.com wrote:

 Hello All,

 I am using 4.0 Alpha and running into an issue with indexing using
 HttpSolrServer (SolrJ).

 Relevant java code:
 HttpSolrServer solrServer = new HttpSolrServer(MY_SERVER);
 solrServer.setRequestWriter(new BinaryRequestWriter());

 Relevant Solrconfig.xml content:

   <requestHandler name="/update" class="solr.UpdateRequestHandler" />

   <requestHandler name="/update/javabin"
     class="solr.BinaryUpdateRequestHandler" />

 Indexing documents works perfectly fine (using addBeans()); however, when
 trying to do deletes I am seeing issues. I tried to do
 a solrServer.deleteByQuery("*:*") followed by a commit and optimize, and
 nothing is deleted.

 The response from delete request is a success, and even in the solr logs
 I see the following:

 INFO: [coupon] webapp=/solr path=/update/javabin
 params={wt=javabin&version=2} {deleteByQuery=*:*} 0 1
 Jul 18, 2012 11:15:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start
 commit{flags=0,version=0,optimize=true,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}



  I tried removing the BinaryRequestWriter and having the request sent out in
  the default format, and I get the following error.

 SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType:
 application/octet-stream  Not in: [application/xml, text/csv, text/json,
 application/csv, application/javabin, text/xml, application/json]

 at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
  at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
  at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
 at
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
  at
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
 at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
  at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:636)


 I thought that an optimize does the same thing as expungeDeletes, but in
 the log I see expungeDeletes=false. Is there a way to force that using
 SolrJ?

 Thanks in advance,
 Briggs


<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--
 This is a stripped down 

Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Brendan Grainger
Hi Briggs,

I'm not sure about Solr 4.0, but do you need to commit?

 curl 'http://localhost:8983/solr/coupon/update?commit=true' -H 'Content-Type:
 text/xml' --data-binary '<delete><query>*:*</query></delete>'


Brendan


www.kuripai.com

On Jul 18, 2012, at 7:11 PM, Briggs Thompson wrote:

 I have realized this is not specific to SolrJ but to my instance of Solr. 
 Using curl to delete by query is not working either. 
 
 Running 
 curl http://localhost:8983/solr/coupon/update -H 'Content-Type: text/xml'
 --data-binary '<delete><query>*:*</query></delete>'
 
 Yields this in the logs:
 INFO: [coupon] webapp=/solr path=/update 
 params={stream.body=<delete><query>*:*</query></delete>} {deleteByQuery=*:*} 
 0 0
 
 But the corpus of documents in the core do not change. 
 
 My solrconfig is pretty barebones at this point, but I attached it in case 
 anyone sees something strange. Anyone have any idea why documents aren't 
 getting deleted?
 
 Thanks in advance,
 Briggs Thompson
 
 On Wed, Jul 18, 2012 at 12:54 PM, Briggs Thompson 
 w.briggs.thomp...@gmail.com wrote:
 Hello All,
 
 I am using 4.0 Alpha and running into an issue with indexing using 
 HttpSolrServer (SolrJ). 
 
 Relevant java code:
 HttpSolrServer solrServer = new HttpSolrServer(MY_SERVER);
 solrServer.setRequestWriter(new BinaryRequestWriter());
 
 Relevant Solrconfig.xml content:
   <requestHandler name="/update" class="solr.UpdateRequestHandler" />
   <requestHandler name="/update/javabin"
 class="solr.BinaryUpdateRequestHandler" />
 
 Indexing documents works perfectly fine (using addBeans()), however, when 
 trying to do deletes I am seeing issues. I tried to do a 
  solrServer.deleteByQuery("*:*") followed by a commit and optimize, and 
 nothing is deleted. 
 
 The response from delete request is a success, and even in the solr logs I 
 see the following:
 INFO: [coupon] webapp=/solr path=/update/javabin 
  params={wt=javabin&version=2} {deleteByQuery=*:*} 0 1
 Jul 18, 2012 11:15:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start 
 commit{flags=0,version=0,optimize=true,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}
 
 
 I tried removing the binaryRequestWriter and have the request send out in 
 default format, and I get the following error. 
 SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType: 
 application/octet-stream  Not in: [application/xml, text/csv, text/json, 
 application/csv, application/javabin, text/xml, application/json]
   at 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
   at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
   at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
   at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
   at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
   at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
   at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
   at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
   at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
   at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
   at 
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
   at 
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
   at 
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:636)
 
 
 I thought that an optimize does the same thing as expungeDeletes, but in the 
 log I see expungeDeletes=false. Is there a way to force that using SolrJ?
 
 Thanks in advance,
 Briggs
 
 
 solrconfig.xml



Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Yury Kats
On 7/18/2012 7:11 PM, Briggs Thompson wrote:
 I have realized this is not specific to SolrJ but to my instance of Solr. 
 Using curl to delete by query is not working either. 

Can be this: https://issues.apache.org/jira/browse/SOLR-3432


Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Briggs Thompson
Yury,

Thank you so much! That was it. Man, I spent a good long while trouble
shooting this. Probably would have spent quite a bit more time. I
appreciate your help!!

-Briggs

On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote:

 On 7/18/2012 7:11 PM, Briggs Thompson wrote:
  I have realized this is not specific to SolrJ but to my instance of
 Solr. Using curl to delete by query is not working either.

 Can be this: https://issues.apache.org/jira/browse/SOLR-3432



Indexing Issue between Mac OS X 10.5 and 10.6

2011-01-07 Thread Kevin Murdoff
Greetings Everyone -

I am hoping someone can help me with this unusual issue I have here.

Issue
Indexing information in a database (i.e.  /dataimport [full-import]) succeeds 
when I perform this function on a Mac OS X 10.6 with Java 1.6, but fails when I 
attempt the same indexing task on a 10.5 / Java 1.5 server.  When the indexing 
succeeds, I end up with 211,095 documents.  When the indexing fails (on the 
10.5 machine), I end up with 58,286 documents.  The error I receive in the 
Tomcat 'catalina.out' log file is:

SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.StackOverflowError
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: java.lang.StackOverflowError
at com.frontbase.jdbc.FBJRowHandler.close(Unknown Source)
at com.frontbase.jdbc.FBJRowHandler.close(Unknown Source)
...

Background
I want to index the database information as a single document in Solr 1.4.1.  
The document, as defined in the 'data-config.xml' file, has 10 entities, each 
with 5 primitive fields and 2 entity fields.  Most of these 10 entities do not 
represent very large datasets except one, which could represent over 95% of the 
result set.

I have tried tweaking the configuration values in the mainIndex section of 
the 'solrconfig.xml' file.  I lowered the maxFieldLength from 10,000 to 100, 
and lowered the mergeFactor from 10 to 5.  Making these changes, 
independently and together, did not exhibit any change in the indexing failures 
I have been experiencing.

I expanded the JVM min/max memory settings using -Xms and -Xmx set as high as 
1024/2048 respectively.

I also obtained the Solr-1.4.1 release source code, built it on the 10.5 /1.5 
server machine, and performed the same indexing task.  This resulted in the 
same stack overflow error.

Inquiry
Can someone tell me if they have experienced something similar?  If so, did you 
find a solution?  Or, does anyone know what may be causing these stack overflow 
errors?

Please let me know what other information I can provide that would be useful.

Thank you for your help!

- KFM



Fwd: indexing: issue with default values

2010-02-12 Thread nabil rabhi
In the schema.xml I have fields with int type and a default value,
e.g.: <field name="postal_code" type="int" indexed="true" stored="true"
default="0"/>
but when a document has no value for the field postal_code
at indexing, I get the following error:

Posting file Immo.xml to http://localhost:8983/solr/update/
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body><h2>HTTP ERROR: 500</h2><pre>For input string: ""

java.lang.NumberFormatException: For input string: ""
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:470)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.solr.schema.TrieField.createField(TrieField.java:416)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
</pre>

</body>
</html>

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">4</int></lst>
</response>

any help? thx


Re: indexing: issue with default values

2010-02-12 Thread Erik Hatcher
When a document has no value, are you still sending a postal_code  
field in your post to Solr?  Seems like you are.


Erik

On Feb 12, 2010, at 8:12 AM, nabil rabhi wrote:


in the schema.xml I have fields with int type and a default value
e.g.: <field name="postal_code" type="int" indexed="true" stored="true"
default="0"/>
but when a document has no value for the field postal_code
at indexing, I get the following error:

Posting file Immo.xml to http://localhost:8983/solr/update/
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body><h2>HTTP ERROR: 500</h2><pre>For input string: ""

java.lang.NumberFormatException: For input string: ""
   at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Integer.parseInt(Integer.java:470)
   at java.lang.Integer.parseInt(Integer.java:499)
   at org.apache.solr.schema.TrieField.createField(TrieField.java:416)
   at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
   at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
   at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
   at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
   at org.mortbay.jetty.Server.handle(Server.java:285)
   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
   at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
   at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
   at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
</pre>

</body>
</html>

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">4</int></lst>
</response>

any help? thx




Re: indexing: issue with default values

2010-02-12 Thread nabil rabhi
Yes, sometimes the document has postal_code with no value; I still post it
to solr.
2010/2/12 Erik Hatcher erik.hatc...@gmail.com

 When a document has no value, are you still sending a postal_code field in
 your post to Solr?  Seems like you are.

Erik


 On Feb 12, 2010, at 8:12 AM, nabil rabhi wrote:

  in the schema.xml I have fields with int type and a default value
  e.g.:  <field name="postal_code" type="int" indexed="true" stored="true"
  default="0"/>
  but when a document has no value for the field postal_code
  at indexing, I get the following error:

  Posting file Immo.xml to http://localhost:8983/solr/update/
  <html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
  <title>Error 500 </title>
  </head>
  <body><h2>HTTP ERROR: 500</h2><pre>For input string: ""

  java.lang.NumberFormatException: For input string: ""
   at

 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Integer.parseInt(Integer.java:470)
   at java.lang.Integer.parseInt(Integer.java:499)
   at org.apache.solr.schema.TrieField.createField(TrieField.java:416)
   at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
   at

 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
   at

 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
   at

 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
   at

 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
   at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
   at

 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
   at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
   at

 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
   at

 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
   at org.mortbay.jetty.Server.handle(Server.java:285)
   at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
   at

 org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
   at

 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
   at

 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 /pre

 /body
 /html

 ?xml version=1.0 encoding=UTF-8?
 response
 lst name=responseHeaderint name=status0/intint
 name=QTime4/int/lst
 /response

 any help? thx





Re: indexing: issue with default values

2010-02-12 Thread nabil rabhi
Thanks Erik, that was very helpful.

2010/2/12 Erik Hatcher erik.hatc...@gmail.com

 That would be the problem then, I believe.  Simply don't post a value to
 get the default value to work.

Erik


 On Feb 12, 2010, at 10:18 AM, nabil rabhi wrote:

 Yes, sometimes the document has postal_code with no values; I still post it
 to Solr.
 2010/2/12 Erik Hatcher erik.hatc...@gmail.com

  When a document has no value, are you still sending a postal_code field
 in
 your post to Solr?  Seems like you are.

  Erik


 On Feb 12, 2010, at 8:12 AM, nabil rabhi wrote:

  in the schema.xml I have fields with int type and default value

 exp:  field name=postal_code type=int indexed=true stored=true
 default=0/
 but when a document has no value for the field postal_code
 at indexing, I get the following error:

 Posting file Immo.xml to http://localhost:8983/solr/update/

 java.lang.NumberFormatException: For input string: 
 [...]

 any help? thx
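
For anyone finding this thread later: a minimal SolrJ-style sketch of the fix
Erik describes, i.e. skip the field entirely when there is no value so that the
schema default (0) applies. The core URL and id value here are illustrative, and
the original poster was posting XML files, where the equivalent fix is simply to
omit the postal_code field element when it is empty.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddWithSchemaDefault {
    public static void main(String[] args) throws Exception {
        // Assumed core name; adjust to the real core/collection.
        SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/immo").build();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "immo-1");     // illustrative uniqueKey value
        String postalCode = null;         // stand-in for a row with no value
        if (postalCode != null && !postalCode.isEmpty()) {
            // Only send the field when a real value exists; an empty string
            // reaches TrieField.createField and fails in Integer.parseInt.
            doc.addField("postal_code", postalCode);
        }
        // With the field omitted, Solr applies the schema default (0) itself.
        solr.add(doc);
        solr.commit();
        solr.close();
    }
}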







Indexing issue with XML control characters

2009-07-20 Thread Rupert Fiasco
During indexing I will often get this error:

SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal
character ((CTRL-CHAR, code 3))
 at [row,col {unknown-source}]: [2,1]
        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)


By looking at this list and elsewhere I know that I need to filter out
most control characters so I have been employing this regex:

/[\x00-\x08\x0B\x0C\x0E-\x1F]/

But I still get the error. What is strange is that if I re-run my
indexing process after a failure it will work on the previously failed
node and then error out on another node some time later. That is, it
is not deterministic. If I look at the text being indexed, it is as
pure as you can get (a bunch of medical keywords like leg bones and nose).

Any ideas would be greatly appreciated.

The platform is:

Solr implementation version: 1.3.0 694707
Lucene implementation version: 2.4-dev 691741
Mac OS X 10.5.7
JVM 1.5.0_19-b02-304


Thanks
/Rupert
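
A note for the archive: the regex above does already cover CTRL-CHAR code 3
(\x03), so if the failure keeps moving around, the filter is most likely not
being applied to every field on every code path. Running a whitelist
immediately before serialization removes that doubt. Below is a minimal Java
sketch that keeps exactly the characters XML 1.0 allows (#x9, #xA, #xD,
#x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF); class and method names are
illustrative.

public final class XmlSanitizer {
    // Drop every character that is not legal in an XML 1.0 document.
    // Assumed usage: call this on each field value right before the add
    // request is serialized, so nothing can slip in afterwards.
    public static String stripInvalidXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            i += Character.charCount(cp);
            boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (valid) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}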


Re: Indexing issue in DIH - not all records are Indexed

2009-05-19 Thread jayakeerthi s
I changed the uniqueKey and it worked fine. Thank you very much, Noble.

2009/5/18 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 the problem is that your uniquekey may not be unique

 just remove the entry altogether

 On Mon, May 18, 2009 at 10:53 PM, jayakeerthi s mail2keer...@gmail.com
 wrote:
  Hi Noble,
  Many thanks for the reply
 
  Yes, there is a uniqueKey in the schema, which is the ProductID.
 
  I also tried uniqueKey required=falsePROD_ID/uniqueKey, but no luck;
  the same single document is seen after querying *:*
 
  I have attached the Schema.xml used for your reference,please advise.
 
  Thanks and regards,
  Jay
 
  2009/5/16 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  check out if you have a uniqueKey in your schema. If there are
  duplicates they are overwritten
 
  On Sat, May 16, 2009 at 1:38 AM, jayakeerthi s mail2keer...@gmail.com
  wrote:
   I am using Solr for our application with JBoss Integration.
   [...]
   Regards,
   Jay
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 



 --
  -
 Noble Paul | Principal Engineer| AOL | http://aol.com
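
The symptom in this thread (15 rows fetched, one document visible) is the
classic uniqueKey collision Noble points at: the join produces many rows per
product, every row carries the same PROD_ID, and each add overwrites the
previous one. A small SolrJ-flavored sketch of the effect, with illustrative
field names and core URL; the DIH-side fix is what Jay ended up doing, namely
selecting a genuinely unique value (for instance a composite of PROD_ID and
STYL_CD built in the SQL) as the uniqueKey field.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class UniqueKeyOverwrite {
    public static void main(String[] args) throws Exception {
        SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/products").build();
        SolrInputDocument red = new SolrInputDocument();
        red.addField("prod_id", "123");   // uniqueKey
        red.addField("styl_cd", "RED");
        SolrInputDocument blue = new SolrInputDocument();
        blue.addField("prod_id", "123");  // same key: this add replaces "red"
        blue.addField("styl_cd", "BLUE");
        solr.add(red);
        solr.add(blue);
        solr.commit();
        // q=*:* now returns numFound=1 -- the second add silently replaced
        // the first, which is how "Added/Updated: 15 documents" can collapse
        // to a single visible document.
        solr.close();
    }
}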



Re: Indexing issue in DIH - not all records are Indexed

2009-05-18 Thread jayakeerthi s
Hi Noble,
Many thanks for the reply

Yes, there is a uniqueKey in the schema, which is the ProductID.

I also tried uniqueKey required=falsePROD_ID/uniqueKey, but no luck;
the same single document is seen after querying *:*

I have attached the Schema.xml used for your reference,please advise.

Thanks and regards,
Jay

2009/5/16 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 check out if you have a uniqueKey in your schema. I there are
 duplicates they are overwritten

 On Sat, May 16, 2009 at 1:38 AM, jayakeerthi s mail2keer...@gmail.com
 wrote:
  I am using Solr for our application with JBoss Integration.
  [...]
  Regards,
  Jay



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Indexing issue in DIH - not all records are Indexed

2009-05-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
check out if you have a uniqueKey in your schema. If there are
duplicates they are overwritten

On Sat, May 16, 2009 at 1:38 AM, jayakeerthi s mail2keer...@gmail.com wrote:
 I am using Solr for our application with JBoss Integration.

 I have managed to configure the indexing from an Oracle db for 22 fields. Here
 is the db-data-config.xml

 dataConfig
   dataSource type=JdbcDataSource driver=oracle.jdbc.driver.OracleDriver
 url=jdbc:oracle:thin:@camatld6.***.com:1521:atlasint
 user=service_product_lgd password=/

 document name=products

  entity name=PROD transformer=RegexTransformer query=SELECT
 A.PROD_ID,A.PROD_CD,C.REG_CMRC_STYL_NM,C.SAP_LANG_ID,A.DIV_ID
 ,c.SIZE_RUN_DESC, c.INSM_DESC, c.OTSM_DESC, c.DIM_DESC,
  c.PRFL_DESC,c.UPR_DESC,c.MDSL_DESC,c.OUTSL_DESC,c.CTNT_DESC,
 D.SPORT_ACTY_DESC, E.GNDR_AGE_DESC,
  A.PO_GRID_DESC,A.COLR_DISP_CD, B.STYL_CD , A.SILO_ID, A.SILH_ID,
 F.SILH_DESC, g.SILO_DESC , h.FRST_PROD_OFFR_DT,
  h.END_FTR_OFFR_DT,
 h.RETL_PR_AMT,h.RETL_CRCY_ID,h.WHSLE_PR_AMT,h.WHSLE_CRCY_ID,I.ORG_LGCY_DIV_CD
  from
  PROD A ,PROD_STYL B ,PROD_REG_CMRC_STYL C , PROD_SPORT_ACTY D ,
 PROD_GNDR_AGE E , PROD_SILH F, PROD_SILO G, PROD_REG H, ORG_DIV I
  WHERE
  A.PROD_STYL_ID=B.PROD_STYL_ID
  AND A.PROD_STYL_ID = c.PROD_STYL_ID
  AND B.PROD_STYL_ID = C.PROD_STYL_ID
  AND A.SPORT_ACTY_ID = d.SPORT_ACTY_ID
  AND A.GNDR_AGE_ID = E.GNDR_AGE_ID
  and A.SILH_ID = F.SILH_ID
  AND A.SILO_ID = G.SILO_ID
  AND A.PROD_ID = H.PROD_ID
  AND A.DIV_ID = I.DIV_ID 
  /entity
/document
   /dataConfig

 And I have attached the Schema.xml used, and done a full-import:
 http://localhost:8983/solr/dataimport?command=full-import


 response
 lst name=responseHeader
 int name=status0/int
 int name=QTime0/int
 /lst
 lst name=initArgs
 lst name=defaults
 str name=config
 C:\apache-solr-nightly\example\example-DIH\solr\db\conf\db-data-config.xml
 /str
 /lst
 /lst
 str name=commandfull-import/str
 str name=statusidle/str
 str name=importResponse/
 lst name=statusMessages
 str name=Total Requests made to DataSource1/str
 str name=Total Rows Fetched15/str
 str name=Total Documents Skipped0/str
 str name=Full Dump Started2009-05-11 11:27:02/str
 str name=
 Indexing completed. Added/Updated: 15 documents. Deleted 0 documents.
 /str
 str name=Committed2009-05-11 11:27:05/str
 str name=Optimized2009-05-11 11:27:05/str
 str name=Time taken 0:0:2.625/str
 /lst
 str name=WARNING
 This response format is experimental.  It is likely to change in the future.
 /str
 /response

 The issue I am facing is: though the response says Indexing completed.
 Added/Updated: 15 documents. Deleted 0 documents,
 I am able to see only one document when I query *:*, so all the other 14
 documents are missing.
 Similarly I tried indexing 1 million records and found only 2500 docs by
 using *:* query

 So could anyone please help resolving this.


 Regards,
 Jay



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Indexing issue

2009-03-17 Thread Chris Hostetter

: I have two cores on different machines which refer to the same data 
: directory.

this isn't really considered a supported configuration ... both solr 
instances are going to try and own the directory for updating, and 
unless you do something special to ensure only one has control you are
going to have problems...

: below error.   HTTP Status 500 - java.io.FileNotFoundException: 
: \\SolrShare\CollectionSet2\English\Auction\Auction0\index\_c.fdt (The 
: system cannot find the file specified) java.lang.RuntimeException: 
: java.io.FileNotFoundException: 
: \\SolrShare\CollectionSet2\English\Auction\Auction0\index\_c.fdt (The 

...like this.  one core is mucking with the files in a way the other core 
doesn't know about.

: I have changed lockType to simple and none, but still no luck…
: Could you please correct me if I am doing wrong?

none isn't going to help you -- it's just going to make the problem 
worse (two misconfigured instances of Solr in the same JVM could corrupt 
each other with lockType=none).

simple is only going to help you on some filesystems -- since you said 
these two solr instances are running on different machines, that implies 
NFS (or something like it) and SimpleFSLockFactory doesn't work reliably 
in those cases.

If you want to get something like this working, you'll probably need 
to set up your own network-based lockType (instead of relying on the 
filesystem).


-Hoss


Indexing issue

2009-03-03 Thread mahendra mahendra
Hi,
 
I have two cores on different machines which refer to the same data 
directory.
I have implemented this mechanism to have fault tolerance in place: if either 
machine goes down, the other one takes over indexing the data.
 
Since the two cores refer to the same data directory, reindexing sometimes 
fails and shows the error below.
 
HTTP Status 500 - java.io.FileNotFoundException: 
\\SolrShare\CollectionSet2\English\Auction\Auction0\index\_c.fdt (The system 
cannot find the file specified)

java.lang.RuntimeException: java.io.FileNotFoundException: 
\\SolrShare\CollectionSet2\English\Auction\Auction0\index\_c.fdt (The system 
cannot find the file specified)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:470)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:323)
   at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:107)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

I have changed lockType to simple and none, but still no luck…
Could you please correct me if I am doing wrong?
 
Thanks in advance!!
 
Regards,
Mahendra