Re: Solr client

2017-09-01 Thread ganesh m

Thank you all for the reply. I have updated the solr client list.

Regards

Ganesh



On 31-08-2017 00:37, Leonardo Perez Pulido wrote:

Hi,
Apart from taking a look at Solr's wiki, I think one of the main reasons
these APIs are all outdated is that Solr itself provides the 'API' to
many different languages in the form of output formats.

Maybe you know that the main protocol Solr uses to communicate with
its clients is HTTP. Many (if not all) of today's programming languages
provide a means to send requests to Solr over HTTP, and Solr responds to
every one of those languages via the different available response formats.

By default there are response formats for JavaScript, Python, Ruby, and
SolrJ (Java). All of those response formats are first-class citizens in Solr.

Have a look:
http://wiki.apache.org/solr/IntegratingSolr
https://lucene.apache.org/solr/guide/6_6/client-apis.html
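Because the contract is plain HTTP plus a response format, a "client" in any language can be as small as a URL builder. A minimal Python sketch (the core name `techproducts` is a hypothetical example; `wt` selects the response writer):

```python
from urllib.parse import urlencode

# Hypothetical local Solr core used for illustration.
base = "http://localhost:8983/solr/techproducts/select"

def select_url(query, fmt="json"):
    """Build a Solr select URL asking for a specific response format
    (json, python, ruby, xml, ...) via the wt parameter."""
    return base + "?" + urlencode({"q": query, "wt": fmt})

print(select_url("*:*"))            # JSON response format
print(select_url("*:*", "python"))  # Python-literal response format
```

Sending the resulting URL with any HTTP library then yields a response already shaped for the caller's language.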

Regards.

On Wed, Aug 30, 2017 at 1:59 PM, Alexandre Rafalovitch 
wrote:


We do have a page on the Wiki with a lot of that information.

Did you see it?

Regards,
 Alex


On 29 Aug. 2017 2:28 am, "Aditya"  wrote:

Hi

I am aggregating open source Solr client libraries across all languages.
Below are the links. Very few projects are currently active; most of them
were last updated a few years ago. Please send me pointers if I missed
any Solr client library.

http://www.findbestopensource.com/tagged/solr-client
http://www.findbestopensource.com/tagged/solr-gui


Regards
Ganesh

PS: The website http://www.findbestopensource.com search is powered by
Solr.





Re: SolrCloud - Sharing zookeeper ensemble with Kafka

2017-07-11 Thread Ganesh M
We also use the same ZooKeeper ensemble for HBase and SolrCloud, with a
corresponding folder structure for each.
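A common way to share one ensemble is to give each system its own chroot path in the connect string; a small sketch (hostnames are hypothetical):

```python
# One shared ensemble; each system is confined to its own chroot subtree.
# Hostnames zk1..zk3 are hypothetical.
ensemble = "zk1:2181,zk2:2181,zk3:2181"

def chrooted(path):
    """Connect string confining a client to its own ZooKeeper subtree."""
    return ensemble + path

solr_zk = chrooted("/solr")    # e.g. passed via -z to bin/solr
kafka_zk = chrooted("/kafka")  # e.g. zookeeper.connect in Kafka's config
hbase_zk = chrooted("/hbase")
print(solr_zk, kafka_zk, hbase_zk)
```

With chroots, each system sees only its own subtree, so their znodes cannot collide.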

On Tue 11 Jul, 2017 7:01 pm Joe Obernberger, 
wrote:

> Vincenzo - we do this in our environment.  Zookeeper handles HDFS,
> HBase, Kafka, and Solr Cloud.
>
> -Joe
>
>
> On 7/11/2017 4:18 AM, Vincenzo D'Amore wrote:
> > Hi All,
> >
> > in my test environment I've two Zookeeper instances one for SolrCloud
> > (6.6.0) and another for a Kafka server (2.11-0.10.1.0).
> >
> > My task (for now) is reading from a topic queue from the Kafka instance
> and
> > then writing all the documents in a Solr Collection.
> >
> > I write here just to ask whether, in your experience, I can share the
> > zookeeper instance (or ensemble) between the two servers (instead of
> > having two separate instances), and if not, what the contraindications are.
> >
> > Thanks in advance for your time and best regards,
> > Vincenzo
> >
>
>


Re: Poll: Master-Slave or SolrCloud?

2017-04-30 Thread Ganesh M
We use zookeeper for Hadoop / HBase, and so we use the same ensemble for Solr
too. We are running Solr Cloud on EC2 instances with 6 collections, each with
4 shards and 2 replicas.

We followed one of the blogs on the internet for our setup and it works fine.
Though that setup is on Tomcat, it can also be used with the latest Solr
version on Jetty with little change.

Hope this is useful.

Regards,




On Sun, Apr 30, 2017 at 9:06 PM Shawn Heisey  wrote:

> On 4/25/2017 3:13 PM, Otis Gospodnetić wrote:
> > Could one run *only* embedded ZK on some SolrCloud nodes, sans any data?
> > It would be equivalent of dedicated Elasticsearch nodes, which is the
> > current ES best practice/recommendation.  I've never heard of anyone
> being
> > scared of running 3 dedicated master ES nodes, so if SolrCloud offered
> the
> > same, perhaps even completely hiding ZK from users, that would present
> the
> > same level of complexity (err, simplicity) ES users love about ES.  Don't
> > want to talk about SolrCloud vs. ES here at all, just trying to share
> > observations since we work a lot with both Elasticsearch and Solr(Cloud)
> at
> > Sematext.
>
> Yes, you could do that ... but I don't see any real value right now.
> You have to learn how to configure a redundant ZK ensemble and apply
> that configuration to the embedded servers manually.  Since that's not
> any different from what you'd do with an external ensemble, I think it's
> better to just use the external install.  As I understand it, elastic
> wrote their cluster code themselves ... it's part of ES, not provided by
> a separate software package, so their recommendation makes sense for ES.
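For reference, the redundant-ensemble configuration mentioned above is a short zoo.cfg, essentially identical on each of (typically) three hosts; a sketch with hypothetical hostnames and paths:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each host additionally needs a `myid` file in dataDir containing its own server number (1, 2, or 3).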
>
> Using embedded ZK as you have described, there will be at least three
> extra Solr nodes that are not intended to host collections.  To keep it
> running this way, it will be important to explicitly avoid putting new
> collections on those nodes, because that won't happen by default.  With
> dedicated external ZK processes, there's no Solr node to worry about,
> and no need to create a "master node" capability.
>
> I'm not opposed to automated scripts included with Solr to configure and
> start standalone ZK processes, including a way to create an init
> script.  That would be very useful and go a long way towards extremely
> easy instructions for setting up a fault tolerant SolrCloud installation
> on multiple servers.
>
> In situations where ZK is installed on dedicated hardware, a native ZK
> will require less heap memory than one embedded in Solr, and probably
> will have slightly lower CPU requirements.
>
> SOLR-9386 does make your idea more viable because it brings the full
> capability of recent zookeeper configuration options to the embedded
> server.  It will be available in version 6.6.
>
> Thanks,
> Shawn
>
>


Re: Using multi valued field in solr cloud Graph Traversal Query

2017-04-24 Thread Ganesh M
Hi Joel,

Any idea as of which version multi-valued fields will be supported for
gatherNodes? I am using version 6.5. Is it already there?

Kindly update,
Ganesh

On Sat, Mar 11, 2017 at 7:51 AM Joel Bernstein  wrote:

> Currently gatherNodes only works on single value fields. You can seed a
> gatherNodes with a facet() expression which works with multi-value fields,
> but after that it only works with single value fields.
>
> So you would have to index the data as a graph like this:
>
> id, concept1, participant1
> id, concept1, participant2
> id, concept2, participant1
> id, concept2, participant3
> id, concept3, participant2
> 
>
> Then you walk the graph like this:
>
> gatherNodes(mydata,
>   gatherNodes(mydata, walk="concept1->conceptID",
> gather="participantID")
>   walk="node->participantID",
>   gather="conceptID")
>
> This is a two step graph expression:
> 1) Gathers all the participantID's where concept1 is in the conceptID
> field.
> 2) Gathers all the conceptID's for the participantID's gathered in step 1.
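The two-step expression above can be composed and URL-encoded for the standard `/stream` handler; a Python sketch reusing the example's collection and field names:

```python
from urllib.parse import urlencode

# Step 1: gather participantIDs reachable from concept1.
step1 = 'gatherNodes(mydata, walk="concept1->conceptID", gather="participantID")'

# Step 2: seed the outer gatherNodes with step 1 and gather conceptIDs.
expr = ('gatherNodes(mydata, ' + step1 +
        ', walk="node->participantID", gather="conceptID")')

params = urlencode({"expr": expr})
url = "http://localhost:8983/solr/mydata/stream?" + params
print(url)
```

Encoding the expression as the `expr` parameter avoids hand-escaping the quotes and arrows in the URL.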
>
> Let me know if you have other questions about how to structure the data or
> run the queries.
>
>
>
>
>
>
>
>
> Adding multi-value field support is a fairly high priority so I would
> expect this to be coming in a future release.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Mar 10, 2017 at 5:15 PM, Pratik Patel  wrote:
>
> > I am trying to do a graph traversal query using gatherNode function. I am
> > seeding a streaming expression to get some documents and then I am trying
> > to map their ids(conceptid) to a multi valued field "participantIds" and
> > gather nodes.
> >
> > Here is the query I am doing.
> >
> >
> > gatherNodes(collection1,
> > > search(collection1,q="*:*",fl="conceptid",sort="conceptid
> > > asc",fq=storeid:"524efcfd505637004b1f6f24",fq=tags:"Project"),
> > > walk=conceptid->participantIds,
> > > gather="conceptid")
> >
> >
> > The field participantIds is a multi valued field. This is the field which
> > holds connections between the documents. When I execute this query, I get
> > exception as below.
> >
> >
> > { "result-set": { "docs": [ { "EXCEPTION":
> > "java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> > java.io.IOException: java.util.concurrent.ExecutionException:
> > java.io.IOException: -->
> > http://169.254.40.158:8081/solr/collection1_shard1_replica1/:can not
> sort
> > on multivalued field: participantIds", "EOF": true, "RESPONSE_TIME": 15
> } ]
> > } }
> >
> >
> > Does this mean you can not look into multivalued fields in graph
> traversal
> > query? In our solr index, we have documents having "conceptid" field
> which
> > is id and we have participantIds which is a multivalued field storing
> > connections of that documents to other documents. I believe we need to
> have
> > one field in document which stores connections of that document so that
> > graph traversal is possible. If not, what is the other the way to index
> > graph data and use graph traversal. I am trying to explore graph
> traversal
> > and am new to it. Any help would be appreciated.
> >
> > Thanks,
> > Pratik
> >
>


Re: Graph traversal

2017-04-21 Thread Ganesh M
I also tried with the sample data mentioned in this link.

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-GraphQueryParser

Even for that, after loading the data, for the query

http://localhost:8983/solr/graph/query?q={!graph%20from=in_edge%20to=out_edge}id:A&fl=id

I got the response as


{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":7,
"params":{
  "q":"{!graph from=in_edge to=out_edge}id:A",
  "fl":"id"}},
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
  {
"id":"A"}]
  }}

instead of

"response":{"numFound":6,"start":0,"docs":[
   { "id":"A" },
   { "id":"B" },
   { "id":"C" },
   { "id":"D" },
   { "id":"E" },
   { "id":"F" } ]
}
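For reference, the request above can be built without hand-escaping the local-params syntax; a Python sketch using the wiki example's core name:

```python
from urllib.parse import urlencode

# The {!graph} local-params query from the wiki example, URL-encoded
# (urlencode uses '+' for spaces, equivalent to the %20 form above).
params = urlencode({
    "q": "{!graph from=in_edge to=out_edge}id:A",
    "fl": "id",
})
url = "http://localhost:8983/solr/graph/query?" + params
print(url)
```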




Are there any settings that have to be configured to enable graph traversal?

Kindly let me know

Regards,

On Fri, Apr 21, 2017 at 1:20 PM Ganesh M  wrote:
Hi

I am trying graph traversal based on the documentation available over here

http://solr.pl/en/2016/04/18/solr-6-0-and-graph-traversal-support/

But it's not working as expected.

For this query

http://localhost:8983/solr/graph/query?q=*:*&fq={!graph%20from=parent_id%20to=id}id:1

( which is to get all nodes reachable via node 1 )

I get the result as
"docs":[
  {
"id":"1"},
  {
"id":"11"},
  {
"id":"12"},
  {
"id":"13"},
  {
"id":"122"}]

Whereas I expect the result to be 1, 11, 12, 13, 121, 122, 131.

What's going wrong ?

Can anybody help us on this?

Is the graph traversal stable enough in SOLR 6.5 ?

Regards,
Ganesh















Graph traversal

2017-04-21 Thread Ganesh M
Hi

I am trying graph traversal based on the documentation available over here

http://solr.pl/en/2016/04/18/solr-6-0-and-graph-traversal-support/

But it's not working as expected.

For this query

http://localhost:8983/solr/graph/query?q=*:*&fq={!graph%20from=parent_id%20to=id}id:1

( which is to get all nodes reachable via node 1 )

I get the result as
"docs":[
  {
"id":"1"},
  {
"id":"11"},
  {
"id":"12"},
  {
"id":"13"},
  {
"id":"122"}]

Whereas I expect the result to be 1, 11, 12, 13, 121, 122, 131.

What's going wrong ?

Can anybody help us on this?

Is the graph traversal stable enough in SOLR 6.5 ?

Regards,
Ganesh















Re: fq performance

2017-03-17 Thread Ganesh M
Hi Shawn / Michael,

Thanks for your replies; you have got my scenario exactly right.

Initially my documents contain information about who has access to them, as
one field per user (U1_s:true). If 100 users can access a document, we will
have 100 such fields, one for each user.
So when U1 wants to see all these documents, I will query for all documents
where U1_s:true.
If user U5 is added to group G1, then I have to take all the documents of
group G1 and set the information for user U5 in each document (U5_s:true).
For this, I have to re-index all the documents in that group.

To avoid this, I was trying to keep group information instead of user
information in the document (G1_s:true, G2_s:true). Then, to query a user's
documents, I will first get all the groups of user U1, and then query for all
documents where G1_s:true OR G2_s:true OR G3_s:true... This way we don't need
to re-index all the documents, but while querying I need to query with an OR
over all the groups the user belongs to.
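The OR-of-groups query described above can be generated as a single filter clause; a minimal Python sketch (group names are hypothetical):

```python
def group_filter(groups):
    """One filter clause matching documents visible to ANY of the groups,
    using the per-group boolean fields described above (G1_s:true, ...)."""
    return " OR ".join("%s_s:true" % g for g in groups)

# Hypothetical groups for user U1:
groups = ["G1", "G2", "G3"]
fq = group_filter(groups)
print(fq)  # G1_s:true OR G2_s:true OR G3_s:true
```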

For how many ORs can Solr return results in less than one second? Can I
pass hundreds of OR conditions in the Solr query? Will that affect
performance?

Please share your valuable inputs.

On Thu, Mar 16, 2017 at 6:04 PM Shawn Heisey  wrote:

> On 3/16/2017 6:02 AM, Ganesh M wrote:
> > We have 1 million documents and would like to query with multiple fq
> values.
> >
> > We have kept access_control (a multi-valued field) which holds
> information about which groups a document is accessible to.
> >
> > Now, to get the list of all the documents of a user, we would like to
> pass multiple fq values (one for each group the user belongs to):
> >
> >
> q:somefiled:value&fq:access_control:g1&fq:access_control:g2&fq:access_control:g3&fq:access_control:g4&fq:access_control:g5...
> >
> > Like this, there could be 100 groups for a user.
>
> The correct syntax is fq=field:value -- what you have there is not going
> to work.
>
> This might not do what you expect.  Filter queries are ANDed together --
> *every* filter must match, which means that if a document that you want
> has only one of those values in access_control, or has 98 of them but
> not all 100, then the query isn't going to match that document.  The
> solution is one filter query that can match ANY of them, which also
> might run faster.  I can't say whether this is a problem for you or
> not.  Your data might be completely correct for matching 100 filters.
>
> Also keep in mind that there is a limit to the size of a URL that you
> can send into any webserver, including the container that runs Solr.
> That default limit is 8192 bytes, and includes the "GET " or "POST " at
> the beginning and the " HTTP/1.1" at the end (note the spaces).  The
> filter query information for 100 of the filters you mentioned is going
> to be over 2K, which will fit in the default, but if your query has more
> complexity than you have mentioned here, the total URL might not fit.
> There's a workaround to this -- use a POST request and put the
> parameters in the request body.
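The size arithmetic above can be checked up front when deciding between GET and POST; a Python sketch with hypothetical group ids g1..g100:

```python
from urllib.parse import urlencode

# 100 group filters plus the main query, as they would appear on a GET URL.
params = [("q", "somefield:value")]
params += [("fq", "access_control:g%d" % i) for i in range(1, 101)]
query_string = urlencode(params)

# Approximate the full request line described above:
request_line = "GET /solr/collection1/select?" + query_string + " HTTP/1.1"
print(len(request_line))  # a bit over 2 KB: fits under the 8192-byte default

# If the computed size approached 8192, send the same parameters in a POST
# body instead (e.g. requests.post(url, data=params) with the requests
# library); the body is not subject to the URL length limit.
```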
>
> > If we fire a query with 100 values in the fq, what's the penalty on
> performance? Can we get the result in less than one second for 1 million
> documents?
>
> With one million documents, each internal filter query result is 125,000
> bytes -- the number of documents divided by eight.  That's 12.5 megabytes
> for 100 of them.  In addition, every time a filter is run, it must
> examine every document in the index to create that 125,000-byte
> structure, which means that filters which *aren't* found in the
> filterCache are relatively slow.  If they are found in the cache,
> they're lightning fast, because the cache will contain the entire
> 125,000-byte bitset.
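The bitset arithmetic above can be made concrete (one bit per document per cached filter):

```python
num_docs = 1_000_000

# One bit per document in the index:
bytes_per_filter = num_docs // 8        # bytes per cached filter result
total_for_100 = 100 * bytes_per_filter  # bytes for 100 cached filters
print(bytes_per_filter, total_for_100)
```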
>
> If you make your filterCache large enough, it's going to consume a LOT
> of java heap memory, particularly if the index gets bigger.  The nice
> thing about the filterCache is that once the cache entries exist, the
> filters are REALLY fast, and if they're all cached, you would DEFINITELY
> be able to get results in under one second.  I have no idea whether the
> same would happen when filters aren't cached.  It might.  Filters that
> do not exist in the cache will be executed in parallel, so the number of
> CPUs that you have in the machine, along with the query rate, will have
> a big impact on the overall performance of a single query with a lot of
> filters.
>
> Also related to the filterCache, keep in mind that every time a commit
> is made that opens a new searcher, the filterCache will be autowarmed.
> If the autowarmCount value for the filterCache is large, that can make
> commits take a very long time, which will cause problems if commits are
> happening frequently.  On the other hand, a very small autowarmCount can
> cause slow performance after a commit if you use a lot of filters.
>
> My reply is longer and more dense than I had anticipated.  Apologies if
> it's information overload.
>
> Thanks,
> Shawn
>
>


fq performance

2017-03-16 Thread Ganesh M
Hi,

We have 1 million documents and would like to query with multiple fq values.

We have kept access_control (a multi-valued field) which holds information
about which groups a document is accessible to.

Now, to get the list of all the documents of a user, we would like to pass
multiple fq values ( one for each group the user belongs to )

q:somefiled:value&
fq:access_control:g1&fq:access_control:g2&fq:access_control:g3&fq:access_control:g4&fq:access_control:g5...

Like this, there could be 100 groups for a user.

If we fire a query with 100 values in the fq, what's the penalty on
performance? Can we get the result in less than one second for 1 million
documents?

Let us know your valuable inputs on this.

Regards,


Transactions behaviour on Batch insert / update

2016-10-24 Thread Ganesh M
Hi all,

We are planning to make use of batch update / insert of Solr documents, with a
batch size of around 100 documents per batch.

I am a bit curious how transactions are maintained per batch. I do know Solr
is not meant to be transactional, but I want to know whether Solr is designed
to throw an error even if one document in a batch fails due to an issue in the
data or invalid fields.

In case of such errors, does the complete batch fail, or does only that one
specific document fail, with an error returned for that specific document?

Please let me know how Solr behaves.

Regards,
Ganesh


Re: SUM Function performance

2016-10-23 Thread Ganesh M
All, Thanks for reply.

Regards,
Ganesh

On Sun 23 Oct, 2016 7:21 pm Yonik Seeley,  wrote:

> No reason to think it would be a problem.  10K documents isn't very much.
> -Yonik
>
>
> On Sun, Oct 23, 2016 at 3:14 AM, Ganesh M  wrote:
> > Has anyone tried summing a numeric field over 10k to 100k documents
> very frequently and faced any performance issues?
> > Pls share your experience.
> >
> > On Sun 23 Oct, 2016 12:27 am Ganesh M,  wrote:
> > Hi,
> > We will have 10K documents every hour. We would like to find the sum of
> one field f1, based on a certain condition, grouped by
> another field f2.
> > What will be the performance of it? When this summation happens there
> could be other queries coming from other concurrent users.
> >
> > I am planning to do summing using following statement
> >
> > http://localhost:8983/solr/query?q=*:*&json.facet={x:'sum(price)'}
> >
> > How costly is this operation? Can we execute it every hour
> for 10k documents?
> >
> > Regards,
> > Ganesh
> >
>


Re: SUM Function performance

2016-10-23 Thread Ganesh M
Has anyone tried summing a numeric field over 10k to 100k documents very
frequently and faced any performance issues?
Please share your experience.

On Sun 23 Oct, 2016 12:27 am Ganesh M,  wrote:
Hi,
We will have 10K documents every hour. We would like to find the sum of one
field f1, based on a certain condition, grouped by another field f2.
What will be the performance of it? When this summation happens there could be
other queries coming from other concurrent users.

I am planning to do summing using following statement

http://localhost:8983/solr/query?q=*:*&json.facet={x:'sum(price)'}

How costly is this operation? Can we execute it every hour for 10k
documents?

Regards,
Ganesh



SUM Function performance

2016-10-22 Thread Ganesh M
Hi,
We will have 10K documents every hour. We would like to find the sum of one
field f1, based on a certain condition, grouped by another field f2.
What will be the performance of it? When this summation happens there could be
other queries coming from other concurrent users.

I am planning to do summing using following statement

http://localhost:8983/solr/query?q=*:*&json.facet={x:'sum(price)'}

How costly is this operation? Can we execute it every hour for 10k
documents?
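The request above can be assembled programmatically; a Python sketch mapping the question's f1 / f2 onto a terms facet with a nested sum aggregation:

```python
import json
from urllib.parse import urlencode

# Sum f1 per bucket of f2, as a JSON Facet API request.
facet = {
    "by_f2": {
        "type": "terms",
        "field": "f2",
        "facet": {"x": "sum(f1)"},
    }
}
params = urlencode({"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)})
url = "http://localhost:8983/solr/query?" + params
print(url)
```

Setting rows=0 skips returning documents, since only the aggregated buckets are needed.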

Regards,
Ganesh



Re: Solr document missing or not getting indexed though we get 200 ok status from server

2016-09-06 Thread Ganesh M
Hi Shawn,

Good to know about this configuration in the shardHandler. We will try these
settings and keep you posted on the status. Hopefully the setting changes will
resolve the issue.

Regards,
Ganesh


On 05-Sep-2016 10:02 pm, "Shawn Heisey"  wrote:
On 9/4/2016 10:02 PM, Ganesh M wrote:
> We have captured all traffic of HTTP POST request going out from app

I'm the one you've interacted with on IRC for this issue.

If this index has multiple shards, one thing that might be a problem
here is the ShardHandler that's internal to Solr.  This is the internal
HttpClient that distributes requests between Solr nodes.  You may need
to bump up the maxConnectionsPerHost value from its default of 20 to
something larger, like 200 or 300.  This goes in a shardHandlerFactory
section of solr.xml.  If you do not have solr.xml in zookeeper, you'll
need to make this change on every Solr node.  All Solr nodes will need
to be restarted.

https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml
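A sketch of the solr.xml section described above (the value 200 is the suggested starting point, not a prescription):

```xml
<solr>
  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <int name="maxConnectionsPerHost">200</int>
  </shardHandlerFactory>
</solr>
```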

I hope this helps, but I cannot be certain that this is the problem.  If
it does fix your issue, then we might have a bug.

Thanks,
Shawn





Re: Solr document missing or not getting indexed though we get 200 ok status from server

2016-09-05 Thread Ganesh M
Hi Alex,

We have captured all the HTTP POST traffic going out from the app server to
SOLR. The particular document with that id (in our case it's the rowkey) goes
out to SOLR only once. Also on the SOLR side, we have enabled
localhost_access logs, and we can see that the document with that unique ID
arrives only once, with a 200 OK response captured in the localhost_access
logs. So we are sure that it's not a case of identical documents going to
SOLR.
We were using 4.10.2; as we faced this issue, we migrated to 5.4, and we can
see the same issue appearing in SOLR 5.4 too.
My big question is why SOLR doesn't throw an error when it's not able to
handle a request, due to concurrency or for some other reason. Maybe we are
not using it right, but we couldn't nail down the problem. We are losing
trust in SOLR's reliability because of this, though SOLR really is NOT
unreliable. Is there some limit on the number of threads / concurrency beyond
which SOLR behaves strangely like this? Any settings, configurations, etc. to
control this?

Regards,
Ganesh

On Mon, Sep 5, 2016 at 8:13 AM Alexandre Rafalovitch 
wrote:

> I can't tell anything from the document provided. So, here would be my
> thoughts:
>
> If what you see is some sort of concurrency issue, the documents
> missed/dropped would be unlikely to be exactly the same ones. So, if you see
> the same documents dropped, it is much more likely to be something to
> do with the documents, with handler end-points, with sharding, etc.
>
> If this is easily reproducible, I would run a network analyzer such as
> Wireshark and compare your Admin UI session with your client session
> and verify that everything expected is absolutely identical.
>
> You could also temporarily turn on Debug via Admin console (under
> logs). You could turn individual elements to Trace to get low-level
> information on what's happening.
>
> Finally, I am assuming this is all happening with latest Solr? If not,
> it may be worth trying that and/or checking Jira for bugs. Lots of
> things have been fixed/improved in more recent Solr related to
> multi-threaded, multi-server setups.
>
> Regards,
>Alex.
>
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 5 September 2016 at 00:17, Ganesh M 
> wrote:
> > Hi Alex,
> > We tried to post the same document manually from the SOLR ADMIN /
> > Documents UI. It got indexed successfully.  We are sure that it's not a
> > duplicate issue. We are using the default update handler and have not
> > configured a custom one. We fire the indexing request as a direct HTTP
> > request in XML format. We are getting a 200 OK response, but the document
> > is not getting indexed.
> >
> > This is the request we fired and got 200 for, but it is not getting
> > indexed. The same request fired via the SOLR ADMIN / Documents UI gets
> > indexed successfully.
> > 
> > 
> > false
> > 55788327
> > false
> > Factuur _PERF29161663_Voor _Va Bene.pdf
> > 55788327-PERF29161663
> > 3.00
> > 2916847
> > STCUA02150011472808279078
> > EUR
> > 50.00
> > VAT
> > 50.00
> > UA0215001:VB1
> > VB1:A02150:vbgroupnft+1:1472808278137
> > RA02150AT009428
> > 10,false
> > 62440101
> > UNKNOWN
> >  RA02150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#
> >
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/UA0215001/1472808278632.png#f
> > RA02150AT009425#pdf.pdf#
> >
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/UA0215001/1472808278843.png#f
> > 1472808279002
> >
> CLEA021509223370564294689844EXCC1019223370564046496793C1LEA021509223370564294752110EXCC201
> > PERF2020916145437 LEA021509223370564294752110EXCC201 Va Bene VA
> > Beheer B.V. LEA021509223370564294689844EXCC101 VA Beheer B.V. VA
> > Beheer B.V.null null null  2.1null  urn:www.cenbii.eu:
> > transaction:biicoretrdm010:ver1.0:#urn:www.peppol.eu:
> > bis:peppol4a:ver1.0#urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.xnull
> >  urn:www.cenbii.eu:profile:bii04:ver2.0null  PERF20209161454372  null
> >  147275460null  3806 UNCL1001 null  EUR6 ISO 4217 Alpha null null
> >  29168472  null null  pdf.pdf2  null null  RA02150AT009425#pdf.pdf#
> >
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/UA0215001/1472808278843.png#fpdf.pdf
> > application/pdf null null  Factuur _PERF29161663_Voor _Va Bene.pdf2  null
> >  PrimaryImagenull null  RA02150AT009424#Factuur _PERF29161663_Voor
> _Va
> > Bene.pdf#
> >
> http://srv-cbe-col1.everbindi

Re: Solr document missing or not getting indexed though we get 200 ok status from server

2016-09-05 Thread Ganesh M
Hi Dheerendra,

This doesn't always happen. When we add a single document, there is no issue;
it gets added. But when we add in parallel with 50 threads concurrently, out
of 2000 documents, 10 documents get missed (not indexed).
When this happens, we also tried doing a hard commit manually, and tried
optimize too from the Admin screen, but the documents do not get indexed.
As I mentioned, we are using autoSoftCommit of 1 second and autoCommit (hard)
of 30 seconds.
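The commit cadence described (1 s soft / 30 s hard) corresponds to a solrconfig.xml section like this sketch:

```xml
<autoCommit>
  <maxTime>30000</maxTime>            <!-- hard commit every 30 s -->
  <openSearcher>false</openSearcher>  <!-- visibility comes from soft commits -->
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>             <!-- soft commit every 1 s -->
</autoSoftCommit>
```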

Regards,
Ganesh

On Mon, Sep 5, 2016 at 1:47 AM Dheerendra Kulkarni 
wrote:

> Can you try this:
>
> 1. Add the document
> 2. Follow up by optimize in the core admin ui,
>
> If above works then you may need to check your commit.
>
> Regards,
> Dheerendra
>
> On Sun, Sep 4, 2016 at 10:47 PM, Ganesh M 
> wrote:
>
> > Hi Alex,
> > We tried to post the same document manually from the SOLR ADMIN /
> > Documents UI. It got indexed successfully.  We are sure that it's not a
> > duplicate issue. We are using the default update handler and have not
> > configured a custom one. We fire the indexing request as a direct HTTP
> > request in XML format. We are getting a 200 OK response, but the document
> > is not getting indexed.
> >
> > This is the request we fired and got 200 for, but it is not getting
> > indexed. The same request fired via the SOLR ADMIN / Documents UI gets
> > indexed successfully.
> > 
> > 
> > false
> > 55788327
> > false
> > Factuur _PERF29161663_Voor _Va Bene.pdf
> > 55788327-PERF29161663
> > 3.00
> > 2916847
> > STCUA02150011472808279078
> > EUR
> > 50.00
> > VAT
> > 50.00
> > UA0215001:VB1
> > VB1:A02150:vbgroupnft+1:1472808278137
> > RA02150AT009428
> > 10,false
> > 62440101
> > UNKNOWN
> >  RA02150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#
> > http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/
> > UA0215001/1472808278632.png#f
> > RA02150AT009425#pdf.pdf#
> > http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/
> > UA0215001/1472808278843.png#f
> > 1472808279002
> > CLEA021509223370564294689844EXCC10192233705640464967
> > 93C1LEA021509223370564294752110EXCC201
> > PERF2020916145437 LEA021509223370564294752110EXCC201 Va Bene VA
> > Beheer B.V. LEA021509223370564294689844EXCC101 VA Beheer B.V. VA
> > Beheer B.V.null null null  2.1null  urn:www.cenbii.eu:
> > transaction:biicoretrdm010:ver1.0:#urn:www.peppol.eu:
> > bis:peppol4a:ver1.0#urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.xnull
> >  urn:www.cenbii.eu:profile:bii04:ver2.0null  PERF20209161454372  null
> >  147275460null  3806 UNCL1001 null  EUR6 ISO 4217 Alpha null null
> >  29168472  null null  pdf.pdf2  null null  RA02150AT009425#pdf.pdf#
> > http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/
> > UA0215001/1472808278843.png#fpdf.pdf
> > application/pdf null null  Factuur _PERF29161663_Voor _Va Bene.pdf2  null
> >  PrimaryImagenull null  RA02150AT009424#Factuur _PERF29161663_Voor
> _Va
> > Bene.pdf#
> > http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/
> > UA0215001/1472808278632.png#fFactuur
> > _PERF29161663_Voor _Va Bene.pdf application/pdf null null null
> > 62440101ZZZ
> > NL:KVK null null  2916847ZZZ NL:VAT null null  VA Beheer B.V.null null
> >  Schurinkstraatnull  23null  Ommennull  7731GCnull null  NL6
> > ISO3166-1:Alpha2 null null  2916847ZZZ NL:VAT null null  VAT6 UN/ECE 5153
> > null null  62440101ZZZ NL:KVK null null null  55788327ZZZ NL:KVK null
> null
> >  55788327ZZZ NL:KVK null null  Va Benenull null  Voorstraatnull  26null
> >  Voorschotennull  2251BNnull null  NL6 ISO3166-1:Alpha2 null null
> >  2916847ZZZ NL:VAT null null  VAT6 UN/ECE 5153 null null  55788327ZZZ
> > NL:KVK null null  147517380null null null null  NL6 ISO3166-1:Alpha2
> > null null  316 UNCL4461 null  147508740null
> 55788327-PERF29161663null
> > null  29168472 IBAN null  UNKNOWNBIC null  Betaling?binnen?14?dagen op
> > bankrekening?2916847?onder vermelding van?55788327/PERF29161663null null
> >  3.00EUR null null  50.00EUR null  3.00EUR null null  S6 UNCL5305 null
> >  6.00null null  VAT6 UN/ECE 5153 null null  50.00EUR null  50.00EUR null
> >  53.00EUR null  53.00EUR null null  102  null  5.00BX null  50.00EUR null
> > null  PERF2020916145437null  PERF2020916145437null null  12  null null
> S6
> > UNCL5305 null  6.00null null  VAT6 UN/ECE 5153 null null  10.00EUR null
> >  RA02150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#

Re: Solr document missing or not getting indexed though we get 200 ok status from server

2016-09-04 Thread Ganesh M
9844EXCC101
CLEA021509223370564294689844EXCC1019223370564046496793C1LEA021509223370564294752110EXCC201
NL:KVK:62440101
VA Beheer B.V.
1472808279002
10.00
Factuur
A02150
62440101
UA0215001
A02150
GLDT9223370666504283001RA6DTP201
DM001
55788327
VAT
1
VAT
2916847
XCNIN199751
VB1 VB1
PERF2020916145437
RA02150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#
http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/UA0215001/1472808278632.png#f

Ommen
6.00
VA Beheer B.V.
53.00
Group
GLDT9223370666504283001RA6DTP201
S
Va Bene
2916847
23
10,false
PrimaryImage
NL
7731GC
CLEA021509223370564294689844EXCC1019223370564046496793C1LEA021509223370564294752110EXCC201
false
Voorschoten
RA02150AT009425#pdf.pdf#
http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/UA0215001/1472808278843.png#f

RA02150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#
http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/UA0215001/1472808278632.png#f

CLEA021509223370564294689844EXCC1019223370564046496793C1LEA021509223370564294752110EXCC201
false
2916847
1472808279002
Schurinkstraat
LEA021509223370564294689844EXCC101
Va Bene
3.00
10
10,false
S
PERF2020916145437
vbgroupnft+1
false
380
50.00
Voorstraat
RA02150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#
http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A02150/UA0215001/1472808278632.png#f

6.00
LEA021509223370564294752110EXCC201




The only difference is that when we post manually via the Solr Admin UI, the
request is fired when there is no concurrency. Initially, however, there are
around 50 threads firing update POST requests, and a few threads also fire GET
requests to different collections.
A little more information about the setup: we have around 5 collections, and
each collection has 2 shards (one shard on each node, one serving as the
primary index and the other as its replica), for a total of 2 nodes in a
master-master setup.

We are getting this error only when around 50 threads are concurrently firing
POST requests to various collections at the same time.

The strange thing is that Solr does not return an error when it is unable to
index a document. If Solr had returned an error, we could have retried the
indexing. Is there any way to make Solr return an error instead of 200 when it
fails to index?

Regards,
Ganesh

On Sun, Sep 4, 2016 at 10:11 PM Alexandre Rafalovitch 
wrote:

> Can you identify the specific documents that 'fail'? What happens if
> you post them manually? Try posting them manually but with one field
> super-distinct to see whether it made it in. What happens if you post
> it to an empty index (copy definition and try).
>
> Also, what's your request handler's parameters look like. Perhaps you
> have a signature processor, in which case it may be triggering
> duplicates avoidance with different calculation from just an id.
>
> My guess is still that it is some sort of duplicate issue.
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 4 September 2016 at 23:10, Ganesh M  wrote:
> > Some more information on this... Most of documents get indexed properly.
> Few documents are not getting indexed.
> >
> > All documents POST are seen in the localhost_access and 200 OK response
> is seen in local host access file. But in catalina, there are some
> difference in the logs for which are indexing properly, following is the
> logs.
> >
> > FINE: PRE_UPDATE add
> >
> {,id=CUA00439019223370564139207241C3LEA020769223370567404392838EXCC301}
> >
> params(crid=CUA00439019223370564139207241C3LEA020769223370567404392838EXCC301),defaults(wt=xml)
> > Sep 01, 2016 7:39:31 AM org.apache.solr.update.TransactionLog 
> > FINE: New TransactionLog
> file=/ebdata2/solrdata/IOB_shard1_replica1/data/tlog/tlog.0220856,
> exists=false, size=0, openExisting=false
> > Sep 01, 2016 7:39:31 AM org.apache.solr.update.SolrCmdDistributor submit
> > FINE: sending update to
> http://xx.xx.xx.xx:7070/solr/IOB_shard1_replica2/ retry:0
> add{version=1544254202941800448,id=CUA00439019223370564139207241C3LEA020769223370567404392838EXCC301}
> params:update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fxx.xx.xx.xx%3A7070%2Fsolr%2FIOB_shard1_replica1%2F
> > Sep 01, 2016 7:39:31 AM
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner run
> > FINE: starting runner:
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@3fb794b2
> > Sep 01, 2016 7:39:31 AM
> org.apache.solr.update.processor.LogUpdateProcessor finish
> > FINE: PRE_UPDATE FINISH
> params(crid=CUA00439019223370564139207241C3LEA0207692233705674043

Re: Solr document missing or not getting indexed though we get 200 ok status from server

2016-09-04 Thread Ganesh M
Some more information on this... Most documents get indexed properly; a few
are not.

All document POSTs appear in localhost_access, with a 200 OK response recorded
in that file. But in catalina there are some differences in the logs. For the
documents that are indexed properly, the logs look like this:

FINE: PRE_UPDATE add
{,id=CUA00439019223370564139207241C3LEA020769223370567404392838EXCC301}
params(crid=CUA00439019223370564139207241C3LEA020769223370567404392838EXCC301),defaults(wt=xml)
Sep 01, 2016 7:39:31 AM org.apache.solr.update.TransactionLog 
FINE: New TransactionLog 
file=/ebdata2/solrdata/IOB_shard1_replica1/data/tlog/tlog.0220856, 
exists=false, size=0, openExisting=false
Sep 01, 2016 7:39:31 AM org.apache.solr.update.SolrCmdDistributor submit
FINE: sending update to http://xx.xx.xx.xx:7070/solr/IOB_shard1_replica2/ 
retry:0 
add{version=1544254202941800448,id=CUA00439019223370564139207241C3LEA020769223370567404392838EXCC301}
 
params:update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fxx.xx.xx.xx%3A7070%2Fsolr%2FIOB_shard1_replica1%2F
Sep 01, 2016 7:39:31 AM 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner run
FINE: starting runner: 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@3fb794b2
Sep 01, 2016 7:39:31 AM org.apache.solr.update.processor.LogUpdateProcessor 
finish
FINE: PRE_UPDATE FINISH 
params(crid=CUA00439019223370564139207241C3LEA020769223370567404392838EXCC301),defaults(wt=xml)
Sep 01, 2016 7:39:31 AM 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner run
FINE: finished: 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@3fb794b2
Sep 01, 2016 7:39:31 AM org.apache.solr.update.processor.LogUpdateProcessor 
finish
INFO: [IOB_shard1_replica1] webapp=/solr path=/update params=
{crid=CUA00439019223370564139207241C3LEA020769223370567404392838EXCC301}
{add=[CUA00439019223370564139207241C3LEA020769223370567404392838EXCC301
 (1544254202941800448)]}
Sep 01, 2016 7:39:31 AM org.apache.solr.servlet.SolrDispatchFilter doFilter
FINE: Closing out SolrRequest: 
params(crid=CUA00439019223370564139207241C3LEA020769223370567404392838EXCC301),defaults(wt=xml)
-

For a document that is not getting indexed, we see only the following log in
catalina.out. We are not sure whether it is getting added to Solr at all.


Sep 01, 2016 7:39:56 AM org.apache.solr.update.processor.LogUpdateProcessor 
finish
FINE: PRE_UPDATE FINISH 
params(crid=CUA00439019223370564139182810C3LEA020179223370567061972057EXCC102),defaults(wt=xml)
Sep 01, 2016 7:39:56 AM org.apache.solr.update.processor.LogUpdateProcessor 
finish
INFO: [IOB_shard1_replica1] webapp=/solr path=/update params=
{crid=CUA00439019223370564139182810C3LEA020179223370567061972057EXCC102}
{} 0 1
Sep 01, 2016 7:39:56 AM org.apache.solr.servlet.SolrDispatchFilter doFilter
FINE: Closing out SolrRequest: 
params(crid=CUA00439019223370564139182810C3LEA020179223370567061972057EXCC102),defaults(wt=xml)

--

You can see that in the catalina log above for the missing (unindexed)
documents there is no "PRE_UPDATE add" entry. Is that the reason the document
is not getting indexed?
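One way to confirm that pattern across a whole log file is to collect the crid values that reached "PRE_UPDATE FINISH" but never produced a "PRE_UPDATE add" line. The sketch below assumes simplified log lines shaped like the catalina excerpts above; crid is the custom parameter used in this setup, and the sample text is invented for illustration.

```python
import re

# Simplified sample of catalina output: CRID-2 finishes without an add.
CATALINA_SAMPLE = """\
FINE: PRE_UPDATE add {,id=DOC1} params(crid=CRID-1),defaults(wt=xml)
FINE: PRE_UPDATE FINISH params(crid=CRID-1),defaults(wt=xml)
FINE: PRE_UPDATE FINISH params(crid=CRID-2),defaults(wt=xml)
"""

def crids_without_add(log_text: str) -> list:
    """Return crids that hit PRE_UPDATE FINISH with no PRE_UPDATE add."""
    added = set(re.findall(r"PRE_UPDATE add .*?crid=([\w-]+)", log_text))
    finished = set(re.findall(r"PRE_UPDATE FINISH .*?crid=([\w-]+)", log_text))
    return sorted(finished - added)

missing = crids_without_add(CATALINA_SAMPLE)
```

Running this over the real catalina.out would give a concrete list of request ids whose add step never appears, which could then be matched against the missing documents.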

We have set autoSoftCommit to 1 second and autoCommit (hard commit) to 30 seconds.

We are not getting any errors or exceptions in the log.

This issue is becoming very critical and is turning into a reliability
concern. Though we get a 200 OK response from Solr for the update HTTP POST
request, nothing happens on the Solr side. If Solr is not able to process the
update, shouldn't we get an error from Solr instead of a 200 OK response?

Has anybody faced this sort of issue? Any help would be very much appreciated.




On Sun, Sep 4, 2016 at 12:59 PM Ganesh M 
mailto:mgane...@live.in>> wrote:
Nitin, thanks for the reply. Each of our documents has a unique id: its HBase
row key. So it will always be unique, and there is no chance of duplicate ids
being sent.



On Sun 4 Sep, 2016 12:41 pm Nitin Kumar, 
mailto:nitinkumar.i...@gmail.com>> wrote:
Please check the doc's unique key (id). All keys should be unique; otherwise
docs having the same id will be replaced.

On 04-Sep-2016 12:13 PM, "Ganesh M" mailto:mgane...@live.in>> 
wrote:

> Hi,
> we are keep sending documents to Solr from our app server. Single document
> per request, but in parallel of 10 request hits solr cloud in a second.
>
> We could see our post request ( update request ) hitting our solr 5.4 in
> localhost_access logs, and it's response as 200 Ok response. And also we
> get HTTP 200 OK response to our app servers as well for out HTTP request we
> fired to SOLR Cloud.
>
> But few documents are not getting indexed. Out of 2000 document

Re: Solr document missing or not getting indexed though we get 200 ok status from server

2016-09-04 Thread Ganesh M
Nitin, thanks for the reply. Each of our documents has a unique id: its HBase
row key. So it will always be unique, and there is no chance of duplicate ids
being sent.
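A cheap way to rule out the duplicate-id theory entirely is to scan each batch before sending it and flag any id that occurs more than once. This is just a sketch over an invented sample batch; in this setup the ids would be HBase row keys.

```python
from collections import Counter

def duplicate_ids(docs: list) -> list:
    """Return ids that appear more than once in a batch of documents."""
    counts = Counter(d["id"] for d in docs)
    return sorted(i for i, n in counts.items() if n > 1)

# Invented example batch: "row1" appears twice, so it would be overwritten
# rather than indexed twice if sent to Solr.
batch = [{"id": "row1"}, {"id": "row2"}, {"id": "row1"}]
dupes = duplicate_ids(batch)
```

If this ever returns a non-empty list for a real batch, the "missing" documents are simply being replaced by later updates with the same key.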



On Sun 4 Sep, 2016 12:41 pm Nitin Kumar, 
mailto:nitinkumar.i...@gmail.com>> wrote:
Please check the doc's unique key (id). All keys should be unique; otherwise
docs having the same id will be replaced.

On 04-Sep-2016 12:13 PM, "Ganesh M" mailto:mgane...@live.in>> 
wrote:

> Hi,
> we are keep sending documents to Solr from our app server. Single document
> per request, but in parallel of 10 request hits solr cloud in a second.
>
> We could see our post request ( update request ) hitting our solr 5.4 in
> localhost_access logs, and it's response as 200 Ok response. And also we
> get HTTP 200 OK response to our app servers as well for out HTTP request we
> fired to SOLR Cloud.
>
> But few documents are not getting indexed. Out of 2000 documents we sent
> 10 documents are getting missed. Thought there is not error, few documents
> are getting missed.
>
> We use autoSoftcommit as 2 secs and autohardcommit as 30 secs.
>
> Why is that 10 documents not getting indexed and also no error getting
> thrown back if server is not able to index it ?
>
> Regards,
>
>
>
>


Solr document missing or not getting indexed though we get 200 ok status from server

2016-09-03 Thread Ganesh M
Hi,
We keep sending documents to Solr from our app server: a single document per
request, but around 10 requests hit SolrCloud in parallel each second.

We can see our POST (update) requests hitting Solr 5.4 in the localhost_access
logs, each with a 200 OK response, and our app servers also receive HTTP 200
OK for every request we fire at SolrCloud.

But a few documents are not getting indexed. Out of 2000 documents we sent,
about 10 were missed. Though there is no error, a few documents go missing.

We use an autoSoftCommit of 2 seconds and an autoCommit (hard commit) of 30 seconds.
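For reference, commit intervals like those live in the updateHandler section of solrconfig.xml. A sketch matching the values mentioned here (2 s soft commit, 30 s hard commit) might look like the following; the exact element names are from Solr's standard configuration, but the values are this setup's:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>30000</maxTime>          <!-- hard commit every 30 s: flushes to disk -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>2000</maxTime>           <!-- soft commit every 2 s: docs become searchable -->
  </autoSoftCommit>
</updateHandler>
```

Note that commit timing only affects when documents become visible to searches; a document that was accepted but never added would stay missing regardless of these intervals.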

Why are those 10 documents not getting indexed, and why is no error thrown
back if the server is not able to index them?

Regards,





SocketException when using Solr and Tomcat

2013-02-04 Thread Ganesh M
Hi

I am using Solr 4.0 and I am getting the socket exception below very
frequently. Does anyone have any idea why this error is generated?
Tomcat is running but it is not accepting any connections; I need to kill the
process and restart Tomcat.

INFO: Retrying request
Feb 3, 2013 3:58:20 PM org.apache.http.impl.client.DefaultRequestDirector 
tryExecute
INFO: I/O exception (java.net.SocketException) caught when processing request: 
Connection reset
Feb 3, 2013 3:58:20 PM org.apache.http.impl.client.DefaultRequestDirector 
tryExecute
INFO: Retrying request
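While the root cause is investigated (connection resets like this often point at exhausted connection pools or keepalive timeouts between the client and Tomcat), the client side can at least retry transient socket errors with backoff instead of failing outright. The sketch below is generic and not tied to any particular Solr client; the flaky() function simulates a connection that resets twice before succeeding.

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn(), retrying with exponential backoff on socket-style errors.

    OSError covers java.net.SocketException-style failures (connection
    reset, refused, etc.) in Python. The last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Simulated flaky endpoint: raises "Connection reset" twice, then succeeds.
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise OSError("Connection reset")
    return "ok"

result = with_retries(flaky)
```

Retrying only masks the symptom, though: if Tomcat stops accepting connections entirely, the acceptor thread pool or open file descriptors on the server are the first things to check.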


Regards
Ganesh