Re: phrase extraction from user paragraph input
Hi Nikos, Can you give an example of your use case? I think that would help in understanding the problem more clearly. Cheers! On Fri, Nov 28, 2014 at 2:31 PM, Nikos Chaliasos nchal...@cs.uoi.gr wrote: Hello, I am investigating a university project in which, as one part, the user gives a paragraph of text as input and the parsing process (after removing stopwords) extracts a series of descriptive topics about the paragraph, which I could then use to search documents for results. Is there any available bibliography/source that could help me get started? I am very new to Solr/Lucene and I couldn't find anything similar to what I have in mind. Thank you, Nikos Chaliasos
Re: Inconsistent Behavior of Solr Cloud
Hi Erick, Thanks for your response; I got it resolved. I think the indexes were not properly distributed, and I also saw some uneven behavior while indexing. To elaborate: I had three shards in my collection. I started indexing with EmbeddedSolrServer and indexed around 50 million documents (15 GB index size without replication). After that I indexed another 50 million to a different directory for the next shard, but when I checked the indexing stats the next day (it had probably been running for 15 hours or so) it was still running and the index had grown to 60 GB (I didn't understand why so much disk was allocated for the same amount of data, 50 million documents, that I had indexed previously). Eventually I stopped the process, as I couldn't get better updates, and copied the indexes to the next shard. # When I queried later with *:* I got 69 million documents in the response (it was supposed to be 100 million). ## I am not sure where the other 30 million went, but the problem started once I indexed the remaining 30 million, which were not showing up in queries at #, to the next shard. I have read somewhere that the consistency of the cloud is broken if different shards hold values for the same UniqueID field. With this, I have a few things to clarify. *Was the inconsistent behavior caused by the step I took at ##? *If the inconsistency was caused by ##, why were all 100 million documents not present after #? *When the same set of data was previously indexed in just 15 GB, why did the index for the next 50 million grow to 60 GB? *For indexing huge data into SolrCloud in reasonable time, what approach should be taken if EmbeddedSolrServer is not a good choice? Looking forward to your response. Thanks! On Sat, Jun 14, 2014 at 12:31 AM, Erick Erickson erickerick...@gmail.com wrote: It seems like for some reason you have shards that are not reachable. What does your cloud stat in the admin UI tell you when you don't get all the docs back? 
Best, Erick On Fri, Jun 13, 2014 at 1:37 AM, Vineet Mishra clearmido...@gmail.com wrote: Hi All, I am having a Cloud setup with 3 Shards and 2 Replica running on 3 Tomcats with 3 External Zookeeper, all running on single machine. I have Indexed around 70 Mln Documents that seems to be querying back fine. When I index another 30 Mln to same, the result are vague as with the query *:* its sometimes returning 2 Shards result and sometime all the shards result. So to make it clear if I query with *:* to the 100Mln index its should return back 100Mln docs, but sometimes its returning 70Mln and sometimes 100Mln(Actual Result) with the same query. This is just not case with the *:* query but even if I query with the id q=id:123 its sometimes coming with the result and sometimes not. Looking for possible solution. Thanks!
Inconsistent Behavior of Solr Cloud
Hi All, I have a SolrCloud setup with 3 shards and 2 replicas running on 3 Tomcats with 3 external ZooKeeper instances, all on a single machine. I have indexed around 70 million documents, which query back fine. When I index another 30 million into the same collection, the results become inconsistent: the query *:* sometimes returns results from only 2 shards and sometimes from all the shards. To make it clear, querying the 100-million-document index with *:* should return 100 million docs, but the same query sometimes returns 70 million and sometimes 100 million (the actual result). This is not only the case with the *:* query; even a query by id, q=id:123, sometimes returns the result and sometimes does not. Looking for a possible solution. Thanks!
Re: Collection communication internally
Then are there some other alternative so that we can achieve the goal. As querying with this way of set of foreign id is really going to make the query very large and the response is also awaited for long(previously tested with the standalone Solr core with Master Slave Architecture). Thanks! On Mon, Jun 9, 2014 at 8:42 PM, Erick Erickson erickerick...@gmail.com wrote: My first answer is don't do it that way :). Solr works best with flattened (de-normlized) data. If at all possible, you _really_ would be better off combining the two collections and flattening the data even though there would be more data. Whenever I see a question like this, I wonder if you're trying to use Solr like a DB, in this case with collections substituting for tables, and this is almost always a mistake. If you really must do this, consider cross-core joins if at all possible, but I don't think this is supported yet for distributed setups. Best, Erick On Mon, Jun 9, 2014 at 7:32 AM, Vineet Mishra clearmido...@gmail.com wrote: Hi All, I was curious to know how multiple Collection communication be achieved? If yes then by what means. The use case says, having multiple collection I need to query the first collection and get the unique ids from first collection to query the second one(Foreign Key Relation). Now if the no. of terms to be passed to second collection is relatively small then its fine otherwise the problem arise, as adding them to the query is little time consuming in sense of building the query, querying to solr and waiting for the result to respond back. So the query would look something like - http://localhost:7070/solr/mycollection/select?q= http://localhost:7070/solr/recollection/select?q=*:*fl=idsort=id_S%20desc ID:( 1 OR 2 OR ... OR 10)fl=* So for the above form of query where the query terms are expanding vigorously I was looking out for some solution where the collections can internally resolve the query and fetch the resultant output. Thanks!
Re: solr4 optimization
As Otis mentioned, it is good to run an optimize once in a while, or when you are done with most of your heavy indexing. The concern is not disk capacity but rather I/O and seeking across segments: when there are comparatively fewer segments to query, there is less I/O and so your query response is quicker. Give it a go and come back with the stats. Cheers! On Tue, Jun 10, 2014 at 1:54 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I don't remember last time I ran optimize. Sure, yes, things will work faster if you optimize an index and reduce the number of segments, but if you are regularly writing to that index and performance is OK, leave it to Lucene segment merges to purge deletes. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Jun 9, 2014 at 4:15 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. On some of the boxes we have about 5 million deleted docs and we have never run optimization since beginning. Does number of deleted docs have anything to do with performance of query? Should we consider optimization at all if we're not worried about disk space? Thanks!
Collection communication internally
Hi All, I was curious to know whether communication between multiple collections can be achieved and, if so, by what means. In my use case, with multiple collections, I need to query the first collection and use the unique ids it returns to query the second one (a foreign key relation). If the number of terms to be passed to the second collection is relatively small, this is fine; otherwise a problem arises, as adding them to the query is time-consuming in terms of building the query, sending it to Solr, and waiting for the response. So the queries would look something like: http://localhost:7070/solr/recollection/select?q=*:*&fl=id&sort=id_S%20desc followed by http://localhost:7070/solr/mycollection/select?q=ID:(1 OR 2 OR ... OR 10)&fl=* So for queries of this form, where the number of query terms grows rapidly, I was looking for a solution in which the collections can resolve the query internally and fetch the resulting output. Thanks!
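As a rough illustration of the two-step lookup described in this message, the ID-list query can be built in fixed-size chunks so that no single request grows without bound. This is only a sketch under stated assumptions: the class `IdQueryBuilder`, its method names, and the chunk size are hypothetical, the IDs are assumed to be simple values that need no Solr query escaping, and only the `mycollection` URL and the `ID` field come from the message. It uses the Java standard library only and does not call Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns a list of foreign-key IDs into one select URL per chunk.
final class IdQueryBuilder {

    // Builds URLs of the form <collectionUrl>/select?q=ID:(1 OR 2 OR 3)&fl=* (URL-encoded).
    static List<String> buildSelectUrls(String collectionUrl, String idField,
                                        List<String> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> urls = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            String query = idField + ":(" + String.join(" OR ", ids.subList(start, end)) + ")";
            urls.add(collectionUrl + "/select?q=" + encode(query) + "&fl=*");
        }
        return urls;
    }

    // URL-encodes the query so spaces, colons and parentheses survive the HTTP request.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1", "2", "3", "4", "5");
        for (String url : buildSelectUrls("http://localhost:7070/solr/mycollection", "ID", ids, 2)) {
            System.out.println(url);
        }
    }
}
```

Chunking keeps each request small, but it does not remove the round trips between the two collections, which is the cost the message is asking about.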
Re: Solr maximum Optimal Index Size per Shard
Hi Shawn, Thanks for your response, wanted to clarify a few things. *Does that mean for querying smoothly we need to have memory atleast equal or greater to the size of index? As in my case the index size will be very heavy(~2TB) and practically speaking that amount of memory is not possible. Even If it goes to multiple shards, say around 10 Shards then also 200GB of RAM will not be an feasible option. *With CloudSolrServer can we specify which Shard the particular index should go and reside, which I can do with EmbeddedSolrServer by indexing in different directories and moving them to appropriate shard directories. Thanks! On Wed, Jun 4, 2014 at 12:43 PM, Shawn Heisey s...@elyograg.org wrote: On 6/4/2014 12:45 AM, Vineet Mishra wrote: Thanks all for your response. I presume this conversation concludes that indexing around 1Billion documents per shard won't be a problem, as I have 10 Billion docs to index, so approx 10 shards with 1 Billion each should be fine with it and how about Memory, what size of RAM should be fine for this amount of data? Figure out the heap requirements of the operating system and every program on the machine (Solr especially). Then you would add that number to the total size of the index data on the machine. That is the ideal minimum RAM. http://wiki.apache.org/solr/SolrPerformanceProblems Unfortunately, if you are dealing with a huge index with billions of documents, it is likely to be prohibitively expensive to buy that much RAM. If you are running Solr on Amazon's cloud, the cost for that much RAM would be astronomical. Exactly how much RAM would actually be required is very difficult to predict. If you had only 25% of the ideal, your index might have perfectly acceptable performance, or it might not. It might do fine under a light query load, but if you increase to 50 queries per second, performance may drop significantly ... or it might be good. 
It's generally not possible to know how your hardware will perform until you actually build and use your index. http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ A general rule of thumb for RAM that I have found to be useful is that if you've got less than half of the ideal memory size, you might have performance problems. Moreover what should be the indexing technique for this huge data set, as currently I am indexing with EmbeddedSolrServer but its going pathetically slow after some 20Gb of indexing. Comparatively SolrHttpPost was slow due to network delays and response but after this long running the indexing with EmbeddedSolrServer I am getting a different notion. Any good indexing technique for this huge dataset would be highly appreciated. EmbeddedSolrServer is not recommended. Run Solr in the traditional way with HTTP connectivity. HTTP overhead on a LAN is usually quite small. Solr is fully thread-safe, so you can have several indexing threads all going at the same time. Indexes at this scale should normally be built with SolrCloud, with enough servers so that each machine is only handling one shard replica. The ideal indexing program would be written in Java, using CloudSolrServer. Thanks, Shawn
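Shawn's sizing advice can be written down as a quick back-of-the-envelope check: the ideal minimum RAM is the memory needed by the OS and programs (including the Solr heap) plus the size of the index data on the machine, and having less than half of that may cause performance problems. The sketch below is illustrative only: the class `RamEstimate` and its method names are hypothetical, the 200 GB figure is the per-machine index size mentioned in this thread for a 2 TB index over 10 shards, and the 8 GB figure for heap and OS is an assumption made up for the example.

```java
// Hypothetical helper encoding the rule of thumb from the message above.
final class RamEstimate {

    // Ideal minimum RAM = memory needed by OS and programs (incl. Solr heap) + index size on disk.
    static long idealRamGb(long programMemoryGb, long indexSizeGb) {
        return programMemoryGb + indexSizeGb;
    }

    // Rule of thumb: less than half of the ideal size might mean performance problems.
    static boolean mightHavePerformanceProblems(long installedRamGb, long idealRamGb) {
        return installedRamGb * 2 < idealRamGb;
    }

    public static void main(String[] args) {
        long programMemoryGb = 8;   // assumed value, not from the thread
        long indexSizeGb = 200;     // 2 TB spread over 10 shards, one shard per machine
        long ideal = idealRamGb(programMemoryGb, indexSizeGb);
        System.out.println("Ideal minimum RAM (GB): " + ideal);
        for (long installed : new long[] {64, 128, 256}) {
            System.out.println(installed + " GB installed, might have problems: "
                    + mightHavePerformanceProblems(installed, ideal));
        }
    }
}
```

As the message itself stresses, this is a starting point rather than a prediction; real behavior only shows up once the index is built and queried.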
Re: Solr maximum Optimal Index Size per Shard
Hey Jack, Well I have indexed around some 10 Million documents consuming 20 GB index size. Each Document is consisting of nearly 100 String Fields with data upto 10 characters per field. For my case each document containing number of fields can expand much widely (from current 100 to 500 or ever more). As for the typical exceptional case I was more interested for a way to evenly maintain the right ratio of index vs shard. Thanks! On Wed, Jun 4, 2014 at 7:47 PM, Jack Krupansky j...@basetechnology.com wrote: How many documents was in that 20GB index? I'm skeptical that a 1 billion document shard won't be a problem. I mean technically it is possible, but as you are already experiencing, it may take a long time and a very powerful machine to do so. 100 million (or 250 million max) would be a more realistic goal. Even then, it depends on your doc size and machine size. The main point from the previous discussion is that although the technical hard limit for a Solr shard is 2G docs, from a practical perspective it is very difficult to get to that limit, not that indexing 1 billion docs on a single shard is just fine! As a general rule, if you want fast queries for high volume, strive to assure that your per-shard index fits entirely into the system memory available for OS caching of file system pages. In any case, a proof of concept implementation will tell you everything you need to know. -- Jack Krupansky -Original Message- From: Vineet Mishra Sent: Wednesday, June 4, 2014 2:45 AM To: solr-user@lucene.apache.org Subject: Re: Solr maximum Optimal Index Size per Shard Thanks all for your response. I presume this conversation concludes that indexing around 1Billion documents per shard won't be a problem, as I have 10 Billion docs to index, so approx 10 shards with 1 Billion each should be fine with it and how about Memory, what size of RAM should be fine for this amount of data? 
Moreover what should be the indexing technique for this huge data set, as currently I am indexing with EmbeddedSolrServer but its going pathetically slow after some 20Gb of indexing. Comparatively SolrHttpPost was slow due to network delays and response but after this long running the indexing with EmbeddedSolrServer I am getting a different notion. Any good indexing technique for this huge dataset would be highly appreciated. Thanks again! On Wed, Jun 4, 2014 at 6:40 AM, rulinma ruli...@gmail.com wrote: mark. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-maximum- Optimal-Index-Size-per-Shard-tp4139565p4139698.html Sent from the Solr - User mailing list archive at Nabble.com.
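To make Jack's numbers concrete, the shard count for the 10 billion documents mentioned in this thread can be worked out for each per-shard target he names (100 million as a realistic goal, 250 million as a maximum) and for the 1 billion per shard that was originally proposed. The class `ShardPlanner` and its method are hypothetical names for this sketch; the document counts are the ones quoted in the messages.

```java
// Hypothetical helper: how many shards are needed for a given per-shard document target.
final class ShardPlanner {

    // Ceiling division: every started block of docsPerShard documents needs its own shard.
    static long shardsNeeded(long totalDocs, long docsPerShard) {
        if (totalDocs < 0 || docsPerShard <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and docsPerShard > 0");
        }
        return (totalDocs + docsPerShard - 1) / docsPerShard;
    }

    public static void main(String[] args) {
        long totalDocs = 10_000_000_000L; // 10 billion documents, as stated in the thread
        long[] targets = {100_000_000L, 250_000_000L, 1_000_000_000L};
        for (long perShard : targets) {
            System.out.println(perShard + " docs/shard -> " + shardsNeeded(totalDocs, perShard) + " shards");
        }
    }
}
```

The gap between 10 shards and 100 shards is the practical point of the reply: the document count per shard, not the 2G hard limit, drives the cluster size.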
Re: Solr maximum Optimal Index Size per Shard
Hi Toke, That was spectacular; it is really great to hear that you have already indexed 2.7TB+ of data on your server and that query response time is still in milliseconds or a few seconds for such a huge dataset. Could you state what indexing mechanism you are using? I started with EmbeddedSolrServer but it became pretty slow after a few GB (~30+) of indexing. I started indexing a week ago and the index is still only 37GB, although I assume the HTTP POST mechanism would be sluggish due to network latency and waiting for responses. Furthermore, I tried CloudSolrServer but am facing a weird ClassCastException (Cannot cast to Exception) while adding the SolrInputDocument to the server.
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;
CloudSolrServer server1 = new CloudSolrServer("zkHost:port1,zkHost:port2,zkHost:port3", false);
server1.setDefaultCollection("mycollection");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("ID", 123);
doc.addField("A0_s", 282628854);
server1.add(doc); // Error at this line
server1.commit();
Thanks again, Toke, for sharing those stats.
On Fri, Jun 6, 2014 at 5:04 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Fri, 2014-06-06 at 12:32 +0200, Vineet Mishra wrote: *Does that mean for querying smoothly we need to have memory atleast equal or greater to the size of index? If you absolutely, positively have to reduce latency as much as possible, then yes. With an estimated index size of 2TB, I would guess that 10-20 machines with powerful CPUs (1 per shard per expected concurrent request) would also be advisable. While you're at it, do make sure that you're using high-speed memory. That was not a serious suggestion, should you be in doubt. Very few people need the best latency possible. Most just need the individual searches to be fast enough and want to scale throughput instead. As in my case the index size will be very heavy(~2TB) and practically speaking that amount of memory is not possible. 
Even If it goes to multiple shards, say around 10 Shards then also 200GB of RAM will not be an feasible option. We're building a projected 24TB index collection and are currently at 2.7TB+, growing with about 1TB/10 days. Our current plan is to use a single machine with 256GB of RAM, but we will of course adjust along the way if it proves to be too small. Requirements differ with the corpus and the needs, but for us, SSDs as storage seems to provide quite enough of a punch. I did a little testing yesterday: https://plus.google.com/u/0/+TokeEskildsen/posts/4yPvzrQo8A7 tl;dr: for small result sets ( 1M hits) on unwarmed searches with simple queries, response time is below 100ms. If we enable faceting with plain Solr, this jumps to about 1 second. I did a top on the machine and it says that 50GB is currently used for caching, so an 80GB (and probably less) machine would work fine for our 2.7TB index. - Toke Eskildsen, State and University Library, Denmark
Re: Solr maximum Optimal Index Size per Shard
Earlier I indexed only with the HTTP POST mechanism, keeping each post between 2 MB and 20 MB, and that was going fine. But we suspected that instead of indexing through network calls (which of course adds latency due to network delays and the HTTP protocol), it would be much better to index offline by just writing the index and copying it to the shards. I am currently committing after each batch of 25K docs, which I will try to replace with commitWithin (it seems to work faster), or I will have a look at this binary protocol. Thanks! On Fri, Jun 6, 2014 at 5:55 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Fri, 2014-06-06 at 14:05 +0200, Vineet Mishra wrote: Could you state what indexing mechanism are you using, as I started with EmbeddedSolrServer but it was pretty slow after a few GB(~30+) of indexing. I suspect that is due to too-frequent commits, too small heap or something third, unrelated to EmbeddedSolrServer itself. Underneath the surface it is just the same as a standalone Solr. We're building our ~1TB indexes individually, using standalone workers for the heavy part of the analysis (Tika). The delivery from the workers to the Solr server is over the network, using the Solr binary protocol. My colleague Thomas Egense just created a small write-up at https://github.com/netarchivesuite/netsearch I started indexing 1 week back and still its 37GB, although I assume HttpPost mechanism will perform lethargic slow due to network latency and for the response await. Maybe if you send the documents one at a time, but if you bundle them in larger updates, the post-method should be fine. - Toke Eskildsen, State and University Library, Denmark
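Toke's point about bundling documents into larger updates, rather than sending them one at a time, can be sketched as a simple batching step in front of whatever client does the sending. The class `UpdateBatcher` and its method are hypothetical, and the batch size of 25,000 is just the figure mentioned in this message; the sketch uses only the standard library and leaves the actual send (for example, handing each batch to a Solr client in one add call) as an assumption outside the code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups documents into fixed-size batches so each update request carries many docs.
final class UpdateBatcher {

    // Splits docs into consecutive batches of at most batchSize elements, preserving order.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 60_000; i++) {
            docs.add(i); // stand-in for real documents
        }
        for (List<Integer> batch : partition(docs, 25_000)) {
            // In a real indexer, each batch would be sent as one update request here.
            System.out.println("batch of " + batch.size() + " docs");
        }
    }
}
```

Batching of this kind reduces the number of round trips; how often to commit is a separate decision, which is why the message treats commit frequency on its own.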
Re: Solr maximum Optimal Index Size per Shard
Thanks all for your response. I presume this conversation concludes that indexing around 1Billion documents per shard won't be a problem, as I have 10 Billion docs to index, so approx 10 shards with 1 Billion each should be fine with it and how about Memory, what size of RAM should be fine for this amount of data? Moreover what should be the indexing technique for this huge data set, as currently I am indexing with EmbeddedSolrServer but its going pathetically slow after some 20Gb of indexing. Comparatively SolrHttpPost was slow due to network delays and response but after this long running the indexing with EmbeddedSolrServer I am getting a different notion. Any good indexing technique for this huge dataset would be highly appreciated. Thanks again! On Wed, Jun 4, 2014 at 6:40 AM, rulinma ruli...@gmail.com wrote: mark. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-maximum-Optimal-Index-Size-per-Shard-tp4139565p4139698.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr maximum Optimal Index Size per Shard
Hi All, Has anyone come across a maximum threshold, in documents or in size, for what each Solr core can hold? I have indexed some 10 million documents (18 GB), and when I index another 5 million documents (9 GB) on top of that index, it responds a little slowly to the stats query. Considering I have around 2 TB of data to index, what would be an appropriately balanced ratio of data to number of shards? This is more a case of indexing big data for NRT. Looking forward to your response. Urgent! Thanks!
Re: Offline Indexes Update to Shard
Hi Otis, I have to index some huge amount of data that's around Billions of records, since indexing via HTTP post mechanism will be a slow and lethargic due to network delay hence I am indexing through EmbeddedSolrServer to create index which I can later upload to different Shards in SolrCloud, although copy pasting the index is possible but I was looking out for some other alternative which can take care of copying it to shard and its replicas. Is copying manually good and only approach because the index size may grow upto a TB or so. Looking out for your response. Thanks! On Thu, May 29, 2014 at 7:52 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, On Wed, May 28, 2014 at 4:25 AM, Vineet Mishra clearmido...@gmail.com wrote: Hi All, Has anyone tried with building Offline indexes with EmbeddedSolrServer and posting it to Shards. What do you mean by posting it to shards? How is that different than copying them manually to the right location in FS? Could you please elaborate? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ FYI, I am done building the indexes but looking out for a way to post these index files on shards. Copying the indexes manually to each shard's replica is possible and is working fine but I don't want to go with that approach. Thanks!
Re: Offline Indexes Update to Shard
Hi Erick, Thanks for your mail, please let me go through with my use case. I am having around 20-40 Billion Records to index with each record is having around 200-400 fields, the data is sensor data so it can be easily stored in Integer or Float. Now to index this huge amount of data I am going with the indexing through EmbeddedSolrServer which was working fine but I was looking out for a way to move these generated indexes to different shards possibly without copying pasting it to each machines but some other approach as to submit this indexes to some shard and let the shard take care of it distributing it over leader and replica. I want to mention one more thing, as I started indexing with EmbeddedSolrServer it went fine for some million of starting documents but there after the indexing speed is pathetically slow, it indexed around 20GB in a day and just have indexed 9 GB in another 2 days. Any indexing optimization approach also requested. Hope this makes things much clearer. Looking forward to soon hear from you. Thanks and Regards! On Fri, May 30, 2014 at 9:09 PM, Erick Erickson erickerick...@gmail.com wrote: You can copy to the shards and use the mergindexes command, the MapReduceIndexerTool follows that approach. But really, what is the higher-level use-case you're trying to support? This feels a little like an XY problem. You could do things like 1 index to a different collection then use collection aliasing to switch 2 just re-index to the current collection. 3 use the MapReduceIndexerTool (admittedly it needs Hadoop). All in all, it feels like you're doing work you don't need to do. But that's a guess since you haven't told us what the use-case is. Best, Erick On Thu, May 29, 2014 at 7:22 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, On Wed, May 28, 2014 at 4:25 AM, Vineet Mishra clearmido...@gmail.com wrote: Hi All, Has anyone tried with building Offline indexes with EmbeddedSolrServer and posting it to Shards. 
What do you mean by posting it to shards? How is that different than copying them manually to the right location in FS? Could you please elaborate? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ FYI, I am done building the indexes but looking out for a way to post these index files on shards. Copying the indexes manually to each shard's replica is possible and is working fine but I don't want to go with that approach. Thanks!
Re: Offline Indexes Update to Shard
Hi Wolfgang, Thanks for your response, can you quote some running example of MapReduceIndexerTool for indexing through csv files. If you are referring to http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html?scroll=csug_topic_6_1 I had a few points to clarify, *what is the morphline? *Is it necessary to use morphline for indexing, if yes how to create one? *can Index only reside on HDFS and not on LocalFS? *what is the minimum cdh version supported for it? Looking forward to your response. Thanks! On Mon, Jun 2, 2014 at 2:24 PM, Wolfgang Hoschek whosc...@cloudera.com wrote: Sounds like you should consider using MapReduceIndexerTool. AFAIK, this is the most scalable indexing (and merging) solution out there. Wolfgang. On Jun 2, 2014, at 10:33 AM, Vineet Mishra clearmido...@gmail.com wrote: Hi Erick, Thanks for your mail, please let me go through with my use case. I am having around 20-40 Billion Records to index with each record is having around 200-400 fields, the data is sensor data so it can be easily stored in Integer or Float. Now to index this huge amount of data I am going with the indexing through EmbeddedSolrServer which was working fine but I was looking out for a way to move these generated indexes to different shards possibly without copying pasting it to each machines but some other approach as to submit this indexes to some shard and let the shard take care of it distributing it over leader and replica. I want to mention one more thing, as I started indexing with EmbeddedSolrServer it went fine for some million of starting documents but there after the indexing speed is pathetically slow, it indexed around 20GB in a day and just have indexed 9 GB in another 2 days. Any indexing optimization approach also requested. Hope this makes things much clearer. Looking forward to soon hear from you. Thanks and Regards! 
On Fri, May 30, 2014 at 9:09 PM, Erick Erickson erickerick...@gmail.com wrote: You can copy to the shards and use the mergindexes command, the MapReduceIndexerTool follows that approach. But really, what is the higher-level use-case you're trying to support? This feels a little like an XY problem. You could do things like 1 index to a different collection then use collection aliasing to switch 2 just re-index to the current collection. 3 use the MapReduceIndexerTool (admittedly it needs Hadoop). All in all, it feels like you're doing work you don't need to do. But that's a guess since you haven't told us what the use-case is. Best, Erick On Thu, May 29, 2014 at 7:22 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, On Wed, May 28, 2014 at 4:25 AM, Vineet Mishra clearmido...@gmail.com wrote: Hi All, Has anyone tried with building Offline indexes with EmbeddedSolrServer and posting it to Shards. What do you mean by posting it to shards? How is that different than copying them manually to the right location in FS? Could you please elaborate? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ FYI, I am done building the indexes but looking out for a way to post these index files on shards. Copying the indexes manually to each shard's replica is possible and is working fine but I don't want to go with that approach. Thanks!
Offline Indexes Update to Shard
Hi All, Has anyone tried building offline indexes with EmbeddedSolrServer and posting them to shards? FYI, I am done building the indexes but am looking for a way to post these index files to the shards. Copying the indexes manually to each shard's replica is possible and works fine, but I don't want to go with that approach. Thanks!
Indexing Getting Failed
Hi, I have set up a default SolrCloud 4.6.0 cluster with the built-in ZooKeeper, running on Jetty. Indexing goes fine for the first few thousand documents, but after about 5000 documents it started throwing the error below and indexing stopped, as the ZooKeeper leader election was in transition. Is this problem caused by the built-in ZooKeeper?
*Error Trace:*
ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: No registered leader was found, collection:collection1 slice:shard2
    at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484)
    at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:467)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:223)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:679)
Any suggestions would be appreciated. Thanks!
Re: Inconsistent response from Cloud Query
Hi Shawn, There is no recovery case for me, neither the commit is pending. The case I am talking about is when I restart the Cloud all over again with index already flushed to disk. Thanks! On Sun, May 11, 2014 at 10:17 PM, Shawn Heisey s...@elyograg.org wrote: On 5/9/2014 11:42 AM, Cool Techi wrote: We have noticed Solr returns in-consistent results during replica recovery and not all replicas are in the same state, so when your query goes to a replica which might be recovering or still copying the index then the counts may differ. regards,Ayush SolrCloud should never send requests to a replica that is recovering. If that is happening (which I think is unlikely), then it's a bug. If *you* send a request to a replica that is still recovering, I would expect SolrCloud to redirect the request elsewhere unless distrib=false is used. I'm not sure whether that actually happens, though. Thanks, Shawn
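One way to narrow down a count that changes between requests is to ask each core directly with distrib=false, as mentioned above, and compare the numFound values. The sketch below only builds those request URLs; the class name, the helper and the core URLs passed to it are assumptions made for illustration, not something taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

class ReplicaCheck {
    // Builds one non-distributed count query per core URL (rows=0 returns only the count).
    static List<String> countUrls(List<String> coreUrls) {
        List<String> urls = new ArrayList<>();
        for (String core : coreUrls) {
            // Drop a trailing slash so the path is not doubled.
            String base = core.endsWith("/") ? core.substring(0, core.length() - 1) : core;
            urls.add(base + "/select?q=*:*&rows=0&distrib=false");
        }
        return urls;
    }
}
```

If two replicas of the same shard report different counts for these URLs, that would point at the replicas being out of sync rather than at the distributed query itself.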
Fwd: Inconsistent response from Cloud Query
Copying. Community: Looking forward for your response. -- Forwarded message -- From: Vineet Mishra clearmido...@gmail.com Date: Mon, May 12, 2014 at 5:57 PM Subject: Re: Inconsistent response from Cloud Query To: solr-user@lucene.apache.org Hi Shawn, There is no recovery case for me, neither the commit is pending. The case I am talking about is when I restart the Cloud all over again with index already flushed to disk. Thanks! On Sun, May 11, 2014 at 10:17 PM, Shawn Heisey s...@elyograg.org wrote: On 5/9/2014 11:42 AM, Cool Techi wrote: We have noticed Solr returns in-consistent results during replica recovery and not all replicas are in the same state, so when your query goes to a replica which might be recovering or still copying the index then the counts may differ. regards,Ayush SolrCloud should never send requests to a replica that is recovering. If that is happening (which I think is unlikely), then it's a bug. If *you* send a request to a replica that is still recovering, I would expect SolrCloud to redirect the request elsewhere unless distrib=false is used. I'm not sure whether that actually happens, though. Thanks, Shawn
Inconsistent response from Cloud Query
Hi All, I have set up cloud-4.6.2 with the default configuration on a single machine, with 2 shards and 2 replicas, following https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud The cloud was up and running and I indexed the example data XML to it; that went fine. Now when I query with *distrib=true* it gives inconsistent results: sometimes the response has 4 results and sometimes 8 (the actual number). Has anyone been through this situation? Looking for a positive and quick response. Thanks!
Re: Indexing Big Data With or Without Solr
I did it with Tomcat and Zookeeper Ensemble, will mail you the steps shortly. Cheers On Sat, Apr 19, 2014 at 9:09 AM, Aman Tandon amantandon...@gmail.comwrote: Vineet please share after you setup for solr cloud Are you using jetty or tomcat.? On Saturday, April 19, 2014, Vineet Mishra clearmido...@gmail.com wrote: Thanks Furkan, I will definitely give it a try then. Thanks again! On Tue, Apr 15, 2014 at 7:53 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Vineet; I've been using SolrCloud for such kind of Big Data and I think that you should consider to use it. If you have any problems you can ask it here. Thanks; Furkan KAMACI 2014-04-15 13:20 GMT+03:00 Vineet Mishra clearmido...@gmail.com: Hi All, I have worked with Solr 3.5 to implement real time search on some 100GB data, that worked fine but was little slow on complex queries(Multiple group/joined queries). But now I want to index some real Big Data(around 4 TB or even more), can SolrCloud be solution for it if not what could be the best possible solution in this case. *Stats for the previous Implementation:* It was Master Slave Architecture with normal Standalone multiple instance of Solr 3.5. There were around 12 Solr instance running on different machines. *Things to consider for the next implementation:* Since all the data is sensor data hence it is the factor of duplicity and uniqueness. *Really urgent, please take the call on priority with set of feasible solution.* Regards -- Sent from Gmail Mobile
Re: Indexing Big Data With or Without Solr
Thanks Furkan, I will definitely give it a try then. Thanks again! On Tue, Apr 15, 2014 at 7:53 PM, Furkan KAMACI furkankam...@gmail.comwrote: Hi Vineet; I've been using SolrCloud for such kind of Big Data and I think that you should consider to use it. If you have any problems you can ask it here. Thanks; Furkan KAMACI 2014-04-15 13:20 GMT+03:00 Vineet Mishra clearmido...@gmail.com: Hi All, I have worked with Solr 3.5 to implement real time search on some 100GB data, that worked fine but was little slow on complex queries(Multiple group/joined queries). But now I want to index some real Big Data(around 4 TB or even more), can SolrCloud be solution for it if not what could be the best possible solution in this case. *Stats for the previous Implementation:* It was Master Slave Architecture with normal Standalone multiple instance of Solr 3.5. There were around 12 Solr instance running on different machines. *Things to consider for the next implementation:* Since all the data is sensor data hence it is the factor of duplicity and uniqueness. *Really urgent, please take the call on priority with set of feasible solution.* Regards
Indexing Big Data With or Without Solr
Hi All, I have worked with Solr 3.5 to implement real time search on some 100GB data, that worked fine but was little slow on complex queries(Multiple group/joined queries). But now I want to index some real Big Data(around 4 TB or even more), can SolrCloud be solution for it if not what could be the best possible solution in this case. *Stats for the previous Implementation:* It was Master Slave Architecture with normal Standalone multiple instance of Solr 3.5. There were around 12 Solr instance running on different machines. *Things to consider for the next implementation:* Since all the data is sensor data hence it is the factor of duplicity and uniqueness. *Really urgent, please take the call on priority with set of feasible solution.* Regards
Re: SolrCloud with Tomcat
Hi, Got it working! Many thanks for your response.

On Sat, Mar 8, 2014 at 7:40 PM, Furkan KAMACI furkankam...@gmail.com wrote:
Hi; Could you check here: http://lucene.472066.n3.nabble.com/Error-when-creating-collection-in-Solr-4-6-td4103536.html Thanks; Furkan KAMACI

2014-03-07 9:44 GMT+02:00 Vineet Mishra clearmido...@gmail.com:
Hi, I am installing SolrCloud with 3 external ZooKeepers (localhost:2181, localhost:2182, localhost:2183) and 2 Tomcats (localhost:8181, localhost:8182), all on a single machine (just for getting started), by following these links:
http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
http://wiki.apache.org/solr/SolrCloudTomcat
I have got the Solr UI on the machine at http://localhost:8181/solr/#/~cloud In the Cloud Graph View it shows

mycollection
 |
 |_ shard1
 |_ shard2

But both shards are empty and show no cores or replicas. Following the http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html blog, I was successful up to starting Tomcat; after the section "Creating Collection, Shard(s), Replica(s) in SolrCloud" I am facing the problem. Giving the command to create a replica for the shard using

*curl 'http://localhost:8181/solr/admin/cores?action=CREATE&name=shard1-replica-2&collection=mycollection&shard=shard1'*

it gives the error response

<response><lst name="responseHeader"><int name="status">400</int><int name="QTime">137</int></lst><lst name="error"><str name="msg">*Error CREATEing SolrCore 'shard1-replica-2': 192.168.2.183:8182_solr_shard1-replica-2 is removed*</str><int name="code">400</int></lst></response>

Has anybody gone through this issue? Regards
SolrCloud with Tomcat
Hi, I am installing SolrCloud with 3 external ZooKeepers (localhost:2181, localhost:2182, localhost:2183) and 2 Tomcats (localhost:8181, localhost:8182), all on a single machine (just for getting started), by following these links:
http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
http://wiki.apache.org/solr/SolrCloudTomcat
I have got the Solr UI on the machine at http://localhost:8181/solr/#/~cloud In the Cloud Graph View it shows

mycollection
 |
 |_ shard1
 |_ shard2

But both shards are empty and show no cores or replicas. Following the http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html blog, I was successful up to starting Tomcat; after the section "Creating Collection, Shard(s), Replica(s) in SolrCloud" I am facing the problem. Giving the command to create a replica for the shard using

*curl 'http://localhost:8181/solr/admin/cores?action=CREATE&name=shard1-replica-2&collection=mycollection&shard=shard1'*

it gives the error response

<response><lst name="responseHeader"><int name="status">400</int><int name="QTime">137</int></lst><lst name="error"><str name="msg">*Error CREATEing SolrCore 'shard1-replica-2': 192.168.2.183:8182_solr_shard1-replica-2 is removed*</str><int name="code">400</int></lst></response>

Has anybody gone through this issue? Regards
Re: Fault Tolerant Technique of Solr Cloud
Hi Per, Thanks for your response, got it working. What I was more interested in, though, was querying the same cloud from the UI when one of the servers is down, by querying that same server to get the collection result. But I guess that's not possible. Thanks!

On Mon, Feb 24, 2014 at 7:36 PM, Per Steffensen st...@designware.dk wrote:
On 24/02/14 13:04, Vineet Mishra wrote:
Can you brief as how to make a direct call to Zookeeper instead of Cloud Collection (as currently I was querying the Cloud something like http://192.168.2.183:8900/solr/collection1/select?q=*:* ) from UI, now if I assume shard 8900 is down then how can I still make the call.

It is obvious that you cannot make the call to localhost:8900 - the server listening to that port is down. You can make the call to any of the other servers, though. Information about which Solr-servers are running is available in ZooKeeper; CloudSolrServer reads that information in order to know which servers to route requests to. As long as localhost:8900 is down it will not route requests to that server. You do not make a direct call to ZooKeeper. ZooKeeper is not an HTTP server that will receive your calls. It just has information about which Solr-servers are up and running. CloudSolrServer takes advantage of that information. You really cannot do without CloudSolrServer (or at least LBHttpSolrServer), unless you write a component that can do the same thing in some other language (if the reason you do not want to use CloudSolrServer is that your client is not Java). Else you need to do other clever stuff, like e.g. what Shalin suggests. Regards, Per Steffensen
Scalability Limit of SolrCloud
Hi All, What is the scalability limit of SolrCloud? Can it index billions of documents, each containing 400-500 numeric fields (probably float or double)? Is it possible and feasible to go with the current SolrCloud architecture, or is there some other alternative or replacement? Regards
Re: Fault Tolerant Technique of Solr Cloud
Can you briefly explain how to make a direct call to ZooKeeper instead of the cloud collection (currently I am querying the cloud from the UI with something like http://192.168.2.183:8900/solr/collection1/select?q=*:* )? If I assume the server on port 8900 is down, how can I still make the call? I have followed the Apache tutorial (with a separate ZooKeeper running on port 2181) http://wiki.apache.org/solr/SolrCloud Can you please be more specific with respect to ZooKeeper distributed calls? Regards

On Wed, Feb 19, 2014 at 9:45 PM, Per Steffensen st...@designware.dk wrote:
On 19/02/14 07:57, Vineet Mishra wrote:
Thanks for all your response but my doubt is which *Server:Port* should the query be made as we don't know the crashed server or which server might crash in the future (as any server can go down).

That is what CloudSolrServer will deal with for you. It knows which servers are down and makes sure not to send requests to those servers.

The only intention for writing this doubt is to get an idea about how the query format for distributed search might work if any of the shard or replica goes down.

    // Setting up your CloudSolrServer-client
    // zkConnectionStr being the same string as you provide in -DzkHost when starting your servers
    CloudSolrServer client = new CloudSolrServer(zkConnectionStr);
    client.setDefaultCollection("collection1");
    client.connect();
    // Creating and firing queries (you can do it in different way, but at least this is an option)
    SolrQuery query = new SolrQuery("*:*");
    QueryResponse results = client.query(query);

Because you are using CloudSolrServer you do not have to worry about not sending the request to a crashed server. In your example I believe the situation is as follows:
* One collection called collection1 with two shards, shard1 and shard2, each having two replicas, replica1 and replica2 (a replica is an instance of a shard, and when you have one replica you are not having replication).
* collection1.shard1.replica1 is running on localhost:8983 and collection1.shard1.replica2 is running on localhost:8900 (or maybe switched)
* collection1.shard2.replica1 is running on localhost:7574 and collection1.shard2.replica2 is running on localhost:7500 (or maybe switched)

If localhost:8900 is the only server that is down, all data is still available for search because every shard has at least one replica running. In that case I believe setting shards.tolerant will not make a difference. You will get your response no matter what. But if localhost:8983 was also down there would be no live replica of shard1. In that case you will get an exception from your query, indicating that the query cannot be carried out over the complete data-set. If you set shards.tolerant in that case, the behaviour will change and you will not get an exception - you will get a real response, but it will just not include data from shard1, because it is not available at the moment. That is just the way I believe shards.tolerant works, but you might want to verify that. To set shards.tolerant:

    SolrQuery query = new SolrQuery("*:*");
    query.set("shards.tolerant", true);
    QueryResponse results = client.query(query);

I believe distributed search is the default, but you can explicitly require it with

    query.setDistrib(true);

or

    query.set("distrib", true);

Thanks
Fault Tolerant Technique of Solr Cloud
Hi All, I want to have a clear idea about the fault tolerance capability of SolrCloud. Consider that I have set up SolrCloud with an external ZooKeeper and a single collection with 2 shards, each having a replica, as given in the official Solr documentation. https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

                *Collection1*
                 /        \
                /          \
        *Shard 1*          *Shard 2*
        localhost:8983     localhost:7574
        localhost:8900     localhost:7500

I indexed some documents and then, if I shut down any of the replicas or the leader, for example *localhost:8900*, I can't query the collection on that particular port: http://localhost:8900/solr/collection1/select?q=*:* Then how is it fault tolerant, or how does the query have to be made? Regards
Re: Fault Tolerant Technique of Solr Cloud
Thanks for all your responses, but my doubt is to which *Server:Port* the query should be made, as we don't know the crashed server or which server might crash in the future (any server can go down). The only intention for writing this doubt is to get an idea about how the query format for distributed search might work if any shard or replica goes down. Thanks

On Tue, Feb 18, 2014 at 11:22 PM, Shawn Heisey s...@elyograg.org wrote:
On 2/18/2014 8:32 AM, Shawn Heisey wrote:
On 2/18/2014 6:05 AM, Vineet Mishra wrote:

        *Shard 1*          *Shard 2*
        localhost:8983     localhost:7574
        localhost:8900     localhost:7500

I Indexed some document and then if I shutdown any of the replica or Leader say for ex- *localhost:8900*, I can't query to the collection to that particular port http://localhost:8900/solr/collection1/select?q=*:* Then how is it Fault Tolerant or how the query has to be made.

What is the complete error you are getting? If you don't see the error in the response, you'll need to find your Solr Logfile and look for the error (including a large java stacktrace) there.

Good catch by Per. I did not notice that you were trying to send the query to the server that you took down. This isn't going to work -- if the software you're trying to reach is not running, it won't respond. Think about what happens if you are sending requests to a server and it crashes completely. If you want to always send to the same host/port, you will need a load balancer listening on that port. You'll also want something that maintains a shared IP address, so that if the machine dies, the IP address and the load balancer move to another machine. Haproxy and Pacemaker work very well as a combination for this. There are many other choices, both hardware and software. Per also mentioned the other option - you can write code that knows about multiple URLs and can switch between them. This is something you get for free with CloudSolrServer when writing Java code with SolrJ. Thanks, Shawn
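For a client that cannot use CloudSolrServer, the "code that knows about multiple URLs and can switch between them" option could look roughly like the sketch below. It is a minimal illustration of the switching logic only: the FailoverQuery class and its Fetcher callback are hypothetical names, the actual HTTP request is left to the caller, and unlike CloudSolrServer nothing here consults ZooKeeper to learn which nodes are alive.

```java
import java.io.IOException;
import java.util.List;

class FailoverQuery {
    /** Hypothetical callback that performs one request against a Solr base URL. */
    interface Fetcher {
        String fetch(String baseUrl) throws IOException;
    }

    // Tries each base URL in order and returns the first successful response.
    static String firstAvailable(List<String> baseUrls, Fetcher fetcher) {
        IOException last = null;
        for (String url : baseUrls) {
            try {
                return fetcher.fetch(url);
            } catch (IOException e) {
                last = e; // this node is unreachable or failed; try the next one
            }
        }
        throw new IllegalStateException("no Solr node answered", last);
    }
}
```

The list would hold the node addresses from the example (ports 8983, 8900, 7574, 7500), and the callback would issue the select request against whichever one it is handed. A load balancer or CloudSolrServer remains the more robust choice, since this sketch retries sequentially and pays a timeout for every dead node.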
pool-1-thread-4 java.lang.NoSuchMethodError: org.apache.solr.util.SimplePostTool
pool-1-thread-4 java.lang.NoSuchMethodError: org.apache.solr.util.SimplePostTool I am getting this error while posting data to Solr from a generated XML file, although the Solr post.jar is present on the library class path and I also tried keeping the source class of the post tool. This is urgent. Thanks!
Making a Web Request is failing with 403 Request Forbidden
Hi All, I am making a web call to a link-shortening site, bit.ly, but receiving a 403 Request Forbidden. If I use their web page to shorten the link, it works fine. Can anybody tell me what might be the reason for this behavior? Here is the code:

    // imports: java.io.BufferedReader, java.io.DataOutputStream, java.io.IOException,
    // java.io.InputStreamReader, java.net.MalformedURLException, java.net.ProtocolException,
    // java.net.URL, javax.net.ssl.HttpsURLConnection
    String url = "https://bitly.com/shorten/";
    StringBuffer response;
    try {
        URL obj = new URL(url);
        HttpsURLConnection con = (HttpsURLConnection) obj.openConnection();
        // add request headers
        con.setRequestMethod("POST");
        con.setRequestProperty("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.22 (KHTML, like Gecko) Ubuntu Chromium/25.0.1364.160 Chrome/25.0.1364.160 Safari/537.22");
        con.setRequestProperty("Accept-Language", "en-US,en;q=0.8");
        con.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        con.setRequestProperty("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        con.setRequestProperty("Host", "bitly.com");
        String urlParameters = "url=http://bit.ly/1f3aLrP&ie=utf-8&oe=utf-8&gws_rd=cr&ei=sKlwUvPbN8j-rAf-5IDwAQ&basic_style=1&classic_mode=&rapid_shorten_mode=&_xsrf=a2b71eaf499c4690a77a21d3c87e6302";
        // Send post request
        con.setDoOutput(true);
        DataOutputStream wr = new DataOutputStream(con.getOutputStream());
        wr.writeBytes(urlParameters);
        wr.flush();
        wr.close();
        int responseCode = con.getResponseCode();
        System.out.println("Response Code : " + responseCode);
        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
        String inputLine;
        response = new StringBuffer();
        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        in.close();
        System.out.println(response.toString());
    } catch (MalformedURLException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (ProtocolException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
Hoping for your response. Thanks!
Post Call to Solr RequestHandler
Hi, Currently I am working with a RequestHandler in Solr, where the user-defined query is processed by the class specified for the request handler in solrconfig.xml. My requirement is to make it a POST call rather than a GET query call. Is it possible, or is there some way we can query a Solr RequestHandler with the POST method? This is urgent; please reply as soon as possible. Thanks, Vineet
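As far as I know, Solr's search handlers read request parameters from a form-encoded POST body as well as from the query string, so the usual approach is to send the same parameters with Content-Type application/x-www-form-urlencoded. The sketch below uses only the JDK; the SolrPostQuery class, its method names and any handler URL passed to it are assumptions for illustration, and a custom handler may need its own checks.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class SolrPostQuery {
    // Encodes parameters as an application/x-www-form-urlencoded body.
    static String formBody(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
        return sb.toString();
    }

    // Sends the parameters in the POST body and returns the HTTP status code.
    static int post(String handlerUrl, Map<String, String> params) throws IOException {
        byte[] body = formBody(params).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection con = (HttpURLConnection) new URL(handlerUrl).openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        con.setDoOutput(true);
        try (OutputStream out = con.getOutputStream()) {
            out.write(body);
        }
        return con.getResponseCode();
    }
}
```

With SolrJ the equivalent is usually to pass the POST method when executing the query, which avoids hand-built requests altogether.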
Re: Unexpected character '<' (code 60) expected '='
I am using Solr 3.5 with the posting XML file size of just 1Mb. On Wed, Jul 31, 2013 at 8:19 PM, Shawn Heisey s...@elyograg.org wrote: On 7/31/2013 7:16 AM, Vineet Mishra wrote: I checked the File. . .nothing is there. I mean the formatting is correct, its a valid XML file. What version of Solr, and how large is your XML file? If Solr is older than version 4.1, then the POST buffer limit is decided by your container config, which based on your stacktrace, is tomcat. If you have 4.1 or later, then the POST buffer limit is decided by Solr, and defaults to 2048KiB. Could that be the problem? Thanks, Shawn
Re: Unexpected character '<' (code 60) expected '='
XML validation goes fine; it does not give any error. Rather, on running again it goes through a few lakhs of records and then stops, saying *SimplePostTool: FATAL: IOException while reading response: java.io.IOException: Incomplete output stream* I guess this is an issue of thread resource management with locks.

On Thu, Aug 1, 2013 at 1:50 PM, Paul Masurel paul.masu...@gmail.com wrote:
You can check for your xml validity with xmllint very simply. xmllint file Does this return an error?

On Thu, Aug 1, 2013 at 9:59 AM, deniz denizdurmu...@gmail.com wrote:
Vineet Mishra wrote
I am using Solr 3.5 with the posting XML file size of just 1Mb.
On Wed, Jul 31, 2013 at 8:19 PM, Shawn Heisey <solr@> wrote:
On 7/31/2013 7:16 AM, Vineet Mishra wrote:
I checked the File. . .nothing is there. I mean the formatting is correct, its a valid XML file.
What version of Solr, and how large is your XML file? If Solr is older than version 4.1, then the POST buffer limit is decided by your container config, which based on your stacktrace, is tomcat. If you have 4.1 or later, then the POST buffer limit is decided by Solr, and defaults to 2048KiB. Could that be the problem? Thanks, Shawn

you might need to escape some chars, like < to &lt; and so on

- Smart, but he doesn't work... If he worked, he would do it...
-- View this message in context: http://lucene.472066.n3.nabble.com/Unexpected-character-code-60-expected-tp4081603p4081854.html Sent from the Solr - User mailing list archive at Nabble.com.
-- Masurel Paul e-mail: paul.masu...@gmail.com
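The escaping suggested above can be done while the update XML is generated, so that a raw < or & in a record value never reaches the parser. This is a minimal sketch using only the JDK; the XmlEscape class and its field helper are made-up names for illustration, and whether unescaped values were really the cause of this particular error is not confirmed in the thread.

```java
class XmlEscape {
    // Escapes the characters that are significant in XML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds one <field> element of a Solr <add><doc> update document.
    static String field(String name, String value) {
        return "<field name=\"" + escape(name) + "\">" + escape(value) + "</field>";
    }
}
```

Using an XML writer such as javax.xml.stream.XMLStreamWriter instead of string concatenation would handle the escaping automatically.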
SimplePostTool: FATAL: Solr returned an error #400 Bad Request
Hi All, I am currently in the middle of a project which indexes data to multiple Solr instances. The configuration is that I have made multiple instances of Solr on the same machine:
http://localhost:8080/solr1
http://localhost:8080/solr2
http://localhost:8080/solr3
http://localhost:8080/solr4
http://localhost:8080/solr5
http://localhost:8080/solr6
I am posting the data to Solr through SimplePostTool by passing an XML file to the spt.postFile(file) method and committing it thereafter. The whole process is multithreaded and works fine up to 1 million data records, but after that it suddenly stops, saying

*SimplePostTool: FATAL: Solr returned an error #400 Bad Request*

In the Tomcat Catalina log I found

WARNING: Failed to register info bean: searcher
javax.management.InstanceAlreadyExistsException: solr/:type=searcher,id=org.apache.solr.search.SolrIndexSearcher
    at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:513)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:141)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)
    at org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:220)
    at org.apache.solr.core.SolrCore.registerSearcher(SolrCore.java:1349)
    at org.apache.solr.core.SolrCore.access$000(SolrCore.java:84)
    at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1247)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

Jul 31, 2013 12:46:00 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher Searcher@5fa1891b main
Jul 31, 2013 12:46:00 PM org.apache.solr.search.SolrIndexSearcher close

Has anybody traced such an issue? Please, this is really very urgent and important. Waiting for your response. Thanks and Regards, Vineet
Re: SimplePostTool: FATAL: Solr returned an error #400 Bad Request
I got it resolved; the actual error was further up in the log, above this trace. The XML being posted had a malformed value for the Solr *Date* field, which takes the format *2006-07-15T22:18:48Z*. This is the standard format for the Solr date type, which follows one of these patterns:
- 1995-12-31T23:59:59Z
- 1995-12-31T23:59:59.9Z
- 1995-12-31T23:59:59.99Z
- 1995-12-31T23:59:59.999Z
As documented by Solr http://www.meticent.com/DAt (www.meticent.com/DAt) By the way, thanks! Vineet

On Wed, Jul 31, 2013 at 4:47 PM, Erick Erickson erickerick...@gmail.com wrote:
Probably not the root of your problem, but bq: and committing it there after. Does that mean you're calling commit after every document? This is usually poor practice, I'd set the autocommit intervals on solrconfig.xml and NOT call commit explicitly. Does the same document fail every time? What does it look like? You really haven't provided much information to go on. Best Erick

On Wed, Jul 31, 2013 at 3:55 AM, Vineet Mishra clearmido...@gmail.com wrote:
Hi All, I am currently in the middle of a project which indexes data to multiple Solr instances. The configuration is that I have made multiple instances of Solr on the same machine:
http://localhost:8080/solr1
http://localhost:8080/solr2
http://localhost:8080/solr3
http://localhost:8080/solr4
http://localhost:8080/solr5
http://localhost:8080/solr6
I am posting the data to Solr through SimplePostTool by passing an XML file to the spt.postFile(file) method and committing it thereafter. The whole process is multithreaded and works fine up to 1 million data records, but after that it suddenly stops, saying

*SimplePostTool: FATAL: Solr returned an error #400 Bad Request*

In the Tomcat Catalina log I found

WARNING: Failed to register info bean: searcher
javax.management.InstanceAlreadyExistsException: solr/:type=searcher,id=org.apache.solr.search.SolrIndexSearcher
    at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:513)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:141)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)
    at org.apache.solr.search.SolrIndexSearcher.register(SolrIndexSearcher.java:220)
    at org.apache.solr.core.SolrCore.registerSearcher(SolrCore.java:1349)
    at org.apache.solr.core.SolrCore.access$000(SolrCore.java:84)
    at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1247)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

Jul 31, 2013 12:46:00 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher Searcher@5fa1891b main
Jul 31, 2013 12:46:00 PM org.apache.solr.search.SolrIndexSearcher close

Has anybody traced such an issue? Please, this is really very urgent and important. Waiting for your response. Thanks and Regards, Vineet
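Since the fix described at the top of this message came down to the date format, it can help to produce and check date values in code before the XML is written. The sketch below relies on java.time.Instant, whose text form is UTC ISO-8601 with a trailing Z, matching the patterns listed above. The SolrDate class name is made up for illustration, java.time needs Java 8 or later, and Instant's parser is close to but not guaranteed identical to Solr's own date parsing.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

class SolrDate {
    // Formats an instant as UTC ISO-8601 with a trailing 'Z', e.g. 2006-07-15T22:18:48Z.
    static String format(Instant t) {
        return t.toString();
    }

    // Returns true if the value parses as an ISO-8601 instant.
    static boolean isValid(String value) {
        try {
            Instant.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

Running isValid over each date value before posting would flag a record such as "2006-07-15 22:18:48" locally, instead of as a 400 Bad Request from Solr after a million documents.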
Unexpected character '<' (code 60) expected '='
Hi All,

I am currently stuck on a Solr issue while posting data to the Solr server. I have records from HBase that I am posting to Solr, but after about 1 million records had been posted, it suddenly stopped. The Catalina log showed:

org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='

I am not sure whether the cause is malformed data in the post, because when I take the specific XML file I generate beforehand and post it to Solr on its own, it goes through fine. Below is the whole log trace:

SEVERE: org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1398)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:722)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]
    at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
    at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3001)
    at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:295)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:157)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
    ... 17 more

Has anybody faced this issue?

Thanks and Regards,
Vineet
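For readers hitting the same message: code 60 is the '<' character, and the Woodstox frames in the trace (handleNsAttrs, handleStartElem) suggest the parser was reading the attributes of a start tag when it met '<' where it expected '='. One possible cause, not confirmed anywhere in this thread, is field text containing a raw '<' or '&' that was concatenated into the update XML without escaping. A minimal sketch, assuming the update documents are assembled by hand; the class name SolrAddXml and the field names are hypothetical. It relies on the JDK's XMLStreamWriter, which escapes text and attribute values as it writes them:

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class SolrAddXml {
    /** Builds an <add><doc>...</doc></add> message; field values are escaped by the writer. */
    static String addDoc(Map<String, String> fields) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("add");
            w.writeStartElement("doc");
            for (Map.Entry<String, String> f : fields.entrySet()) {
                w.writeStartElement("field");
                w.writeAttribute("name", f.getKey());
                w.writeCharacters(f.getValue()); // '<' and '&' are escaped here
                w.writeEndElement();
            }
            w.writeEndElement(); // </doc>
            w.writeEndElement(); // </add>
            w.close();           // flushes buffered output into the StringWriter
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not build update XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "1");                 // hypothetical field names
        fields.put("title", "a <b c <d & e");  // text that would break a hand-built message
        System.out.println(addDoc(fields));
    }
}
```

Text such as "a <b c <d" is the kind of value that could produce this exact error if written raw, because "<b c" looks like a start tag with an attribute name and the next '<' arrives where '=' is expected.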
Re: Unexpected character '<' (code 60) expected '='
I checked the file and found nothing wrong. The formatting is correct; it is a valid XML file.

On Wed, Jul 31, 2013 at 6:38 PM, Markus Jelsma markus.jel...@openindex.io wrote:

This file is malformed:
SEVERE: org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='
 at [row,col {unknown-source}]: [20281,18]
Check row 20281, column 18.

-Original message-
From: Vineet Mishra clearmido...@gmail.com
Sent: Wednesday 31st July 2013 15:05
To: solr-user@lucene.apache.org
Subject: Unexpected character '<' (code 60) expected '='

Hi All, I am currently stuck on a Solr issue while posting data to the Solr server. I have records from HBase that I am posting to Solr, but after about 1 million records had been posted, it suddenly stopped. The Catalina log showed:

org.apache.solr.common.SolrException: Unexpected character '<' (code 60) expected '='

I am not sure whether the cause is malformed data in the post, because when I take the specific XML file I generate beforehand and post it to Solr on its own, it goes through fine.
Group and performing statistics on groups
Hi, This is an urgent request. I am grouping Solr documents by a field and want to get the range (min and max) of another field within each group. StatsComponent works fine on the result set as a whole, returning the max and min of a field. Is it possible to get StatsComponent results per group? Thanks and Regards, Vineet
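One option that may fit, assuming a Solr 4.x-era StatsComponent: the stats.facet parameter asks for the statistics of stats.field to be broken down by each value of another field, which yields a min and max per value of the grouping field. A minimal sketch that only builds the request parameters; the class name and the field names category and price are hypothetical, and whether this matches the grouping setup in the question is an assumption:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class StatsPerGroupQuery {
    /** Builds the query string for stats on statsField, broken down by groupField. */
    static String build(String groupField, String statsField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("rows", "0");              // only the stats section is of interest
        params.put("stats", "true");
        params.put("stats.field", statsField);
        params.put("stats.facet", groupField); // per-value breakdown of the stats
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + encode(p.getValue()));
        }
        return query.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Append the result to the core's /select URL.
        System.out.println(build("category", "price"));
    }
}
```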
Sorting the Solr Document after clubbing them from multiple instances
Hi,

I have a master Solr through which I query multiple Solr instances, aggregate their responses, and return the result to the user. The requirement now is that the data gathered from the multiple Solr instances should be sorted on some field.

Say I have 3 slave Solrs (Solr 1, Solr 2, Solr 3), and each returns a sorted response to the master's request handler:

Solr 1 - <str name="value">5</str> <str name="value">8</str>
Solr 2 - <str name="value">6</str> <str name="value">9</str>
Solr 3 - <str name="value">2</str> <str name="value">4</str>

The combined result is not sorted, because in this case I get the response as

<str name="value">5</str> <str name="value">8</str> <str name="value">6</str> <str name="value">9</str> <str name="value">2</str> <str name="value">4</str>

but what I want is a merged, sorted response:

<str name="value">2</str> <str name="value">4</str> <str name="value">5</str> <str name="value">6</str> <str name="value">8</str> <str name="value">9</str>

My current plan is to build a map-like structure keyed on the sort field, with the document as the value, and sort based on that. Is this a good way to go, or is there another way as well?

Thanks,
Vineet
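The map-and-sort plan would work, but since each instance already returns its results in sorted order, a k-way merge avoids re-sorting everything. A minimal sketch, assuming the sort field is an integer and each per-instance list is already ascending; the class name SortedMerge is hypothetical, and extracting the values from the Solr responses is left out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class SortedMerge {
    /** Merges lists that are each sorted ascending into one ascending list. */
    static List<Integer> merge(List<List<Integer>> sortedLists) {
        // Heap entries are {value, listIndex, positionInList}, ordered by value.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < sortedLists.size(); i++) {
            if (!sortedLists.get(i).isEmpty()) {
                heap.add(new int[] {sortedLists.get(i).get(0), i, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();          // smallest remaining value
            merged.add(head[0]);
            List<Integer> source = sortedLists.get(head[1]);
            int next = head[2] + 1;
            if (next < source.size()) {        // refill from the same list
                heap.add(new int[] {source.get(next), head[1], next});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> perInstance = Arrays.asList(
                Arrays.asList(5, 8),   // Solr 1
                Arrays.asList(6, 9),   // Solr 2
                Arrays.asList(2, 4));  // Solr 3
        System.out.println(merge(perInstance)); // prints [2, 4, 5, 6, 8, 9]
    }
}
```

Solr's built-in distributed search (the shards request parameter) also merges sorted results across instances, so it may be worth checking whether that covers the requirement before writing a custom merge.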
Custom RequestHandlerBase XML Response Issue
Hi all,

I am using a custom RequestHandlerBase in which I query multiple Solr instances and aggregate their output into an XML Document using DOM. In the request handler's method handleRequestBody(SolrQueryRequest req, SolrQueryResponse resp) I want to return this XML Document to the user as the response. But if I add it as a Document or a Node:

For Document: response.add(grouped, domResult);
For Node: response.add(grouped, domNode);

what gets written to the user is

For Document: com.sun.org.apache.xerces.internal.dom.DocumentImpl:[#document: null]
For Node: com.sun.org.apache.xerces.internal.dom.ElementImpl:[arr: null]

This happens even though the Document is populated; when I convert the Document to a String it comes out perfectly. But I don't want it as a String; I want it in XML format.

This is very urgent. Has anybody worked on this?

Regards,
Vineet
Re: Custom RequestHandlerBase XML Response Issue
Thanks for your response, Shalin. So does that mean we can't return an XML object in SolrQueryResponse through a custom RequestHandler?

On Thu, Jul 18, 2013 at 4:04 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

This isn't a Solr issue. Maybe ask on the xerces list?

On Thu, Jul 18, 2013 at 3:31 PM, Vineet Mishra clearmido...@gmail.com wrote:

Hi all, I am using a custom RequestHandlerBase in which I query multiple Solr instances and aggregate their output into an XML Document using DOM. In the request handler's method handleRequestBody(SolrQueryRequest req, SolrQueryResponse resp) I want to return this XML Document to the user as the response. But if I add it as a Document or a Node:

For Document: response.add(grouped, domResult);
For Node: response.add(grouped, domNode);

what gets written to the user is

For Document: com.sun.org.apache.xerces.internal.dom.DocumentImpl:[#document: null]
For Node: com.sun.org.apache.xerces.internal.dom.ElementImpl:[arr: null]

This happens even though the Document is populated; when I convert the Document to a String it comes out perfectly. But I don't want it as a String; I want it in XML format. This is very urgent. Has anybody worked on this? Regards, Vineet

-- Regards, Shalin Shekhar Mangar.
Re: Custom RequestHandlerBase XML Response Issue
But it seems Solr even has something called XMLResponseWriter:
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/XMLResponseWriter.java
Wouldn't that be appropriate in my case? I have not implemented it yet, but how can there be no way to produce a SolrQueryResponse in XML format?

On Thu, Jul 18, 2013 at 4:36 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

Solr's response writers support only a few known types. Look at the writeVal method in TextResponseWriter:
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/TextResponseWriter.java

On Thu, Jul 18, 2013 at 4:08 PM, Vineet Mishra clearmido...@gmail.com wrote:

Thanks for your response, Shalin. So does that mean we can't return an XML object in SolrQueryResponse through a custom RequestHandler?

On Thu, Jul 18, 2013 at 4:04 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

This isn't a Solr issue. Maybe ask on the xerces list?

On Thu, Jul 18, 2013 at 3:31 PM, Vineet Mishra clearmido...@gmail.com wrote:

Hi all, I am using a custom RequestHandlerBase in which I query multiple Solr instances and aggregate their output into an XML Document using DOM. In the request handler's method handleRequestBody(SolrQueryRequest req, SolrQueryResponse resp) I want to return this XML Document to the user as the response. But if I add it as a Document or a Node:

For Document: response.add(grouped, domResult);
For Node: response.add(grouped, domNode);

what gets written to the user is

For Document: com.sun.org.apache.xerces.internal.dom.DocumentImpl:[#document: null]
For Node: com.sun.org.apache.xerces.internal.dom.ElementImpl:[arr: null]

This happens even though the Document is populated; when I convert the Document to a String it comes out perfectly. But I don't want it as a String; I want it in XML format. This is very urgent. Has anybody worked on this? Regards, Vineet

-- Regards, Shalin Shekhar Mangar.
Re: Custom RequestHandlerBase XML Response Issue
So does that mean there is no way that we can write an XML or JSON object to the SolrQueryResponse and expect it to be formatted?
Re: Custom RequestHandlerBase XML Response Issue
My case is this: I have a few Solr instances, and I query them and get their XML responses. From each response I have to extract a group of specific XML nodes, and then I combine the other Solrs' responses into a single XML and build a DOM document from it. So, as you mentioned in your last mail, how can I prepare a combined response for this XML doc? And even if I do, I don't think it would work, because I am doing the same thing in the RequestHandler.

On Thu, Jul 18, 2013 at 6:30 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

Okay, let me explain. If you construct your combined response (why are you doing that again?) in the form of a Solr NamedList or SolrDocumentList then the XMLResponseWriter (which btw uses TextResponseWriter) has no problem writing it down as XML. The problem here is that you are giving it an object (a DOM Document?) which it doesn't know how to serialize so it just calls .toString on it and writes it out. As long as you stick a known type into the SolrQueryResponse, you should be fine.

On Thu, Jul 18, 2013 at 6:24 PM, Vineet Mishra clearmido...@gmail.com wrote:

So does that mean there is no way that we can write an XML or JSON object to the SolrQueryResponse and expect it to be formatted?

-- Regards, Shalin Shekhar Mangar.
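Shalin's point can be sketched as follows: rather than adding the DOM Document itself, walk it and copy the pieces into simpler values before handing them to the response. This sketch uses only JDK collections (Map, List, String) so that it runs without Solr on the classpath; whether a real handler should use these or Solr's NamedList is left open, and the class name and the tag/text/children key names are assumptions made for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomToCollections {
    /** Copies an element into a map: tag name, "@"-prefixed attributes, then text or children. */
    static Map<String, Object> toMap(Element el) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("tag", el.getTagName());
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            out.put("@" + attr.getNodeName(), attr.getNodeValue());
        }
        List<Object> children = new ArrayList<>();
        NodeList kids = el.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i) instanceof Element) {   // skip text and whitespace nodes
                children.add(toMap((Element) kids.item(i)));
            }
        }
        if (children.isEmpty()) {
            out.put("text", el.getTextContent().trim());
        } else {
            out.put("children", children);
        }
        return out;
    }

    /** Parses an XML string and returns its root element (helper for the demo). */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse(
                "<arr name=\"grouped\"><str name=\"value\">5</str><str name=\"value\">8</str></arr>");
        System.out.println(toMap(root));
    }
}
```

In handleRequestBody, the result of toMap could then be passed in place of the DOM node in the response.add(...) call from the original post, so that the writer receives maps, lists and strings rather than an object it can only call toString on.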