Architect / Lead - Custom Cloud Middleware Platforms (APACHE SOLR)
We are excited and proud to have been selected as the only recruiting company assigned to a retained search for a brand-new High Technology service line (Cloud Technologies) at a top-tier IT service and solutions company. This new service line already has many new product lines partnered with it, and these positions offer high visibility with quick advancement opportunities.

1. Job Role: Sr. Architect (C2) / Architect (C1) / Tech Lead (B3) - Custom Cloud Middleware Platforms (APACHE SOLR)

2. Location and Number of Positions
   a. Mountain View, California
   b. New York City, NY
   c. Houston, Texas

3. Job Description
We are looking for Architects who would be responsible for the following activities:
- Technology evaluation, architecture design, implementation, testing and technical reviews of highly scalable platforms, solutions and technology building blocks to address customer requirements
- Lead the overall technical solution implementation as part of the customer's project delivery team
- Mentor and groom project team members in the core technology areas, usage of SDLC tools and Agile
- Engage in external and internal forums to demonstrate thought leadership

4. Skills Required
- 7+ years of overall work experience with large enterprises / technology vendors / ISVs / service providers
- Strong hands-on experience in systems integration and custom engineering of infrastructure / middleware platforms leveraging Linux open source technologies, using Agile methodologies
- Experience supporting and troubleshooting deployments in production environments is highly preferred
- Strong work experience in more than one of the technology areas below is required.
Programming Languages: Python, Ruby, Core Java, J2EE, JDBC, Spring, Struts, scripting languages
Distributed Systems: Cloud Computing, Grid Computing, Cluster Computing, Distributed File Systems, High Speed Messaging, Distributed Caching
Server Virtualization / Cloud Stacks: KVM, Xen, ESX, OpenStack, CloudStack, RHEV, Eucalyptus, Amazon Web Services, Azure Cloud Services
Management / Automation Tools: Hyperic, Nagios, OpenNMS, Cobbler, Puppet, Chef
Middleware Tools: Splunk, Esper, Solr, Thrift, RabbitMQ, ZooKeeper, memcached
Storage Technologies: SAN, NAS, JBOD, CIFS, Replication, Storage Management
Networking: DNS, DHCP, NAT, Firewall, Routing, Switching, Load Balancers, VLAN, VPN

If you are interested, please respond with a current resume to jess...@kudukisgroup.com. I will give you a call to speak in further detail. We keep all information confidential. Feel free to reply with any questions.

Thanks,
~ Jessica
Found child node with improper name
I get this warning when I try to create a collection, and the collection is not created.

Apr 01, 2013 10:05:26 AM org.apache.solr.handler.admin.CollectionsHandler handleCreateAction
INFO: Creating Collection : collection.configName=statisticsBucket-archive&maxShardsPerNode=3&name=ST-ARCHIVE_07&replicationFactor=2&action=CREATE
Apr 01, 2013 10:05:26 AM org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process
INFO: Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
Apr 01, 2013 10:05:26 AM org.apache.solr.cloud.DistributedQueue orderedChildren
WARNING: Found child node with improper name: qnr-02
Apr 01, 2013 10:05:26 AM org.apache.solr.cloud.OverseerCollectionProcessor run
INFO: Overseer Collection Processor: Get the message id:/overseer/collection-queue-work/qn-02 message:{
  operation:createcollection,
  numShards:null,
  maxShardsPerNode:3,
  collection.configName:statisticsBucket-archive,
  createNodeSet:null,
  name:ST-ARCHIVE_07,
  replicationFactor:2}

- Best regards

--
View this message in context: http://lucene.472066.n3.nabble.com/Found-child-node-with-improper-name-tp4052855.html
Sent from the Solr - User mailing list archive at Nabble.com.
highlight on same field with different fragsize
We use Solr 4.2 in our application. We need to return highlights on the same field twice, with different fragsize values. Solr allows highlighting different fields with different fragsize values, as below, but this does not work for the same field:

http://localhost:8080/solr/select?q=my search&hl=on&hl.fl=content1,content2&f.content1.hl.fragsize=400&f.content2.hl.fragsize=100

Solr 4.0 also supports field aliases, but they do not seem to work for highlighting. Can anybody suggest how to achieve this?

--
View this message in context: http://lucene.472066.n3.nabble.com/highlight-on-same-field-with-different-fragsize-tp4052863.html
Sent from the Solr - User mailing list archive at Nabble.com.
solr4.1 No live SolrServers available to handle this request
Hi all, I am new to Solr. When I query SolrCloud 4.1 with SolrJ, the client throws the exceptions below. There are 2 shards in my SolrCloud; each shard is on a server with 4 CPUs / 3G RAM, and the JVM has 2G. As the query requests increase, the exception occurs.

[java] org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
[java]     at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:486)
[java]     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
[java]     at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
[java]     at com.netease.index.service.impl.SearcherServiceImpl.search(Unknown Source)
[java]     at com.netease.index.util.ConSearcher.run(Unknown Source)
[java]     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[java]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[java]     at java.lang.Thread.run(Thread.java:662)
[java] Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://cms.test.com/solr/doc
[java]     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:416)
[java]     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
[java]     at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:439)
[java]     ...
7 more
[java] Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
[java]     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:416)
[java]     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:299)
[java]     at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:242)
[java]     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:455)
[java]     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
[java]     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
[java]     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
[java]     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
[java]     ... 9 more

PS: LBHttpSolrServer seems to distribute load unevenly - some nodes get a much heavier load while others do not. I use nginx so that the load is more controllable. Is this right?

Please help me out. Thank you in advance. ^_^

--
View this message in context: http://lucene.472066.n3.nabble.com/solr4-1-No-live-SolrServers-available-to-handle-this-request-tp4052862.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Phonetic Search Highlight issue in search results
Good question, you're causing me to think... about code I know very little about <G>. So rather than spouting off, I tried it, and... it works fine for me, either with or without the fast vector highlighter, on, admittedly, a very simple test. So I'd try peeling off all the extra stuff you've put into your configs (sorry, I don't have time right now to try to reproduce), get the very simple case working, then build the rest back up and see where the problem begins.

Sorry for the mis-direction!

Erick

On Mon, Apr 1, 2013 at 1:07 AM, Soumyanayan Kar soumyanayan@rebaca.com wrote:

Hi Erick, thanks for the reply. But help me understand this: if Solr is able to isolate the two documents which contain the term "fact", it being the phonetic equivalent of the search term "fakt", then why would it be unable to highlight the terms based on the same logic it uses to find the documents? Also, it correctly highlights the results in other searches which are also approximate rather than exact, e.g. fuzzy or synonym searches. In those cases the highlighted terms in the results are also far from the actual search term, yet they get highlighted correctly. Maybe I am getting this completely wrong, but it looks like there is something wrong with my implementation.

Thanks & Regards,
Soumya.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 27 March 2013 06:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Phonetic Search Highlight issue in search results

How would you expect it to highlight successfully? The term is "fakt"; there's nothing built in (and, indeed, there couldn't be) to un-phoneticize it into "fact" and apply that to the Content field. The whole point of phonetic processing is to do a lossy translation from the word into some variant, losing precision all the way. So this behavior is unsurprising...
Best,
Erick

On Tue, Mar 26, 2013 at 7:28 AM, Soumyanayan Kar soumyanayan@rebaca.com wrote:

When we issue a query with phonetic search, it returns the correct documents but not the highlights. When we use stemming or synonym searches we get the proper highlights. For example, when we execute a phonetic query for the term fakt (ContentSearchPhonetic:fakt) in the Solr Admin interface, it returns two documents containing the term fact (its phonetic token equivalent), but the list of highlights is empty, as shown in the response below.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
    <lst name="params">
      <str name="q">ContentSearchPhonetic:fakt</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <long name="DocId">1</long>
      <str name="DocTitle">Doc 1</str>
      <str name="Content">Anyway, this game was excellent and was well worth the time. The graphics are truly amazing and the sound track was pretty pleasant also. The preacher was in fact a thief.</str>
      <long name="_version_">1430480998833848320</long>
    </doc>
    <doc>
      <long name="DocId">2</long>
      <str name="DocTitle">Doc 2</str>
      <str name="Content">stunning. The preacher was in fact an excellent thief who had stolen the original manuscript of Hamlet from an exhibit on the Riviera, where he also acquired his remarkable and tan.</str>
      <long name="_version_">1430480998841188352</long>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="1"/>
    <lst name="2"/>
  </lst>
</response>

Relevant section of the Solr schema:

<field name="DocId" type="long" indexed="true" stored="true" required="true"/>
<field name="DocTitle" type="string" indexed="false" stored="true" required="true"/>
<field name="Content" type="text_general" indexed="false" stored="true" required="true"/>
<field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchStemming" type="text_stem" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchPhonetic" type="text_phonetic" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchSynonym" type="text_synonym" indexed="true" stored="false" multiValued="true"/>

<uniqueKey>DocId</uniqueKey>

<copyField source="Content" dest="ContentSearch"/>
<copyField source="Content" dest="ContentSearchStemming"/>
<copyField source="Content" dest="ContentSearchPhonetic"/>
<copyField source="Content" dest="ContentSearchSynonym"/>

<fieldType name="text_stem" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory
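Erick's point upthread about phonetic filtering being a lossy translation can be illustrated with a hand-rolled, much-simplified Soundex sketch (Solr's PhoneticFilterFactory supports several real encoders; this toy version is only for illustration and is not the codec Solr uses): "fact" and "fakt" collapse to the same code, so there is nothing left in the index to map back to the original spelling for highlighting.

```python
def soundex(word):
    """Very simplified Soundex: keep the first letter, then digits for
    consonant classes, collapsing adjacent duplicates. Illustration only."""
    codes = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in letters:
            codes[ch] = digit
    word = word.lower()
    out, prev = [], codes.get(word[0], "")
    for ch in word[1:]:
        d = codes.get(ch, "")
        if d and d != prev:
            out.append(d)
        prev = d
    return (word[0].upper() + "".join(out) + "000")[:4]

# "fact" and "fakt" encode identically -- the original spelling is lost.
print(soundex("fact"), soundex("fakt"))  # -> F230 F230
```

Since both surface forms produce the same indexed token, the highlighter has no way to recover which stored-text word "caused" the match, which is exactly the behavior seen in the empty highlighting section above.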
query regarding running solr4.1.0 on tomcat6
Hi all, I installed Tomcat 6 on CentOS/Red Hat Linux and configured Solr on Tomcat under the name solrt, and it was running fine. What I did next was place another copy of the Solr home folder on the machine and point Tomcat at this new Solr home. Now everything works fine, like the full database import and queries from the browser, but when I open solr-example/admin (the default Solr admin panel) in the browser it shows this error:

http://localhost:8080/solr-example/#/

HTTP Status 404 -
type: Status report
message:
description: The requested resource () is not available.
Apache Tomcat/6.0.24

Otherwise, when I hit

http://localhost:8080/solr-example/collection1/select?q=samsung%20duos&wt=json&indent=true&rows=20

it runs fine, and even

http://localhost:8080/solr-example/dataimport?command=full-import&indent=true&clean=true

runs fine. In the Tomcat manager panel I can see solr-example, and when I click on it, it shows the same 404 error. What could be the problem with the Solr admin panel? Help anyone.

Thanks & regards,
Rohan
Re: Need Help in Patching OPENNLP
Here's the start-up page: http://wiki.apache.org/solr/HowToContribute

First, just check out the code via svn and build it (see the page above). That'll tell you whether you have all the tools available. Second, apply the patch to the source; from the root of your source tree: patch -p0 -i <patch name>. Third, execute "ant example dist", and that should build your source with the patch in place...

If you get stuck, let us know what problems you are having, the specific errors you're receiving, all that kind of stuff.

Best,
Erick

On Fri, Mar 29, 2013 at 7:42 AM, karthicrnair karthicrn...@gmail.com wrote:

Hi all, I am very new to Solr and Java technology. I wonder whether someone can give me a way to patch the OpenNLP support into Solr. I am simply blocked at the initial step, applying the patch to Solr 4.2. Any pointer would be highly appreciated.

Thanks,
Karthic

--
View this message in context: http://lucene.472066.n3.nabble.com/Need-Help-in-Patching-OPENNLP-tp4052362.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Realtime updates solrcloud
Does the 30-second interval persist for a long time after you stop your queries? It's possible that your requests are queueing up and you have a bunch of searches in the queue in front of the update.

Best,
Erick

On Fri, Mar 29, 2013 at 8:33 AM, roySolr royrutten1...@gmail.com wrote:

Hello guys, I want to use the realtime update mechanism of SolrCloud. My setup is as follows: 3 Solr engines, 3 ZooKeeper instances (ensemble). The setup works great: recovery, leader election etc. The problem is the realtime updates; they get slow after the servers receive some traffic. Let me explain. I test the realtime update with the following command:

curl http://SOLRURL:SOLRPORT/solr/update -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">3504811</field><field name="website">http://www.google.nl</field></doc></add>'

I see this in the logs of the Solr server:

Mar 29, 2013 12:38:51 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={} {add=[3504811 (1430841858290876416)]} 0 35

The other Solr servers get the following lines in the log:

INFO: [collection1] webapp=/solr path=/update params={distrib.from=http://SOLRIP:SOLRPORT/solr/collection1/&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[3504811 (1430844456234385408)]} 0 14

This looks good: the doc is added and the leader sends it to the other Solr servers. At first it takes about 1 second to make the update visible :) When I send some traffic to the server (200 q/s), the update takes around 30 seconds to become visible. After I stopped the traffic, it still takes 30 seconds. How is this possible? The solrconfig parts:

<autoCommit><maxTime>60</maxTime><openSearcher>false</openSearcher></autoCommit>
<autoSoftCommit><maxTime>2000</maxTime></autoSoftCommit>

Did I miss something?

Best regards,
Roy

--
View this message in context: http://lucene.472066.n3.nabble.com/Realtime-updates-solrcloud-tp4052370.html
Sent from the Solr - User mailing list archive at Nabble.com.
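For reference, here are the commit settings Roy posted, laid out as they would appear in solrconfig.xml with comments on what each does. Note that maxTime is in milliseconds, so the hard-commit value of 60 is unusually aggressive; 60000 may have been intended. With openSearcher=false on the hard commit, only the soft commit controls visibility, so a persistent 30-second delay points at searcher warm-up or request queueing rather than the commit settings themselves.

```
<autoCommit>
  <maxTime>60</maxTime>               <!-- hard commit (ms); flushes to disk but, with -->
  <openSearcher>false</openSearcher>  <!-- openSearcher=false, does not make docs visible -->
</autoCommit>
<autoSoftCommit>
  <maxTime>2000</maxTime>             <!-- soft commit every 2s; this is what makes updates searchable -->
</autoSoftCommit>
```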
Re: had query regarding the indexing and analysers
Hi, does this mean that while indexing, "ace" is also stored as "ac" in the Solr index?

Thanks & regards,
Rohan

On Fri, Mar 22, 2013 at 9:49 AM, Jack Krupansky j...@basetechnology.com wrote:

Actually, it's the Porter stemmer that is turning "ace" into "ac". Try making a copy of text_en_splitting and delete the PorterStemFilterFactory filter from both the query and index analyzers.

-- Jack Krupansky

-----Original Message-----
From: Rohan Thakur
Sent: Wednesday, March 20, 2013 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: had query regarding the indexing and analysers

Hi Jack, I have been using text_en_splitting initially, but what it was doing is changing my query as well. For example, if I search for the term "ace" it is treated as "ac", thus giving "ac" a higher score... see the debug statement:

debug:{ rawquerystring:ace, querystring:ace, parsedquery:(+DisjunctionMaxQuery((title:ac^30.0)))/no_coord, parsedquery_toString:+(title:ac^30.0), explain:{ :\n1.8650155 = (MATCH) weight(title:ac^30.0 in 469) [DefaultSimilarity], result of:\n 1.8650155 = fieldWeight in 469, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n 0.4375 = fieldNorm(doc=469)\n, :\n1.8650155 = (MATCH) weight(title:ac^30.0 in 470) [DefaultSimilarity], result of:\n 1.8650155 = fieldWeight in 470, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n 0.4375 = fieldNorm(doc=470)\n, :\n1.8650155 = (MATCH) weight(title:ac^30.0 in 471) [DefaultSimilarity], result of:\n 1.8650155 = fieldWeight in 471, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n 0.4375 = fieldNorm(doc=471)\n, :\n1.8650155 = (MATCH) weight(title:ac^30.0 in 472) [DefaultSimilarity], result of:\n 1.8650155 = fieldWeight in 472, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n 0.4375 =
fieldNorm(doc=472)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 331) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 331, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=331)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 332) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 332, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=332)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 335) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 335, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=335)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 336) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 336, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=336)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 337) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 337, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=337)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 393) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 393, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=393)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 425) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 425, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=425)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 426) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 426, product of:\n1.0 = tf(freq=1.0), with 
freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=426)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 429) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 429, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=429)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 430) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 430, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39,
Re: had query regarding the indexing and analysers
Yes, if there is only a single analyzer, or an index analyzer is specified, and the Porter stemmer is used in it.

-- Jack Krupansky

-----Original Message-----
From: Rohan Thakur
Sent: Monday, April 01, 2013 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: had query regarding the indexing and analysers

Hi, does this mean that while indexing, "ace" is also stored as "ac" in the Solr index?

Thanks & regards,
Rohan

On Fri, Mar 22, 2013 at 9:49 AM, Jack Krupansky j...@basetechnology.com wrote:

Actually, it's the Porter stemmer that is turning "ace" into "ac". Try making a copy of text_en_splitting and delete the PorterStemFilterFactory filter from both the query and index analyzers.

-- Jack Krupansky

-----Original Message-----
From: Rohan Thakur
Sent: Wednesday, March 20, 2013 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: had query regarding the indexing and analysers

Hi Jack, I have been using text_en_splitting initially, but what it was doing is changing my query as well. For example, if I search for the term "ace" it is treated as "ac", thus giving "ac" a higher score...
see the debug statement: <snip - same scoring output as quoted in the earlier message in this thread>
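Jack's suggestion upthread - copying text_en_splitting and deleting the PorterStemFilterFactory - would look roughly like this in schema.xml. The field-type name here is hypothetical, and the remaining filters depend on what your text_en_splitting actually contains; this is only a sketch of the shape:

```
<fieldType name="text_en_nostem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- PorterStemFilterFactory removed, so "ace" stays "ace" at both index and query time -->
  </analyzer>
</fieldType>
```

Because the same analyzer applies at index and query time here, removing the stemmer from both sides keeps indexing and searching consistent.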
Re: secure deployment of solr.war on jboss
Hi Ali, we have Solr 4.2 on JBoss running on a separate VM behind a firewall. Only IT Administration and our front-end application server are able to access the Solr servers in production.

--
View this message in context: http://lucene.472066.n3.nabble.com/secure-deployment-of-solr-war-on-jboss-tp4052754p4052899.html
Sent from the Solr - User mailing list archive at Nabble.com.
How does solr 4.2 do in returning large datasets?
I thought I remembered reading that Solr is not good at returning large datasets. We are currently using Lucene 3.6.0 and returning datasets of 10,000 to 60,000 results. In the future we might need to return even larger datasets. Would you all recommend going to Solr for this, or should we stick with Lucene (which has given us no problems in this regard)? I am a bit wary of using a web service to return datasets of this size.

Thanks a lot,
Liz
lizswo...@gmail.com
Re: secure deployment of solr.war on jboss
Thanks. Are you using an iptables firewall on the JBoss host to prevent access from other systems, or are you using some JBoss configuration for that?

Thanks,
Saqib

On Mon, Apr 1, 2013 at 6:25 AM, adityab aditya_ba...@yahoo.com wrote:

Hi Ali, we have Solr 4.2 on JBoss running on a separate VM behind a firewall. Only IT Administration and our front-end application server are able to access the Solr servers in production.

--
View this message in context: http://lucene.472066.n3.nabble.com/secure-deployment-of-solr-war-on-jboss-tp4052754p4052899.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: How does solr 4.2 do in returning large datasets?
It really depends on what you are returning (how big is each document? Just a document ID? Pages and pages of data in fields?). It can take a long time for Solr to render XML with 60,000 results. Solr will be serializing the data and then you'd (presumably) be deserializing it. Depending on how big each field actually is, this could take a while or even amount to a DoS on your own server. Your client would also need a fair bit of memory to parse a document with 60,000 results.

-----Original Message-----
From: Liz Sommers [mailto:lizswo...@gmail.com]
Sent: Monday, April 01, 2013 9:39 AM
To: solr-user
Subject: How does solr 4.2 do in returning large datasets ?

I thought I remembered reading that Solr is not good at returning large datasets. We are currently using Lucene 3.6.0 and returning datasets of 10,000 to 60,000 results. In the future we might need to return even larger datasets. Would you all recommend going to Solr for this, or should we stick with Lucene (which has given us no problems in this regard)? I am a bit wary of using a web service to return datasets of this size.

Thanks a lot,
Liz
lizswo...@gmail.com
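One common way around a single 60,000-row response is to page through the result set with Solr's start/rows parameters, issuing several smaller requests. A minimal sketch of just the offset arithmetic (plain Python, no Solr client assumed; note that very deep offsets get progressively more expensive server-side, so page sizes are a tuning question):

```python
def pages(total_hits, page_size):
    """Yield (start, rows) parameter pairs for paging through a result set."""
    for start in range(0, total_hits, page_size):
        yield start, min(page_size, total_hits - start)

# e.g. 25 hits in pages of 10 -> three requests
print(list(pages(25, 10)))  # -> [(0, 10), (10, 10), (20, 5)]
```

Each (start, rows) pair would map straight onto the select URL, e.g. ...&start=10&rows=10, keeping every individual response small enough to serialize and parse cheaply.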
Re: solr4.1 No live SolrServers available to handle this request
Check the Solr logs for ZooKeeper disconnects. It could be that as load increases, Solr is not able to respond to the ZooKeeper pings, which would take the nodes offline. If you see ZooKeeper disconnects, you can increase the zkClientTimeout set in solr.xml. But be aware that zk disconnects can also be a sign that your servers are overloaded and/or under-resourced. Memory starvation and stop-the-world GC can often be the cause of zk disconnects.

On Mon, Apr 1, 2013 at 6:18 AM, sling sling...@gmail.com wrote:

Hi all, I am new to Solr. When I query SolrCloud 4.1 with SolrJ, the client throws the exceptions below. There are 2 shards in my SolrCloud; each shard is on a server with 4 CPUs / 3G RAM, and the JVM has 2G. As the query requests increase, the exception occurs.

[java] org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
<snip - full stack trace as in the original message above>

PS: LBHttpSolrServer seems to distribute load unevenly - some nodes get a much heavier load while others do not. I use nginx so that the load is more controllable. Is this right?

Please help me out. Thank you in advance. ^_^

--
View this message in context: http://lucene.472066.n3.nabble.com/solr4-1-No-live-SolrServers-available-to-handle-this-request-tp4052862.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Joel Bernstein
Professional Services
LucidWorks
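The zkClientTimeout that Joel mentions lives in solr.xml; in the Solr 4.x legacy solr.xml format it is an attribute on the <cores> element. A sketch only - the surrounding attributes and the 30-second value are assumptions to adapt to your own config:

```
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:}" hostPort="${jetty.port:}"
         zkClientTimeout="30000">  <!-- ms before a ZooKeeper session is considered expired -->
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>
```

Raising the timeout only buys headroom for GC pauses; if nodes are genuinely overloaded, the underlying memory or capacity problem still needs fixing.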
Re: highlight on same field with different fragsize
Why do you want to do this? Can't you just take the longer fragsize and cut it down for display in the app?

Best,
Erick

On Mon, Apr 1, 2013 at 6:33 AM, meghana meghana.rav...@amultek.com wrote:

We use Solr 4.2 in our application. We need to return highlights on the same field twice, with different fragsize values. Solr allows highlighting different fields with different fragsize values, as below, but this does not work for the same field:

http://localhost:8080/solr/select?q=my search&hl=on&hl.fl=content1,content2&f.content1.hl.fragsize=400&f.content2.hl.fragsize=100

Solr 4.0 also supports field aliases, but they do not seem to work for highlighting. Can anybody suggest how to achieve this?

--
View this message in context: http://lucene.472066.n3.nabble.com/highlight-on-same-field-with-different-fragsize-tp4052863.html
Sent from the Solr - User mailing list archive at Nabble.com.
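Erick's suggestion - request one generous fragsize and shorten client-side - can be as little as cutting the long snippet at a word boundary while keeping the <em> highlight markers balanced. A rough sketch (not a Solr API, just string handling; the example fragment is hypothetical):

```python
def shorten_snippet(snippet, max_len):
    """Cut a highlight snippet to ~max_len chars at a word boundary,
    closing an <em> tag if the cut leaves one open."""
    if len(snippet) <= max_len:
        return snippet
    cut = snippet.rfind(" ", 0, max_len)
    if cut == -1:          # no space found, hard cut
        cut = max_len
    short = snippet[:cut]
    if short.count("<em>") > short.count("</em>"):
        short += "</em>"   # keep the markup balanced for rendering
    return short

long_frag = "The <em>preacher</em> was in fact an excellent thief who stole the manuscript"
print(shorten_snippet(long_frag, 40))
```

So the app would ask Solr for hl.fragsize=400 once, then derive the 100-character variant itself rather than asking Solr to highlight the same field twice.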
Re: DocValues vs stored fields?
Otis,

DocValues are quite insufficient for true field updates. DocValues is per-document value storage (hence the name); it's not uninverted/indexed. If you need to search based on these values (e.g. find all docs that have this value, or fall between these values), then that's not going to work. The most promising field-update work going on right now is https://issues.apache.org/jira/browse/LUCENE-4258 "Incremental Field Updates through Stacked Segments". In my opinion, that's the most exciting thing happening in Lucene right now, but it appears a little stalled.

I do think a DocValues-based hack could make a better replacement for Solr's ExternalFileField, which is used in FunctionQueries.

Another questioner asked essentially why a field that has DocValues won't have its value shown when the field is marked stored="false", since the value is stored per-document after all. True, the disparity here is a bit confusing. DocValues are not intended as a replacement for stored fields in places where you are using stored fields now. They exist to improve the performance and memory use of function queries, sorting, and faceting. It's the new FieldCache under a different name, though it hasn't strictly replaced the FieldCache (yet). It's not enabled by default because it creates new data on disk and Solr doesn't know that you want to use it. As of Solr 4.2, DocValues is also multi-valued -- awesome!

All this said, I do think there's room for a proposed Solr DocTransformer to expose the DocValues value as if it were a stored field in your search results. Actually... I wish that if you explicitly ask for a field and it's not stored, Solr would just go use docValues automatically. That'd be cool!

~ David

Otis Gospodnetic-5 wrote:
Hi,

The current field update mechanism is not really a field update mechanism. It just looks like that from the outside. DocValues should make true field updates implementable.
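To make the "not enabled by default, you opt in per field" point concrete, a minimal schema.xml sketch (the field name `popularity` is invented; per the rest of the thread, enabling this requires a re-index, and in Solr 4.2 the field generally needs to be required or carry a default value):

```xml
<!-- schema.xml sketch: opt one field into DocValues for
     sorting/faceting/function queries; default satisfies the
     4.2 requirement that the field always has a value. -->
<field name="popularity" type="int" indexed="true" stored="false"
       docValues="true" default="0"/>
```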
Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Fri, Mar 29, 2013 at 3:30 PM, Marcin Rzewucki <mrzewucki@> wrote:
Hi,

Atomic updates (single-field updates) do not depend on DocValues. They were implemented in Solr 4.0 and work fine (but all fields have to be retrievable). DocValues are supposed to be more efficient than FieldCache. Why not enabled by default? Maybe because they are not for all fields, and because of their limitations (a field has to be single-valued, and required or have a default value).

Regards.

On 29 March 2013 17:20, Timothy Potter <thelabdude@> wrote:
Hi Jack,

I've just started to dig into this as well, so I'm sharing what I know, but there are still some holes in my knowledge too. DocValues == Column Stride Fields (the best resource I know of so far is Simon's presentation from Lucene Revolution 2011 - http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues). It's pretty dense, but some nuggets I've gleaned from it are:

1) DocValues are more efficient in terms of memory usage and I/O performance for building an alternative to FieldCache (slide 27 is very impressive)
2) DocValues has a more efficient way to store primitive types, such as packed ints
3) Faster random access to stored values

In terms of switch-over, you have to re-index to change your fields to use DocValues on disk, which is why they are not enabled by default. Lastly, another goal of DocValues is to allow updates to a single field w/o re-indexing the entire doc. That's not implemented yet, but I think it is still planned.

Cheers,
Tim

On Fri, Mar 29, 2013 at 9:31 AM, Jack Krupansky <jack@> wrote:
I'm still a little fuzzy on DocValues (maybe because I'm still grappling with how it does or doesn't still relate to "Column Stride Fields"), so can anybody clue me in as to how useful DocValues is/are? Are DocValues simply an alternative to "stored fields"?
If so, and if DocValues are so great, why aren't we just switching Solr over to DocValues under the hood for all fields? And if there are "issues" with DocValues that would make such a complete switchover less than absolutely desirable, what are those issues? In short, when should a user use DocValues over stored fields, and vice versa?

As things stand, all we've done is make Solr more confusing than it was before, without improving its OOBE. OOBE should be job one in Solr.

Thanks.

P.S. And if I actually want to do Column Stride Fields, is there a way to do that?

-- Jack Krupansky - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

--
View this message in context: http://lucene.472066.n3.nabble.com/DocValues-vs-stored-fields-tp4052406p4052966.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
Hi Shawn,

I tried optimizing using this command:

curl 'http://10.7.233.54:8088/solr/update?optimize=true&maxSegments=10&waitFlush=true'

And I got this response within seconds:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">840</int></lst>
</response>

Is this a valid response that one should get? I checked the statistics link from the /solr/admin page, and it shows the number of segments got updated. Would this be a good indication that optimization is complete? At the same time, I noticed the number of files in the data/index directory hasn't reduced, and all files are not updated. Since it took just a couple of seconds for the response (even with waitFlush=true), I doubt whether optimization really happened, but the statistics page shows me the correct number of segments.

On Tue, Mar 12, 2013 at 8:34 PM, Shawn Heisey-4 [via Lucene] ml-node+s472066n4046834...@n3.nabble.com wrote:
On 3/12/2013 4:17 PM, feroz_kh wrote:
Do we really need to optimize in order to reformat?

The alternative would be to start with an empty index and just reindex your data. That is actually the best way to go, if that option is available to you.

If yes, what is the best way of optimizing the index - online or offline? Can we do it online? If yes:
1. What is the HTTP request we can use to invoke optimization - how long does it take?
2. What is the command-line command to invoke optimization - how long does that one take?

The only way I know of to optimize an index that's offline is using Luke, but it is difficult to find versions of Luke that work with indexes after 4.0-ALPHA - the official Luke page doesn't have any newer versions, and I have no idea why. Online is better. Solr 4.2 just got released; you may want to consider skipping 4.1 and going with 4.2. There would be no major speed difference between doing it offline or online. Whatever else the machine is doing might be a factor. I can only make guesses about how long it will take.
You say your index in 3.5 is 14GB. I have experience with indexes that are 22GB in 3.5, which take 11 minutes to optimize. The equivalent index in 4.2 is 14GB and takes 14 minutes, because of the extra compression/decompression step. This is on RAID10; volumes with no RAID or with other RAID levels would be slower. Also, if the structure of your index is significantly different from mine, yours might go faster or slower than the size alone would suggest.

There is a curl command that optimizes the index in the wiki:
http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_and_commitWithin_parameters_as_part_of_the_URL

You would want to leave off the maxSegments option so it optimizes down to one segment. Whether to include waitFlush is up to you, but if you don't include it, you won't know exactly when it finishes unless you are looking at the index directory.

Thanks,
Shawn

--
View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-Solr3-5-to-Solr4-1-Index-Reformat-tp4046391p4052969.html
Sent from the Solr - User mailing list archive at Nabble.com.
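Putting Shawn's advice together, the call would look roughly like this (host and port are placeholders; this is a sketch of the wiki command, not something verified against your deployment):

```
# Optimize down to a single segment by omitting maxSegments entirely.
# waitFlush=true keeps the request open until the changes are flushed,
# so a response after only a couple of seconds would be suspicious.
curl 'http://localhost:8983/solr/update?optimize=true&waitFlush=true'
```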
Re: Making tika process mail attachments eludes me
: I believe that the handling of the multipart MIME lacks some error checking, and
: it is probably related to the content outside the MIME boundaries (in my
: example, the text "This is a multi-part message in MIME format."):
:
: I really hope that some SOLR developer can have a look, we cannot be the only
: ones having this problem. And I've spent almost twenty hours debugging this.

I am largely unfamiliar with the MailEntityProcessor, but IIRC it has not received much love over the years due to the lack of automated tests -- I believe all of the existing tests are disabled by default because they require an external IMAP server.

If anyone is interested in helping to contribute some tests that could be automated by using some sort of mock IMAP server library, that would go a long way towards being able to verify correctness and make improvements (even for people like me who are not familiar with the code and haven't thought very hard about MIME encapsulation in over 15 years).

-Hoss
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
Also, is it absolutely necessary to set maxSegments=1 if we need to reformat the whole index?

--
View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-Solr3-5-to-Solr4-1-Index-Reformat-tp4046391p4052991.html
Sent from the Solr - User mailing list archive at Nabble.com.
Filtering Search Cloud
I want to separate my cloud into two logical parts: one is the indexer part of the SolrCloud, and the other is the searcher part.

My first question is this: does separating my cloud this way make sense as a performance improvement? I think that while indexing, searches take longer to respond, so if I separate them I should get a performance improvement. On the other hand, maybe using all Solr machines as a whole (I mean not partitioning as I mentioned) lets SolrCloud do better load balancing; I would like to find out.

My second question: let's assume that I have separated my machines as I mentioned. Can I filter some indexes to be searchable or not from the searcher SolrCloud?
Top 10 Terms in Index (by date)
Our company has an application that is Facebook-like for usage by enterprise customers. We'd like to do a report of the top 10 terms entered by users over (some time period). With that in mind I'm using the DataImportHandler to put all the relevant data from our database into a Solr 'content' field:

<field name="content" type="text_general" indexed="true" stored="false" multiValued="false" required="true" termVectors="true"/>

Along with the content is the 'dateCreated' for that content:

<field name="dateCreated" type="tdate" indexed="true" stored="false" multiValued="false" required="true"/>

I'm struggling with the TermVectorComponent documentation to understand how I can put together a query that answers the 'report' mentioned above. For each document I need each term counted however many times it is entered (content of "I think what I think" would report 'think' as used twice). Does anyone have any insight as to whether I'm headed in the right direction, and then what my query would be?

Thanks,
Andy Pickler
Re: How does solr 4.2 do in returning large datasets ?
Don't forget you could also tell Solr to return CSV in the response, which will be much lighter in terms of response size, though the disk I/O would still be there.

Is returning large result sets recommended? No. :) But the same can be said for Lucene.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Mon, Apr 1, 2013 at 10:26 AM, Swati Swoboda sswob...@igloosoftware.com wrote:
It really depends on what you are returning (how big is each document? Just a document ID? Pages and pages of data in fields?). It can take a long time for Solr to render an XML response with 60,000 results. Solr will be serializing the data and then you'd (presumably) be de-serializing it. Depending on how big each field actually is, this could take a while or even cause a DoS on your server. Your client would also need a fair bit of memory to parse a document with 60,000 results.

-----Original Message----- From: Liz Sommers [mailto:lizswo...@gmail.com]
Sent: Monday, April 01, 2013 9:39 AM
To: solr-user
Subject: How does solr 4.2 do in returning large datasets ?

I thought I remembered reading that Solr is not good for returning large datasets. We are currently using Lucene 3.6.0 and returning datasets of 10,000 to 60,000 results. In the future we might need to return even larger datasets. Would you all recommend going to Solr for this, or should we stick with Lucene (which has given us no problems in this regard)? I am a bit wary of using a web service to return datasets of this size.

Thanks a lot
Liz
lizswo...@gmail.com
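For reference, the CSV response Otis mentions is selected with the `wt` response-writer parameter. A sketch of such a request (host, port, and field list are placeholders):

```
http://localhost:8983/solr/select?q=*:*&rows=60000&fl=id,title&wt=csv
```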
Use of SolrJettyTestBase
I'm attempting to use SolrJettyTestBase to test a simple app that pushes content to Solr. I've subclassed SolrJettyTestBase and added a test method (annotated with @Test). However, my test method is never called. I see the following:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/upayavira/.m2/repository/org/slf4j/slf4j-log4j12/1.7.2/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/upayavira/.m2/repository/org/slf4j/slf4j-jdk14/1.6.4/slf4j-jdk14-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/upayavira/apache/lucene/solr/example/solr-webapp/webapp/WEB-INF/lib/slf4j-jdk14-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/upayavira/apache/lucene/solr/solrj/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Test class requires enabled assertions, enable globally (-ea) or for Solr/Lucene subpackages only: com.odoko.ArgPostActionTest
NOTE: test params are: codec=null, sim=null, locale=null, timezone=(null)
NOTE: Mac OS X 10.8.2 x86_64/Oracle Corporation 1.7.0_17 (64-bit)/cpus=8,threads=1,free=85043008,total=96665600
NOTE: All tests run in this JVM: [ArgPostActionTest]
NOTE: reproduce with: ant test -Dtestcase=ArgPostActionTest -Dtests.seed=73C46F7785759CEF -Dtests.file.encoding=MacRoman

I'm sure I'm missing something obvious. Any ideas/suggestions as to what I need to do to make my test actually run?

Upayavira
Re: Top 10 Terms in Index (by date)
So you have one document per user comment? Why not use faceting plus filtering on the dateCreated field? That would count the number of documents for each term (so, in your case, if a term is used twice in one comment it would only count once). Is that what you are looking for?

Tomás

On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler andy.pick...@gmail.com wrote:
Our company has an application that is Facebook-like for usage by enterprise customers. We'd like to do a report of the top 10 terms entered by users over (some time period). With that in mind I'm using the DataImportHandler to put all the relevant data from our database into a Solr 'content' field:

<field name="content" type="text_general" indexed="true" stored="false" multiValued="false" required="true" termVectors="true"/>

Along with the content is the 'dateCreated' for that content:

<field name="dateCreated" type="tdate" indexed="true" stored="false" multiValued="false" required="true"/>

I'm struggling with the TermVectorComponent documentation to understand how I can put together a query that answers the 'report' mentioned above. For each document I need each term counted however many times it is entered (content of "I think what I think" would report 'think' as used twice). Does anyone have any insight as to whether I'm headed in the right direction, and then what my query would be?

Thanks,
Andy Pickler
Getting started with solr 4.2 and cassandra
Hello,

I am evaluating Solr 4.2 and ElasticSearch (I am new to both) for a search API where the data sits in Cassandra. Getting started with ElasticSearch is pretty straightforward, and I was able to write an ES river (http://www.elasticsearch.org/guide/reference/river/) which pulls data from Cassandra and indexes it in ES within a day.

Now I am trying to implement something similar with Solr and compare the two. Getting started with solr/example (http://lucene.apache.org/solr/4_2_0/tutorial.html) was pretty easy, and the example Solr instance works. But the example folder contains a whole bunch of stuff which I am not sure I need: http://pastebin.com/Gv660mRT . I am sure I don't need 53 directories and 527 files.

So my questions are:
1. How can I create a bare-bones Solr app up and running with a minimum set of configuration? (I will build on it when needed, taking reference from /example.)
2. What is the best practice for running Solr in production? Is an approach like jetty+nginx recommended: http://sacharya.com/nginx-proxy-to-jetty-for-java-apps/ ?

Once I am done setting up a simple Solr instance:
3. What is the general practice to import data into Solr? For now, I am writing a Python script which will read data in bulk from Cassandra and throw it at Solr.

--
Thanks,
-Utkarsh
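Bulk import (question 3 above) is usually just a matter of batching documents and POSTing each batch to Solr's update handler. A minimal Python sketch, where the Solr URL and the document shape are assumptions, not anything the thread specifies; only the batching helper is exercised below:

```python
import json
import urllib.request

def batches(docs, size):
    """Yield successive lists of at most `size` docs."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def post_batch(solr_url, docs):
    """POST one batch to Solr's update handler as JSON (URL is a placeholder)."""
    req = urllib.request.Request(
        solr_url + "/update?commit=false",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

if __name__ == "__main__":
    # Rows as they might come out of Cassandra; ids/fields are invented.
    rows = [{"id": str(i), "content": "doc %d" % i} for i in range(2500)]
    for chunk in batches(rows, 1000):
        # post_batch("http://localhost:8983/solr/collection1", chunk)  # needs a running Solr
        print(len(chunk))
```

Batching keeps each request bounded, and deferring the commit until the end (or letting autoCommit handle it) avoids hammering the index with per-batch commits.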
Re: Getting started with solr 4.2 and cassandra
You might want to check out DataStax Enterprise, which actually integrates Cassandra and Solr. You keep the data in Cassandra, but as data is added, updated, and deleted, the Solr index is automatically updated in parallel. You can add and update data and query using either the Cassandra API or the Solr API.

See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky

-----Original Message----- From: Utkarsh Sengar
Sent: Monday, April 01, 2013 6:34 PM
To: solr-user@lucene.apache.org
Subject: Getting started with solr 4.2 and cassandra

Hello,

I am evaluating Solr 4.2 and ElasticSearch (I am new to both) for a search API where the data sits in Cassandra. Getting started with ElasticSearch is pretty straightforward, and I was able to write an ES river (http://www.elasticsearch.org/guide/reference/river/) which pulls data from Cassandra and indexes it in ES within a day.

Now I am trying to implement something similar with Solr and compare the two. Getting started with solr/example (http://lucene.apache.org/solr/4_2_0/tutorial.html) was pretty easy, and the example Solr instance works. But the example folder contains a whole bunch of stuff which I am not sure I need: http://pastebin.com/Gv660mRT . I am sure I don't need 53 directories and 527 files.

So my questions are:
1. How can I create a bare-bones Solr app up and running with a minimum set of configuration? (I will build on it when needed, taking reference from /example.)
2. What is the best practice for running Solr in production? Is an approach like jetty+nginx recommended: http://sacharya.com/nginx-proxy-to-jetty-for-java-apps/ ?

Once I am done setting up a simple Solr instance:
3. What is the general practice to import data into Solr? For now, I am writing a Python script which will read data in bulk from Cassandra and throw it at Solr.

--
Thanks,
-Utkarsh
Re: Use of SolrJettyTestBase
: I've subclassed SolrJettyTestBase, and added a test method (annotated
: with @Test). However, my test method is never called. I see the

You got an immediate failure from the test setup, because you don't have assertions enabled in your JVM (the Lucene and Solr test frameworks both require assertions enabled to run tests, because so many important things can't be sanity-checked without them)...

: Test class requires enabled assertions, enable globally (-ea) or for
: Solr/Lucene subpackages only: com.odoko.ArgPostActionTest

FYI: in addition to that text being written to System.err, it would have immediately been thrown as an Exception as well. (See TestRuleAssertionsRequired.java.)

-Hoss
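For what it's worth, the `.m2` paths in the log above suggest a Maven build; if the tests run under the Surefire plugin (an assumption -- the build tool isn't stated), assertions can be enabled for the forked test JVM like this (plugin version and the rest of the pom omitted):

```xml
<!-- pom.xml sketch: pass -ea to the forked test JVM so the
     Lucene/Solr test framework's assertion check passes -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>-ea</argLine>
  </configuration>
</plugin>
```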
Re: Getting started with solr 4.2 and cassandra
Thanks for the reply. So DSE is one of the options, and I am looking into that too.

Although, before diving into solr+cassandra integration (which comes out of the box with DSE), I am just trying to set up a Solr instance on my local machine without the bloat the example Solr instance has to offer. Any suggestions about that?

Thanks,
-Utkarsh

On Mon, Apr 1, 2013 at 4:00 PM, Jack Krupansky j...@basetechnology.com wrote:
You might want to check out DataStax Enterprise, which actually integrates Cassandra and Solr. You keep the data in Cassandra, but as data is added, updated, and deleted, the Solr index is automatically updated in parallel. You can add and update data and query using either the Cassandra API or the Solr API.

See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky

-----Original Message----- From: Utkarsh Sengar
Sent: Monday, April 01, 2013 6:34 PM
To: solr-user@lucene.apache.org
Subject: Getting started with solr 4.2 and cassandra

Hello,

I am evaluating Solr 4.2 and ElasticSearch (I am new to both) for a search API where the data sits in Cassandra. Getting started with ElasticSearch is pretty straightforward, and I was able to write an ES river (http://www.elasticsearch.org/guide/reference/river/) which pulls data from Cassandra and indexes it in ES within a day.

Now I am trying to implement something similar with Solr and compare the two. Getting started with solr/example (http://lucene.apache.org/solr/4_2_0/tutorial.html) was pretty easy, and the example Solr instance works. But the example folder contains a whole bunch of stuff which I am not sure I need: http://pastebin.com/Gv660mRT . I am sure I don't need 53 directories and 527 files.

So my questions are:
1. How can I create a bare-bones Solr app up and running with a minimum set of configuration? (I will build on it when needed, taking reference from /example.)
2. What is the best practice for running Solr in production? Is an approach like jetty+nginx recommended: http://sacharya.com/nginx-proxy-to-jetty-for-java-apps/ ?

Once I am done setting up a simple Solr instance:
3. What is the general practice to import data into Solr? For now, I am writing a Python script which will read data in bulk from Cassandra and throw it at Solr.

--
Thanks,
-Utkarsh

--
Thanks,
-Utkarsh
Solr Multiword Search
We have a catalog of media content which is ingested into Solr. We are trying to spellcheck the title of a catalog item, to make sure that the client is able to correctly predict and correct the (mis)typed text. The requirement is that the corrected text match a title in the catalog. I have been playing around with the spellcheck component and handler on Solr 4.2.

solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">mySpell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

<queryConverter name="queryConverter" class="com.foo.MultiWordSpellingQueryConverter"/>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">mySpell</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">10</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

schema.xml:

<types>
  <fieldType name="text_spell" class="solr.TextField" sortMissingLast="true" omitNorms="true" omitTermFreqAndPositions="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <field name="mySpell" type="text_spell" indexed="true" stored="true" multiValued="true"/>
</fields>

<copyField source="title" dest="mySpell"/>

Notice that I am using a custom QueryConverter, defined as follows:

/* MultiWordSpellingQueryConverter.java */
package com.foo;

import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Token;
import org.apache.solr.spelling.QueryConverter;

public class MultiWordSpellingQueryConverter extends QueryConverter {
    private static Logger log = Logger.getLogger(MultiWordSpellingQueryConverter.class);

    static {
        System.out.println("* Loading class MultiWordSpellingQueryConverter");
        log.fatal("* Loading class MultiWordSpellingQueryConverter");
    }

    /**
     * Converts the original query string to a collection of Lucene Tokens.
     *
     * @param original the original query string
     * @return a Collection of Lucene Tokens
     */
    public Collection<Token> convert(String original) {
        if (original == null) {
            return Collections.emptyList();
        }
        System.out.println("Original String : " + original);
        log.error("Original String : " + original);
        final Token token = new Token(original.toCharArray(), 0, original.length(), 0, original.length());
        return Arrays.asList(token);
    }
}

I have followed directions as per another thread: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tt3265257.html#a3281189 , because I feel this is what I really want. I have tried both placing the jar in the ${solr.home}/lib directory, and un-jarring solr.war, adding the jar built from the above Java code into the WEB-INF/lib directory, re-jarring it, and placing it in the web-server deploy directory. I cannot tell if this class is even being invoked at spellcheck time. I have the queryConverter tag defined in solrconfig.xml (refer to the definitions above).

Query:

http://localhost/solr/spell?q=((title:(charles%20and%20the%20chocolate%20factory)))&spellcheck.q=charles%20and%20the%20chocolat%20factory&spellcheck=true&spellcheck.collate=true

Of course I have spelt "charles" incorrectly. There in fact exists in the catalog a title with the name "Charlie and the chocolate factory", and the above query does not find it, nor collate well enough to correct the spelling. I believe the error distance (or edits) is about 2. "Charles" should be spelt
Re: Getting started with solr 4.2 and cassandra
The Solr example really is rather simple. Download, unzip, run, add data, query. It's really that simple.

Make sure you are looking at the Solr tutorial:
http://lucene.apache.org/solr/4_2_0/tutorial.html

Download from here:
http://lucene.apache.org/solr/tutorial.html

-- Jack Krupansky

-----Original Message----- From: Utkarsh Sengar
Sent: Monday, April 01, 2013 7:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Getting started with solr 4.2 and cassandra

Thanks for the reply. So DSE is one of the options, and I am looking into that too.

Although, before diving into solr+cassandra integration (which comes out of the box with DSE), I am just trying to set up a Solr instance on my local machine without the bloat the example Solr instance has to offer. Any suggestions about that?

Thanks,
-Utkarsh

On Mon, Apr 1, 2013 at 4:00 PM, Jack Krupansky j...@basetechnology.com wrote:
You might want to check out DataStax Enterprise, which actually integrates Cassandra and Solr. You keep the data in Cassandra, but as data is added, updated, and deleted, the Solr index is automatically updated in parallel. You can add and update data and query using either the Cassandra API or the Solr API.

See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky

--
Thanks,
-Utkarsh
Re: Suggestions for Customizing Solr Admin Page
: I want to customize Solr Admin Page. I think that I will need more
: complicated things to manage my cloud. I will separate my Solr cluster into
: just indexing ones and just response ones. I will index my documents by
: categorical and I will index them at different collections.

A key design choice about the 4.x Solr Admin UI is that it is entirely powered by javascript accessing machine-parsable HTTP APIs under the covers -- so anything the Admin UI can do, you can also do in a custom UI by talking to Solr via HTTP and parsing the xml/json response.

If you have ideas for generic functionality that you think could benefit any SolrCloud user, I would suggest you implement that functionality as a patch against the existing UI, and submit it for inclusion in Solr...

https://wiki.apache.org/solr/HowToContribute

...if the functionality you have in mind is very specific to your use cases, you *might* find the admin-extra include capability suitable enough for adding links/buttons/info into the existing admin pages, using javascript to trigger (local) HTTP API calls, but if not, then implementing a separate application (in whatever language you choose) to talk to Solr via HTTP would be the best bet.

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/admin-extra.html
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/admin-extra.menu-top.html
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/admin-extra.menu-bottom.html

-Hoss
Re: Top 10 Terms in Index (by date)
I need the total number of occurrences across all documents for each term. Imagine this...

Post #1: I think, therefore I am like you
Reply #1: You think too much
Reply #2: I think that I think much as you

Each of those documents is put into 'content'. Pretending I don't have stop words, the top-term query (not considering dateCreated in this example) would result in something like...

think: 4
I: 4
you: 3
much: 2
...

Thus, a number-of-documents approach doesn't work, because if a word occurs more than once in a document it needs to be counted that many times. That seemed to rule out faceting like you mentioned, as well as the TermsComponent (which, as I understand it, also only counts documents). Thanks, Andy Pickler

On Mon, Apr 1, 2013 at 4:31 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: So you have one document per user comment? Why not use faceting plus filtering on the dateCreated field? That would count the number of documents for each term (so, in your case, if a term is used twice in one comment it would only count once). Is that what you are looking for? Tomás

On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler andy.pick...@gmail.com wrote: Our company has an application that is Facebook-like for usage by enterprise customers. We'd like to do a report of the top 10 terms entered by users over (some time period). With that in mind I'm using the DataImportHandler to put all the relevant data from our database into a Solr 'content' field:

<field name="content" type="text_general" indexed="true" stored="false" multiValued="false" required="true" termVectors="true"/>

Along with the content is the 'dateCreated' for that content:

<field name="dateCreated" type="tdate" indexed="true" stored="false" multiValued="false" required="true"/>

I'm struggling with the TermVectorComponent documentation to understand how I can put together a query that answers the 'report' mentioned above.
For each document I need each term counted however many times it is entered (content of "I think what I think" would report 'think' as used twice). Does anyone have any insight as to whether I'm headed in the right direction, and then what my query would be? Thanks, Andy Pickler
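For what it's worth, the aggregation Andy describes can be done client-side: pull per-document term frequencies (e.g. from the TermVectorComponent with `tv=true&tv.tf=true`) and sum them across documents. A self-contained sketch using the thread's three example documents, with naive whitespace tokenization standing in for Solr's analyzer:

```python
from collections import Counter

# The three example documents from the thread.
docs = [
    "I think, therefore I am like you",
    "You think too much",
    "I think that I think much as you",
]

totals = Counter()
for doc in docs:
    # Naive tokenization: strip commas, lowercase everything except "I",
    # no stop words -- just enough to mirror the counts in the example.
    for token in doc.replace(",", "").split():
        totals[token if token == "I" else token.lower()] += 1

for term, count in totals.most_common(4):
    print(term, count)
```

This reproduces think: 4, I: 4, you: 3, much: 2 from the example. In a real setup the inner loop would iterate over the tf values returned by Solr rather than re-tokenizing text, and the date filter would be applied in the Solr query before fetching term vectors.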
RE: [ANNOUNCE] Solr wiki editing change
I would also like to contribute to SolrCloud's wiki where possible. Please add me (TimVaillancourt) when you have a chance. Cheers, Tim

-Original Message- From: Trey Grainger [mailto:solrt...@gmail.com] Sent: Saturday, March 30, 2013 9:43 PM To: d...@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: [ANNOUNCE] Solr wiki editing change

Please add TreyGrainger to the contributors group. Thanks! -Trey

On Sun, Mar 24, 2013 at 11:18 PM, Steve Rowe sar...@gmail.com wrote: The wiki at http://wiki.apache.org/solr/ has come under attack by spammers more frequently of late, so the PMC has decided to lock it down in an attempt to reduce the work involved in tracking and removing spam. From now on, only people who appear on http://wiki.apache.org/solr/ContributorsGroup will be able to create/modify/delete wiki pages. Please request either on solr-user@lucene.apache.org or on d...@lucene.apache.org to have your wiki username added to the ContributorsGroup page - this is a one-time step. Steve

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [ANNOUNCE] Solr wiki editing change
On Apr 1, 2013, at 9:40 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote: I would also like to contribute to SolrCloud's wiki where possible. Please add myself (TimVaillancourt) when you have a chance. Added to solr wiki ContributorsGroup.
Re: Getting started with solr 4.2 and cassandra
Hi, Solr doesn't have anything like ES River. DIH (DataImportHandler) feels like the closest thing in Solr, though it's not quite the same thing. DIH pulls in data like a typical River does, but most people have external indexers that push data into Solr using one of its client libraries, such as SolrJ. Otis -- Solr ElasticSearch Support http://sematext.com/

On Mon, Apr 1, 2013 at 6:34 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Hello, I am evaluating solr 4.2 and ElasticSearch (I am new to both) for a search API, where the data sits in cassandra. Getting started with elasticsearch is pretty straightforward and I was able to write an ES river (http://www.elasticsearch.org/guide/reference/river/) which pulls data from cassandra and indexes it in ES, within a day. Now I am trying to implement something similar with solr and compare the two.

Getting started with solr/example (http://lucene.apache.org/solr/4_2_0/tutorial.html) was pretty easy and an example solr instance works. But the example folder contains a whole bunch of stuff which I am not sure if I need: http://pastebin.com/Gv660mRT . I am sure I don't need 53 directories and 527 files. So my questions are:

1. How can I get a bare-bones solr app up and running with a minimum set of configuration? (I will build on it when needed by taking reference from /example)
2. What is the best practice to run solr in production? Is an approach like this jetty+nginx setup recommended: http://sacharya.com/nginx-proxy-to-jetty-for-java-apps/ ?

Once I am done setting up a simple solr instance:

3. What is the general practice to import data into solr? For now, I am writing a python script which will read data in bulk from cassandra and throw it to solr.

-- Thanks, -Utkarsh
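For the "external indexer that pushes" approach Otis describes, the Python script only needs to turn each batch of Cassandra rows into a JSON document array and POST it to Solr's update handler. A minimal sketch (the `id`/`content` field names are placeholders for whatever your schema actually defines):

```python
import json

def rows_to_solr_json(rows):
    """Turn rows pulled from Cassandra (here: plain dicts) into the JSON
    body that Solr's update handler accepts: an array of documents."""
    return json.dumps([{"id": r["id"], "content": r["content"]} for r in rows])

payload = rows_to_solr_json([{"id": "1", "content": "hello solr"}])
print(payload)

# POSTing the payload (needs a running Solr 4.x instance, so it is
# commented out here):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8983/solr/update/json?commit=true",
#     data=payload.encode("utf-8"),
#     headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```

For bulk loads you would batch many rows per request and commit once at the end rather than per batch; committing on every request is a common performance pitfall.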
Re: solr4.1 No live SolrServers available to handle this request
thx for your reply. my solr.xml is like this:

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="doc"
         host="${host:cms1.test.com}" hostPort="${jetty.port:9090}"
         hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:3}"
         leaderVoteWait="${leaderVoteWait:2}">
    <core name="doc" instanceDir="doc/" loadOnStartup="true"
          transient="false" collection="docCollection" />
  </cores>
</solr>

i have changed the zkclienttimeout from 15s to 30s, but this exception still shows. and the load on the solrcloud servers is not too heavy; the load averages are 1.4, 1.5, 1. and these disconnects appear in the solrj logs, while the solrcloud itself is fine. -- View this message in context: http://lucene.472066.n3.nabble.com/solr4-1-No-live-SolrServers-available-to-handle-this-request-tp4052862p4053075.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need Help in Patching OPENNLP
Hi Erick, thank you so much for the help and support. As you have mentioned, I have set up svn, and while trying to connect using the checkout option I am getting this error:

C:\bin>svn co https://svn.apache.org/repos/asf/lucene/dev/
svn: E175002: Unable to connect to a repository at URL 'https://svn.apache.org/repos/asf/lucene/dev'
svn: E175002: OPTIONS of 'https://svn.apache.org/repos/asf/lucene/dev': could not connect to server (https://svn.apache.org)

Is this anything to do with the firewall setup? Please advise me on the further steps. Thanks, KRN -- View this message in context: http://lucene.472066.n3.nabble.com/Need-Help-in-Patching-OPENNLP-tp4052362p4053089.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need Help in Patching OPENNLP
On 2 April 2013 11:00, karthicrnair karthicrn...@gmail.com wrote: Hi Erick, thank you so much for the help and support. As you have mentioned, I have set up svn, and while trying to connect using the checkout option I am getting this error: C:\bin>svn co https://svn.apache.org/repos/asf/lucene/dev/

Please read http://wiki.apache.org/solr/HowToContribute#Getting_the_source_code carefully. You need to add a branch name to the SVN URL. You probably want something like: svn co http://svn.apache.org/repos/asf/lucene/dev/trunk Regards, Gora
Re: Need Help in Patching OPENNLP
On 2 April 2013 11:08, Gora Mohanty g...@mimirtech.com wrote: On 2 April 2013 11:00, karthicrnair karthicrn...@gmail.com wrote: Hi Erick, thank you so much for the help and support. As you have mentioned, I have set up svn, and while trying to connect using the checkout option I am getting this error: C:\bin>svn co https://svn.apache.org/repos/asf/lucene/dev/ Please read http://wiki.apache.org/solr/HowToContribute#Getting_the_source_code carefully. You need to add a branch name to the SVN URL. You probably want something like: svn co http://svn.apache.org/repos/asf/lucene/dev/trunk

Though "svn co https://svn.apache.org/repos/asf/lucene/dev/" also works just fine. Are you sure that there is no network issue at your end? Are you able to ping svn.apache.org? Regards, Gora
Re: Need Help in Patching OPENNLP
Thanks Gora!! When I tried the ping command, all my requests timed out. I am able to access the svn through my explorer, though. What could be the issue now? :( Thanks, krn -- View this message in context: http://lucene.472066.n3.nabble.com/Need-Help-in-Patching-OPENNLP-tp4052362p4053092.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need Help in Patching OPENNLP
On 2 April 2013 11:17, karthicrnair karthicrn...@gmail.com wrote: Thanks Gora!! when I tried with ping command all my request got timed out. am able to access the svn through my explorer though. What could be the issue now? :( Hard to tell. My guess would be that your network is blocking some things like ICMP. Not sure what Explorer you are referring to, but if you can access svn.apache.org, svn co should work. Regards, Gora