Architect / Lead - Custom Cloud Middleware Platforms (APACHE SOLR)
We are excited and proud to have been selected as the only recruiting company assigned to a retained search for a brand-new High Technology service line (Cloud Technologies) at a top-tier IT service and solutions company. This new service line already has many new product lines partnered with it, and these positions offer high visibility with quick advancement opportunities.

1. Job Role: Sr. Architect (C2) / Architect (C1) / Tech Lead (B3) - Custom Cloud Middleware Platforms (APACHE SOLR)

2. Location and Number of Positions
   a. Mountain View, California
   b. New York City, NY
   c. Houston, Texas

3. Job Description
We are looking for Architects who would be responsible for the following activities:
- Technology evaluation, architecture design, implementation, testing and technical reviews of highly scalable platforms, solutions and technology building blocks to address customer requirements
- Lead the overall technical solution implementation as part of the customer's project delivery team
- Mentor and groom project team members in the core technology areas, usage of SDLC tools and Agile
- Engage in external and internal forums to demonstrate thought leadership

4. Skills Required
- 7+ years of overall work experience with large enterprises / technology vendors / ISVs / service providers
- Strong hands-on experience in systems integration and custom engineering of infrastructure / middleware platforms leveraging Linux open source technologies, using Agile methodologies
- Experience supporting and troubleshooting deployments in production environments is highly preferred
- Strong work experience in more than one of the technology areas below is required.
Programming Languages: Python, Ruby, Core Java, J2EE, JDBC, Spring, Struts, scripting languages
Distributed Systems: Cloud Computing, Grid Computing, Cluster Computing, Distributed File Systems, High Speed Messaging, Distributed Caching
Server Virtualization / Cloud Stacks: KVM, Xen, ESX, OpenStack, CloudStack, RHEV, Eucalyptus, Amazon Web Services, Azure Cloud Services
Management / Automation Tools: Hyperic, Nagios, OpenNMS, Cobbler, Puppet, Chef
Middleware Tools: Splunk, Esper, Solr, Thrift, RabbitMQ, ZooKeeper, memcached
Storage Technologies: SAN, NAS, JBOD, CIFS, Replication, Storage Management
Networking: DNS, DHCP, NAT, Firewall, Routing, Switching, Load Balancers, VLAN, VPN

If you are interested, please respond with a current resume to jess...@kudukisgroup.com. I will give you a call to speak in further detail. We keep all information confidential. Feel free to reply with any questions.

Thanks,
~ Jessica
Found child node with improper name
I get this warning when I try to create a collection, and the collection is not created.

Apr 01, 2013 10:05:26 AM org.apache.solr.handler.admin.CollectionsHandler handleCreateAction
INFO: Creating Collection : collection.configName=statisticsBucket-archive&maxShardsPerNode=3&name=ST-ARCHIVE_07&replicationFactor=2&action=CREATE
Apr 01, 2013 10:05:26 AM org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process
INFO: Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
Apr 01, 2013 10:05:26 AM org.apache.solr.cloud.DistributedQueue orderedChildren
WARNING: Found child node with improper name: qnr-02
Apr 01, 2013 10:05:26 AM org.apache.solr.cloud.OverseerCollectionProcessor run
INFO: Overseer Collection Processor: Get the message id:/overseer/collection-queue-work/qn-02 message:{
  operation:createcollection,
  numShards:null,
  maxShardsPerNode:3,
  collection.configName:statisticsBucket-archive,
  createNodeSet:null,
  name:ST-ARCHIVE_07,
  replicationFactor:2}

- Best regards

--
View this message in context: http://lucene.472066.n3.nabble.com/Found-child-node-with-improper-name-tp4052855.html
Sent from the Solr - User mailing list archive at Nabble.com.
highlight on same field with different fragsize
We use Solr 4.2 in our application. We need to return highlights on the same field twice, with different fragsize values. Solr allows highlighting different fields with different fragsize values, as below, but this does not work for the same field:

http://localhost:8080/solr/select?q=my search&hl=on&hl.fl=content1,content2&f.content1.hl.fragsize=400&f.content2.hl.fragsize=100

Solr 4.0 also supports field aliases, but they do not seem to work for highlighting. Can anybody suggest how to achieve this?

--
View this message in context: http://lucene.472066.n3.nabble.com/highlight-on-same-field-with-different-fragsize-tp4052863.html
Sent from the Solr - User mailing list archive at Nabble.com.
solr4.1 No live SolrServers available to handle this request
Hi all, I am new to Solr. When I query SolrCloud 4.1 with SolrJ, the client throws the exceptions below. There are 2 shards in my SolrCloud; each shard is on a server with 4 CPUs / 3G RAM, and the JVM has 2G. As the query requests increase, the exception occurs.

[java] org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
[java]     at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:486)
[java]     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
[java]     at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
[java]     at com.netease.index.service.impl.SearcherServiceImpl.search(Unknown Source)
[java]     at com.netease.index.util.ConSearcher.run(Unknown Source)
[java]     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[java]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[java]     at java.lang.Thread.run(Thread.java:662)
[java] Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://cms.test.com/solr/doc
[java]     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:416)
[java]     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
[java]     at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:439)
[java]     ...
7 more
[java] Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
[java]     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:416)
[java]     at org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:299)
[java]     at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:242)
[java]     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:455)
[java]     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
[java]     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
[java]     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
[java]     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
[java]     ... 9 more

PS: LBHttpSolrServer seems to distribute load unevenly - some nodes get a much heavier load while others do not. I use nginx so that the load is more controllable. Is this right?

Please help me out. Thank you in advance. ^_^

--
View this message in context: http://lucene.472066.n3.nabble.com/solr4-1-No-live-SolrServers-available-to-handle-this-request-tp4052862.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Phonetic Search Highlight issue in search results
Good question, you're causing me to think... about code I know very little about <G>. So rather than spouting off, I tried it, and... it works fine for me, either with or without the fast vector highlighter, on, admittedly, a very simple test. So I'd try peeling off all the extra stuff you've put into your configs (sorry, I don't have time right now to try to reproduce), get the very simple case working, then build the rest back up and see where the problem begins.

Sorry for the mis-direction!

Erick

On Mon, Apr 1, 2013 at 1:07 AM, Soumyanayan Kar soumyanayan@rebaca.com wrote:

Hi Erick, thanks for the reply. But help me understand this: if Solr is able to isolate the two documents which contain the term "fact", it being the phonetic equivalent of the search term "fakt", then why would it be unable to highlight the terms based on the same logic it uses to find the documents? Also, it correctly highlights the results in other searches which are also approximate rather than exact, e.g. fuzzy or synonym searches. In those cases the highlighted terms in the results are also far from the actual search term, yet they get highlighted correctly. Maybe I am getting this completely wrong, but it looks like there is something wrong with my implementation.

Thanks & Regards,
Soumya.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 27 March 2013 06:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Phonetic Search Highlight issue in search results

How would you expect it to highlight successfully? The term is "fakt"; there's nothing built in (and, indeed, there couldn't be) to un-phoneticize it into "fact" and apply that to the Content field. The whole point of phonetic processing is to do a lossy translation from the word into some variant, losing precision all the way. So this behavior is unsurprising...
Best,
Erick

On Tue, Mar 26, 2013 at 7:28 AM, Soumyanayan Kar soumyanayan@rebaca.com wrote:

When we issue a query with phonetic search, it returns the correct documents but not the highlights. When we use stemming or synonym searches we get the proper highlights. For example, when we execute a phonetic query for the term fakt (ContentSearchPhonetic:fakt) in the Solr Admin interface, it returns two documents containing the term fact (its phonetic token equivalent), but the list of highlights is empty, as shown in the response below.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
    <lst name="params">
      <str name="q">ContentSearchPhonetic:fakt</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <long name="DocId">1</long>
      <str name="DocTitle">Doc 1</str>
      <str name="Content">Anyway, this game was excellent and was well worth the time. The graphics are truly amazing and the sound track was pretty pleasant also. The preacher was in fact a thief.</str>
      <long name="_version_">1430480998833848320</long>
    </doc>
    <doc>
      <long name="DocId">2</long>
      <str name="DocTitle">Doc 2</str>
      <str name="Content">stunning. The preacher was in fact an excellent thief who had stolen the original manuscript of Hamlet from an exhibit on the Riviera, where he also acquired his remarkable and tan.</str>
      <long name="_version_">1430480998841188352</long>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="1"/>
    <lst name="2"/>
  </lst>
</response>

Relevant section of the Solr schema:

<field name="DocId" type="long" indexed="true" stored="true" required="true"/>
<field name="DocTitle" type="string" indexed="false" stored="true" required="true"/>
<field name="Content" type="text_general" indexed="false" stored="true" required="true"/>
<field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchStemming" type="text_stem" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchPhonetic" type="text_phonetic" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchSynonym" type="text_synonym" indexed="true" stored="false" multiValued="true"/>

<uniqueKey>DocId</uniqueKey>

<copyField source="Content" dest="ContentSearch"/>
<copyField source="Content" dest="ContentSearchStemming"/>
<copyField source="Content" dest="ContentSearchPhonetic"/>
<copyField source="Content" dest="ContentSearchSynonym"/>

<fieldType name="text_stem" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory
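Erick's point upthread about phonetic filtering being a lossy translation can be illustrated with a hand-rolled, much-simplified Soundex sketch (Solr's PhoneticFilterFactory supports several real encoders; this toy version is only for illustration and is not the codec Solr uses): "fact" and "fakt" collapse to the same code, so there is nothing left in the index to map back to the original spelling for highlighting.

```python
def soundex(word):
    """Very simplified Soundex: keep the first letter, then digits for
    consonant classes, collapsing adjacent duplicates. Illustration only."""
    codes = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in letters:
            codes[ch] = digit
    word = word.lower()
    out, prev = [], codes.get(word[0], "")
    for ch in word[1:]:
        d = codes.get(ch, "")
        if d and d != prev:
            out.append(d)
        prev = d
    return (word[0].upper() + "".join(out) + "000")[:4]

# "fact" and "fakt" encode identically -- the original spelling is lost.
print(soundex("fact"), soundex("fakt"))  # -> F230 F230
```

Since both surface forms produce the same indexed token, the highlighter has no way to recover which stored-text word "caused" the match, which is exactly the behavior seen in the empty highlighting section above.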
query regarding running solr4.1.0 on tomcat6
Hi all, I installed Tomcat 6 on CentOS/Red Hat Linux and configured Solr on Tomcat under the name solrt, and it was running fine. What I did next was place another copy of the Solr home folder on the machine and point Tomcat at this new Solr home. Now everything works fine, like the full database import and queries from the browser, but when I open solr-example/admin (the default Solr admin panel) in the browser it shows this error:

http://localhost:8080/solr-example/#/

HTTP Status 404 -
type: Status report
message:
description: The requested resource () is not available.
Apache Tomcat/6.0.24

Otherwise, when I hit

http://localhost:8080/solr-example/collection1/select?q=samsung%20duos&wt=json&indent=true&rows=20

it runs fine, and even

http://localhost:8080/solr-example/dataimport?command=full-import&indent=true&clean=true

runs fine. In the Tomcat manager panel I can see solr-example, and when I click on it, it shows the same 404 error. What could be the problem with the Solr admin panel? Help anyone.

Thanks & regards,
Rohan
Re: Need Help in Patching OPENNLP
Here's the start-up page: http://wiki.apache.org/solr/HowToContribute

First, just check out the code via svn and build it (see the page above). That'll tell you whether you have all the tools available. Second, apply the patch to the source; from the root of your source tree: patch -p0 -i <patch name>. Third, execute "ant example dist", and that should build your source with the patch in place...

If you get stuck, let us know what problems you are having, the specific errors you're receiving, all that kind of stuff.

Best,
Erick

On Fri, Mar 29, 2013 at 7:42 AM, karthicrnair karthicrn...@gmail.com wrote:

Hi all, I am very new to Solr and Java technology. I wonder whether someone can give me a way to patch the OpenNLP support into Solr. I am simply blocked at the initial step, applying the patch to Solr 4.2. Any pointer would be highly appreciated.

Thanks,
Karthic

--
View this message in context: http://lucene.472066.n3.nabble.com/Need-Help-in-Patching-OPENNLP-tp4052362.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Realtime updates solrcloud
Does the 30-second interval persist for a long time after you stop your queries? It's possible that your requests are queueing up and you have a bunch of searches in the queue in front of the update.

Best,
Erick

On Fri, Mar 29, 2013 at 8:33 AM, roySolr royrutten1...@gmail.com wrote:

Hello guys, I want to use the realtime update mechanism of SolrCloud. My setup is as follows: 3 Solr engines, 3 ZooKeeper instances (ensemble). The setup works great: recovery, leader election etc. The problem is the realtime updates; they get slow after the servers receive some traffic. Let me explain. I test the realtime update with the following command:

curl http://SOLRURL:SOLRPORT/solr/update -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">3504811</field><field name="website">http://www.google.nl</field></doc></add>'

I see this in the logs of the Solr server:

Mar 29, 2013 12:38:51 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [collection1] webapp=/solr path=/update params={} {add=[3504811 (1430841858290876416)]} 0 35

The other Solr servers get the following lines in the log:

INFO: [collection1] webapp=/solr path=/update params={distrib.from=http://SOLRIP:SOLRPORT/solr/collection1/&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[3504811 (1430844456234385408)]} 0 14

This looks good: the doc is added and the leader sends it to the other Solr servers. At first it takes about 1 second to make the update visible :) When I send some traffic to the server (200 q/s), the update takes around 30 seconds to become visible. After I stopped the traffic, it still takes 30 seconds. How is this possible? The solrconfig parts:

<autoCommit><maxTime>60</maxTime><openSearcher>false</openSearcher></autoCommit>
<autoSoftCommit><maxTime>2000</maxTime></autoSoftCommit>

Did I miss something?

Best regards,
Roy

--
View this message in context: http://lucene.472066.n3.nabble.com/Realtime-updates-solrcloud-tp4052370.html
Sent from the Solr - User mailing list archive at Nabble.com.
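For reference, here are the commit settings Roy posted, laid out as they would appear in solrconfig.xml with comments on what each does. Note that maxTime is in milliseconds, so the hard-commit value of 60 is unusually aggressive; 60000 may have been intended. With openSearcher=false on the hard commit, only the soft commit controls visibility, so a persistent 30-second delay points at searcher warm-up or request queueing rather than the commit settings themselves.

```
<autoCommit>
  <maxTime>60</maxTime>               <!-- hard commit (ms); flushes to disk but, with -->
  <openSearcher>false</openSearcher>  <!-- openSearcher=false, does not make docs visible -->
</autoCommit>
<autoSoftCommit>
  <maxTime>2000</maxTime>             <!-- soft commit every 2s; this is what makes updates searchable -->
</autoSoftCommit>
```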
Re: had query regarding the indexing and analysers
Hi, does this mean that while indexing, "ace" is also stored as "ac" in the Solr index?

Thanks & regards,
Rohan

On Fri, Mar 22, 2013 at 9:49 AM, Jack Krupansky j...@basetechnology.com wrote:

Actually, it's the Porter stemmer that is turning "ace" into "ac". Try making a copy of text_en_splitting and delete the PorterStemFilterFactory filter from both the query and index analyzers.

-- Jack Krupansky

-----Original Message-----
From: Rohan Thakur
Sent: Wednesday, March 20, 2013 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: had query regarding the indexing and analysers

Hi Jack, I have been using text_en_splitting initially, but what it was doing is changing my query as well. For example, if I search for the term "ace" it is treated as "ac", thus giving "ac" a higher score... see the debug statement:

debug:{ rawquerystring:ace, querystring:ace, parsedquery:(+DisjunctionMaxQuery((title:ac^30.0)))/no_coord, parsedquery_toString:+(title:ac^30.0), explain:{ :\n1.8650155 = (MATCH) weight(title:ac^30.0 in 469) [DefaultSimilarity], result of:\n 1.8650155 = fieldWeight in 469, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n 0.4375 = fieldNorm(doc=469)\n, :\n1.8650155 = (MATCH) weight(title:ac^30.0 in 470) [DefaultSimilarity], result of:\n 1.8650155 = fieldWeight in 470, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n 0.4375 = fieldNorm(doc=470)\n, :\n1.8650155 = (MATCH) weight(title:ac^30.0 in 471) [DefaultSimilarity], result of:\n 1.8650155 = fieldWeight in 471, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n 0.4375 = fieldNorm(doc=471)\n, :\n1.8650155 = (MATCH) weight(title:ac^30.0 in 472) [DefaultSimilarity], result of:\n 1.8650155 = fieldWeight in 472, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n 0.4375 =
fieldNorm(doc=472)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 331) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 331, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=331)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 332) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 332, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=332)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 335) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 335, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=335)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 336) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 336, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=336)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 337) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 337, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=337)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 393) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 393, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=393)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 425) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 425, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=425)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 426) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 426, product of:\n1.0 = tf(freq=1.0), with 
freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=426)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 429) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 429, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39, maxDocs=1045)\n0.375 = fieldNorm(doc=429)\n, :\n1.5985848 = (MATCH) weight(title:ac^30.0 in 430) [DefaultSimilarity], result of:\n 1.5985848 = fieldWeight in 430, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n4.2628927 = idf(docFreq=39,
Re: had query regarding the indexing and analysers
Yes, if there is only a single analyzer, or an index analyzer is specified, and the Porter stemmer is used in it.

-- Jack Krupansky

-----Original Message-----
From: Rohan Thakur
Sent: Monday, April 01, 2013 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: had query regarding the indexing and analysers

Hi, does this mean that while indexing, "ace" is also stored as "ac" in the Solr index?

Thanks & regards,
Rohan

On Fri, Mar 22, 2013 at 9:49 AM, Jack Krupansky j...@basetechnology.com wrote:

Actually, it's the Porter stemmer that is turning "ace" into "ac". Try making a copy of text_en_splitting and delete the PorterStemFilterFactory filter from both the query and index analyzers.

-- Jack Krupansky

-----Original Message-----
From: Rohan Thakur
Sent: Wednesday, March 20, 2013 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: had query regarding the indexing and analysers

Hi Jack, I have been using text_en_splitting initially, but what it was doing is changing my query as well. For example, if I search for the term "ace" it is treated as "ac", thus giving "ac" a higher score...
see the debug statement: <snip - same scoring output as quoted in the earlier message in this thread>
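Jack's suggestion upthread - copying text_en_splitting and deleting the PorterStemFilterFactory - would look roughly like this in schema.xml. The field-type name here is hypothetical, and the remaining filters depend on what your text_en_splitting actually contains; this is only a sketch of the shape:

```
<fieldType name="text_en_nostem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- PorterStemFilterFactory removed, so "ace" stays "ace" at both index and query time -->
  </analyzer>
</fieldType>
```

Because the same analyzer applies at index and query time here, removing the stemmer from both sides keeps indexing and searching consistent.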
Re: secure deployment of solr.war on jboss
Hi Ali, we have Solr 4.2 on JBoss running on a separate VM behind a firewall. Only IT Administration and our front-end application server are able to access the Solr servers in production.

--
View this message in context: http://lucene.472066.n3.nabble.com/secure-deployment-of-solr-war-on-jboss-tp4052754p4052899.html
Sent from the Solr - User mailing list archive at Nabble.com.
How does solr 4.2 do in returning large datasets?
I thought I remembered reading that Solr is not good at returning large datasets. We are currently using Lucene 3.6.0 and returning datasets of 10,000 to 60,000 results. In the future we might need to return even larger datasets. Would you all recommend going to Solr for this, or should we stick with Lucene (which has given us no problems in this regard)? I am a bit wary of using a web service to return datasets of this size.

Thanks a lot,
Liz
lizswo...@gmail.com
Re: secure deployment of solr.war on jboss
Thanks. Are you using an iptables firewall on the JBoss host to prevent access from other systems, or are you using some JBoss configuration for that?

Thanks,
Saqib

On Mon, Apr 1, 2013 at 6:25 AM, adityab aditya_ba...@yahoo.com wrote:

Hi Ali, we have Solr 4.2 on JBoss running on a separate VM behind a firewall. Only IT Administration and our front-end application server are able to access the Solr servers in production.

--
View this message in context: http://lucene.472066.n3.nabble.com/secure-deployment-of-solr-war-on-jboss-tp4052754p4052899.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: How does solr 4.2 do in returning large datasets?
It really depends on what you are returning (how big is each document? Just a document ID? Pages and pages of data in fields?). It can take a long time for Solr to render XML with 60,000 results. Solr will be serializing the data and then you'd (presumably) be deserializing it. Depending on how big each field actually is, this could take a while or even amount to a DoS on your own server. Your client would also need a fair bit of memory to parse a document with 60,000 results.

-----Original Message-----
From: Liz Sommers [mailto:lizswo...@gmail.com]
Sent: Monday, April 01, 2013 9:39 AM
To: solr-user
Subject: How does solr 4.2 do in returning large datasets ?

I thought I remembered reading that Solr is not good at returning large datasets. We are currently using Lucene 3.6.0 and returning datasets of 10,000 to 60,000 results. In the future we might need to return even larger datasets. Would you all recommend going to Solr for this, or should we stick with Lucene (which has given us no problems in this regard)? I am a bit wary of using a web service to return datasets of this size.

Thanks a lot,
Liz
lizswo...@gmail.com
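One common way around a single 60,000-row response is to page through the result set with Solr's start/rows parameters, issuing several smaller requests. A minimal sketch of just the offset arithmetic (plain Python, no Solr client assumed; note that very deep offsets get progressively more expensive server-side, so page sizes are a tuning question):

```python
def pages(total_hits, page_size):
    """Yield (start, rows) parameter pairs for paging through a result set."""
    for start in range(0, total_hits, page_size):
        yield start, min(page_size, total_hits - start)

# e.g. 25 hits in pages of 10 -> three requests
print(list(pages(25, 10)))  # -> [(0, 10), (10, 10), (20, 5)]
```

Each (start, rows) pair would map straight onto the select URL, e.g. ...&start=10&rows=10, keeping every individual response small enough to serialize and parse cheaply.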
Re: solr4.1 No live SolrServers available to handle this request
Check the Solr logs for ZooKeeper disconnects. It could be that as load increases, Solr is not able to respond to the ZooKeeper pings, which would take the nodes offline. If you see ZooKeeper disconnects, you can increase the zkClientTimeout set in solr.xml. But be aware that zk disconnects can also be a sign that your servers are overloaded and/or under-resourced. Memory starvation and stop-the-world GC can often be the cause of zk disconnects.

On Mon, Apr 1, 2013 at 6:18 AM, sling sling...@gmail.com wrote:

Hi all, I am new to Solr. When I query SolrCloud 4.1 with SolrJ, the client throws the exceptions below. There are 2 shards in my SolrCloud; each shard is on a server with 4 CPUs / 3G RAM, and the JVM has 2G. As the query requests increase, the exception occurs.

[java] org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
<snip - full stack trace as in the original message above>

PS: LBHttpSolrServer seems to distribute load unevenly - some nodes get a much heavier load while others do not. I use nginx so that the load is more controllable. Is this right?

Please help me out. Thank you in advance. ^_^

--
View this message in context: http://lucene.472066.n3.nabble.com/solr4-1-No-live-SolrServers-available-to-handle-this-request-tp4052862.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Joel Bernstein
Professional Services
LucidWorks
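The zkClientTimeout that Joel mentions lives in solr.xml; in the Solr 4.x legacy solr.xml format it is an attribute on the <cores> element. A sketch only - the surrounding attributes and the 30-second value are assumptions to adapt to your own config:

```
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:}" hostPort="${jetty.port:}"
         zkClientTimeout="30000">  <!-- ms before a ZooKeeper session is considered expired -->
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>
```

Raising the timeout only buys headroom for GC pauses; if nodes are genuinely overloaded, the underlying memory or capacity problem still needs fixing.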
Re: highlight on same field with different fragsize
Why do you want to do this? Can't you just take the longer fragsize and cut it down for display in the app?

Best,
Erick

On Mon, Apr 1, 2013 at 6:33 AM, meghana meghana.rav...@amultek.com wrote:

We use Solr 4.2 in our application. We need to return highlights on the same field twice, with different fragsize values. Solr allows highlighting different fields with different fragsize values, as below, but this does not work for the same field:

http://localhost:8080/solr/select?q=my search&hl=on&hl.fl=content1,content2&f.content1.hl.fragsize=400&f.content2.hl.fragsize=100

Solr 4.0 also supports field aliases, but they do not seem to work for highlighting. Can anybody suggest how to achieve this?

--
View this message in context: http://lucene.472066.n3.nabble.com/highlight-on-same-field-with-different-fragsize-tp4052863.html
Sent from the Solr - User mailing list archive at Nabble.com.
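Erick's suggestion - request one generous fragsize and shorten client-side - can be as little as cutting the long snippet at a word boundary while keeping the <em> highlight markers balanced. A rough sketch (not a Solr API, just string handling; the example fragment is hypothetical):

```python
def shorten_snippet(snippet, max_len):
    """Cut a highlight snippet to ~max_len chars at a word boundary,
    closing an <em> tag if the cut leaves one open."""
    if len(snippet) <= max_len:
        return snippet
    cut = snippet.rfind(" ", 0, max_len)
    if cut == -1:          # no space found, hard cut
        cut = max_len
    short = snippet[:cut]
    if short.count("<em>") > short.count("</em>"):
        short += "</em>"   # keep the markup balanced for rendering
    return short

long_frag = "The <em>preacher</em> was in fact an excellent thief who stole the manuscript"
print(shorten_snippet(long_frag, 40))
```

So the app would ask Solr for hl.fragsize=400 once, then derive the 100-character variant itself rather than asking Solr to highlight the same field twice.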
Re: DocValues vs stored fields?
Otis,

DocValues are quite insufficient for true field updates. DocValues is per-document value storage (hence the name); it's not uninverted/indexed. If you need to search based on these values (e.g. find all docs that have this value, or fall between these values), then that's not going to work. The most promising field-update work going on right now is https://issues.apache.org/jira/browse/LUCENE-4258 "Incremental Field Updates through Stacked Segments". In my opinion, that's the most exciting thing happening in Lucene right now, but it appears a little stalled.

I do think a DocValues-based hack could make a better replacement for Solr's ExternalFileField, which is used in FunctionQueries.

Another questioner asked essentially why a field that has DocValues won't have its value shown when the field is marked stored="false", since the value is stored per-document after all. True, the disparity here is a bit confusing. DocValues are not intended as a replacement for stored fields in places where you are using stored fields now. They exist to improve the performance and memory use of function queries, sorting, and faceting. It's the new FieldCache under a different name, though it hasn't strictly replaced the FieldCache (yet). It's not enabled by default because it creates new data on disk and Solr doesn't know that you want to use it. As of Solr 4.2, DocValues is also multi-valued -- awesome!

All this said, I do think there's room for a proposed Solr DocTransformer to expose the DocValues value as if it were a stored field in your search results. Actually... I wish that if you explicitly ask for a field and it's not stored, Solr would just go use docValues automatically. That'd be cool!

~ David

Otis Gospodnetic-5 wrote:
Hi,

The current field update mechanism is not really a field update mechanism. It just looks like that from the outside. DocValues should make true field updates implementable.
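To make the "not enabled by default, you opt in per field" point concrete, a minimal schema.xml sketch (the field name `popularity` is invented; per the rest of the thread, enabling this requires a re-index, and in Solr 4.2 the field generally needs to be required or carry a default value):

```xml
<!-- schema.xml sketch: opt one field into DocValues for
     sorting/faceting/function queries; default satisfies the
     4.2 requirement that the field always has a value. -->
<field name="popularity" type="int" indexed="true" stored="false"
       docValues="true" default="0"/>
```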
Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Fri, Mar 29, 2013 at 3:30 PM, Marcin Rzewucki <mrzewucki@> wrote:
Hi,

Atomic updates (single-field updates) do not depend on DocValues. They were implemented in Solr 4.0 and work fine (but all fields have to be retrievable). DocValues are supposed to be more efficient than FieldCache. Why not enabled by default? Maybe because they are not for all fields, and because of their limitations (a field has to be single-valued, and required or have a default value).

Regards.

On 29 March 2013 17:20, Timothy Potter <thelabdude@> wrote:
Hi Jack,

I've just started to dig into this as well, so I'm sharing what I know, but there are still some holes in my knowledge too. DocValues == Column Stride Fields (the best resource I know of so far is Simon's presentation from Lucene Revolution 2011 - http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues). It's pretty dense, but some nuggets I've gleaned from it are:

1) DocValues are more efficient in terms of memory usage and I/O performance for building an alternative to FieldCache (slide 27 is very impressive)
2) DocValues has a more efficient way to store primitive types, such as packed ints
3) Faster random access to stored values

In terms of switch-over, you have to re-index to change your fields to use DocValues on disk, which is why they are not enabled by default. Lastly, another goal of DocValues is to allow updates to a single field w/o re-indexing the entire doc. That's not implemented yet, but I think it is still planned.

Cheers,
Tim

On Fri, Mar 29, 2013 at 9:31 AM, Jack Krupansky <jack@> wrote:
I'm still a little fuzzy on DocValues (maybe because I'm still grappling with how it does or doesn't still relate to "Column Stride Fields"), so can anybody clue me in as to how useful DocValues is/are? Are DocValues simply an alternative to "stored fields"?
If so, and if DocValues are so great, why aren't we just switching Solr over to DocValues under the hood for all fields? And if there are "issues" with DocValues that would make such a complete switchover less than absolutely desirable, what are those issues? In short, when should a user use DocValues over stored fields, and vice versa?

As things stand, all we've done is make Solr more confusing than it was before, without improving its OOBE. OOBE should be job one in Solr.

Thanks.

P.S. And if I actually want to do Column Stride Fields, is there a way to do that?

-- Jack Krupansky - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

--
View this message in context: http://lucene.472066.n3.nabble.com/DocValues-vs-stored-fields-tp4052406p4052966.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
Hi Shawn,

I tried optimizing using this command:

curl 'http://10.7.233.54:8088/solr/update?optimize=true&maxSegments=10&waitFlush=true'

And I got this response within seconds:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">840</int></lst>
</response>

Is this a valid response that one should get? I checked the statistics link from the /solr/admin page, and it shows the number of segments got updated. Would this be a good indication that optimization is complete? At the same time, I noticed the number of files in the data/index directory hasn't reduced, and all files are not updated. Since it took just a couple of seconds for the response (even with waitFlush=true), I doubt whether optimization really happened, but the statistics page shows me the correct number of segments.

On Tue, Mar 12, 2013 at 8:34 PM, Shawn Heisey-4 [via Lucene] ml-node+s472066n4046834...@n3.nabble.com wrote:
On 3/12/2013 4:17 PM, feroz_kh wrote:
Do we really need to optimize in order to reformat?

The alternative would be to start with an empty index and just reindex your data. That is actually the best way to go, if that option is available to you.

If yes, what is the best way of optimizing the index - online or offline? Can we do it online? If yes:
1. What is the HTTP request we can use to invoke optimization - how long does it take?
2. What is the command-line command to invoke optimization - how long does that one take?

The only way I know of to optimize an index that's offline is using Luke, but it is difficult to find versions of Luke that work with indexes after 4.0-ALPHA - the official Luke page doesn't have any newer versions, and I have no idea why. Online is better. Solr 4.2 just got released; you may want to consider skipping 4.1 and going with 4.2. There would be no major speed difference between doing it offline or online. Whatever else the machine is doing might be a factor. I can only make guesses about how long it will take.
You say your index in 3.5 is 14GB. I have experience with indexes that are 22GB in 3.5, which take 11 minutes to optimize. The equivalent index in 4.2 is 14GB and takes 14 minutes, because of the extra compression/decompression step. This is on RAID10; volumes with no RAID or with other RAID levels would be slower. Also, if the structure of your index is significantly different from mine, yours might go faster or slower than the size alone would suggest.

There is a curl command that optimizes the index in the wiki:
http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_and_commitWithin_parameters_as_part_of_the_URL

You would want to leave off the maxSegments option so it optimizes down to one segment. Whether to include waitFlush is up to you, but if you don't include it, you won't know exactly when it finishes unless you are looking at the index directory.

Thanks,
Shawn

--
View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-Solr3-5-to-Solr4-1-Index-Reformat-tp4046391p4052969.html
Sent from the Solr - User mailing list archive at Nabble.com.
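Putting Shawn's advice together, the call would look roughly like this (host and port are placeholders; this is a sketch of the wiki command, not something verified against your deployment):

```
# Optimize down to a single segment by omitting maxSegments entirely.
# waitFlush=true keeps the request open until the changes are flushed,
# so a response after only a couple of seconds would be suspicious.
curl 'http://localhost:8983/solr/update?optimize=true&waitFlush=true'
```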
Re: Making tika process mail attachments eludes me
: I believe that the handling of the multipart MIME lacks some error checking, and
: it is probably related to the content outside the MIME boundaries (in my
: example, the text "This is a multi-part message in MIME format."):
:
: I really hope that some SOLR developer can have a look, we cannot be the only
: ones having this problem. And I've spent almost twenty hours debugging this.

I am largely unfamiliar with the MailEntityProcessor, but IIRC it has not received much love over the years due to the lack of automated tests -- I believe all of the existing tests are disabled by default because they require an external IMAP server.

If anyone is interested in helping to contribute some tests that could be automated by using some sort of mock IMAP server library, that would go a long way towards being able to verify correctness and make improvements (even for people like me who are not familiar with the code and haven't thought very hard about MIME encapsulation in over 15 years).

-Hoss
Re: Upgrade Solr3.5 to Solr4.1 - Index Reformat ?
Also, is it absolutely necessary to set maxSegments=1 if we need to reformat the whole index?

--
View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-Solr3-5-to-Solr4-1-Index-Reformat-tp4046391p4052991.html
Sent from the Solr - User mailing list archive at Nabble.com.
Filtering Search Cloud
I want to separate my cloud into two logical parts: one is the indexer part of the SolrCloud, and the other is the searcher part.

My first question is this: does separating my cloud this way make sense as a performance improvement? I think that while indexing, searches take longer to respond, so if I separate them I should get a performance improvement. On the other hand, maybe using all Solr machines as a whole (I mean not partitioning as I mentioned) lets SolrCloud do better load balancing; I would like to find out.

My second question: let's assume that I have separated my machines as I mentioned. Can I filter some indexes to be searchable or not from the searcher SolrCloud?
Top 10 Terms in Index (by date)
Our company has an application that is Facebook-like for usage by enterprise customers. We'd like to do a report of the top 10 terms entered by users over (some time period). With that in mind I'm using the DataImportHandler to put all the relevant data from our database into a Solr 'content' field:

<field name="content" type="text_general" indexed="true" stored="false" multiValued="false" required="true" termVectors="true"/>

Along with the content is the 'dateCreated' for that content:

<field name="dateCreated" type="tdate" indexed="true" stored="false" multiValued="false" required="true"/>

I'm struggling with the TermVectorComponent documentation to understand how I can put together a query that answers the 'report' mentioned above. For each document I need each term counted however many times it is entered (content of "I think what I think" would report 'think' as used twice). Does anyone have any insight as to whether I'm headed in the right direction, and then what my query would be?

Thanks,
Andy Pickler
Re: How does solr 4.2 do in returning large datasets ?
Don't forget you could also tell Solr to return CSV in the response, which will be much lighter in terms of response size, though the disk I/O would still be there.

Is returning large result sets recommended? No. :) But the same can be said for Lucene.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Mon, Apr 1, 2013 at 10:26 AM, Swati Swoboda sswob...@igloosoftware.com wrote:
It really depends on what you are returning (how big is each document? Just a document ID? Pages and pages of data in fields?). It can take a long time for Solr to render an XML response with 60,000 results. Solr will be serializing the data and then you'd (presumably) be de-serializing it. Depending on how big each field actually is, this could take a while or even cause a DoS on your server. Your client would also need a fair bit of memory to parse a document with 60,000 results.

-----Original Message----- From: Liz Sommers [mailto:lizswo...@gmail.com]
Sent: Monday, April 01, 2013 9:39 AM
To: solr-user
Subject: How does solr 4.2 do in returning large datasets ?

I thought I remembered reading that Solr is not good for returning large datasets. We are currently using Lucene 3.6.0 and returning datasets of 10,000 to 60,000 results. In the future we might need to return even larger datasets. Would you all recommend going to Solr for this, or should we stick with Lucene (which has given us no problems in this regard)? I am a bit wary of using a web service to return datasets of this size.

Thanks a lot
Liz
lizswo...@gmail.com
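For reference, the CSV response Otis mentions is selected with the `wt` response-writer parameter. A sketch of such a request (host, port, and field list are placeholders):

```
http://localhost:8983/solr/select?q=*:*&rows=60000&fl=id,title&wt=csv
```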
Use of SolrJettyTestBase
I'm attempting to use SolrJettyTestBase to test a simple app that pushes content to Solr. I've subclassed SolrJettyTestBase and added a test method (annotated with @Test). However, my test method is never called. I see the following:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/upayavira/.m2/repository/org/slf4j/slf4j-log4j12/1.7.2/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/upayavira/.m2/repository/org/slf4j/slf4j-jdk14/1.6.4/slf4j-jdk14-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/upayavira/apache/lucene/solr/example/solr-webapp/webapp/WEB-INF/lib/slf4j-jdk14-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/upayavira/apache/lucene/solr/solrj/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Test class requires enabled assertions, enable globally (-ea) or for Solr/Lucene subpackages only: com.odoko.ArgPostActionTest
NOTE: test params are: codec=null, sim=null, locale=null, timezone=(null)
NOTE: Mac OS X 10.8.2 x86_64/Oracle Corporation 1.7.0_17 (64-bit)/cpus=8,threads=1,free=85043008,total=96665600
NOTE: All tests run in this JVM: [ArgPostActionTest]
NOTE: reproduce with: ant test -Dtestcase=ArgPostActionTest -Dtests.seed=73C46F7785759CEF -Dtests.file.encoding=MacRoman

I'm sure I'm missing something obvious. Any ideas/suggestions as to what I need to do to make my test actually run?

Upayavira
Re: Top 10 Terms in Index (by date)
So you have one document per user comment? Why not use faceting plus filtering on the dateCreated field? That would count the number of documents for each term (so, in your case, if a term is used twice in one comment it would only count once). Is that what you are looking for?

Tomás

On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler andy.pick...@gmail.com wrote:
Our company has an application that is Facebook-like for usage by enterprise customers. We'd like to do a report of the top 10 terms entered by users over (some time period). With that in mind I'm using the DataImportHandler to put all the relevant data from our database into a Solr 'content' field:

<field name="content" type="text_general" indexed="true" stored="false" multiValued="false" required="true" termVectors="true"/>

Along with the content is the 'dateCreated' for that content:

<field name="dateCreated" type="tdate" indexed="true" stored="false" multiValued="false" required="true"/>

I'm struggling with the TermVectorComponent documentation to understand how I can put together a query that answers the 'report' mentioned above. For each document I need each term counted however many times it is entered (content of "I think what I think" would report 'think' as used twice). Does anyone have any insight as to whether I'm headed in the right direction, and then what my query would be?

Thanks,
Andy Pickler
Getting started with solr 4.2 and cassandra
Hello,

I am evaluating Solr 4.2 and ElasticSearch (I am new to both) for a search API where the data sits in Cassandra. Getting started with ElasticSearch is pretty straightforward, and I was able to write an ES river (http://www.elasticsearch.org/guide/reference/river/) which pulls data from Cassandra and indexes it in ES within a day.

Now I am trying to implement something similar with Solr and compare the two. Getting started with solr/example (http://lucene.apache.org/solr/4_2_0/tutorial.html) was pretty easy, and the example Solr instance works. But the example folder contains a whole bunch of stuff which I am not sure I need: http://pastebin.com/Gv660mRT . I am sure I don't need 53 directories and 527 files.

So my questions are:
1. How can I create a bare-bones Solr app up and running with a minimum set of configuration? (I will build on it when needed, taking reference from /example.)
2. What is the best practice for running Solr in production? Is an approach like jetty+nginx recommended: http://sacharya.com/nginx-proxy-to-jetty-for-java-apps/ ?

Once I am done setting up a simple Solr instance:
3. What is the general practice to import data into Solr? For now, I am writing a Python script which will read data in bulk from Cassandra and throw it at Solr.

--
Thanks,
-Utkarsh
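Bulk import (question 3 above) is usually just a matter of batching documents and POSTing each batch to Solr's update handler. A minimal Python sketch, where the Solr URL and the document shape are assumptions, not anything the thread specifies; only the batching helper is exercised below:

```python
import json
import urllib.request

def batches(docs, size):
    """Yield successive lists of at most `size` docs."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def post_batch(solr_url, docs):
    """POST one batch to Solr's update handler as JSON (URL is a placeholder)."""
    req = urllib.request.Request(
        solr_url + "/update?commit=false",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

if __name__ == "__main__":
    # Rows as they might come out of Cassandra; ids/fields are invented.
    rows = [{"id": str(i), "content": "doc %d" % i} for i in range(2500)]
    for chunk in batches(rows, 1000):
        # post_batch("http://localhost:8983/solr/collection1", chunk)  # needs a running Solr
        print(len(chunk))
```

Batching keeps each request bounded, and deferring the commit until the end (or letting autoCommit handle it) avoids hammering the index with per-batch commits.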
Re: Getting started with solr 4.2 and cassandra
You might want to check out DataStax Enterprise, which actually integrates Cassandra and Solr. You keep the data in Cassandra, but as data is added, updated, and deleted, the Solr index is automatically updated in parallel. You can add and update data and query using either the Cassandra API or the Solr API.

See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky

-----Original Message----- From: Utkarsh Sengar
Sent: Monday, April 01, 2013 6:34 PM
To: solr-user@lucene.apache.org
Subject: Getting started with solr 4.2 and cassandra

Hello,

I am evaluating Solr 4.2 and ElasticSearch (I am new to both) for a search API where the data sits in Cassandra. Getting started with ElasticSearch is pretty straightforward, and I was able to write an ES river (http://www.elasticsearch.org/guide/reference/river/) which pulls data from Cassandra and indexes it in ES within a day.

Now I am trying to implement something similar with Solr and compare the two. Getting started with solr/example (http://lucene.apache.org/solr/4_2_0/tutorial.html) was pretty easy, and the example Solr instance works. But the example folder contains a whole bunch of stuff which I am not sure I need: http://pastebin.com/Gv660mRT . I am sure I don't need 53 directories and 527 files.

So my questions are:
1. How can I create a bare-bones Solr app up and running with a minimum set of configuration? (I will build on it when needed, taking reference from /example.)
2. What is the best practice for running Solr in production? Is an approach like jetty+nginx recommended: http://sacharya.com/nginx-proxy-to-jetty-for-java-apps/ ?

Once I am done setting up a simple Solr instance:
3. What is the general practice to import data into Solr? For now, I am writing a Python script which will read data in bulk from Cassandra and throw it at Solr.

--
Thanks,
-Utkarsh
Re: Use of SolrJettyTestBase
: I've subclassed SolrJettyTestBase, and added a test method (annotated
: with @Test). However, my test method is never called. I see the

You got an immediate failure from the test setup, because you don't have assertions enabled in your JVM (the Lucene and Solr test frameworks both require assertions enabled to run tests, because so many important things can't be sanity-checked without them)...

: Test class requires enabled assertions, enable globally (-ea) or for
: Solr/Lucene subpackages only: com.odoko.ArgPostActionTest

FYI: in addition to that text being written to System.err, it would have immediately been thrown as an Exception as well. (See TestRuleAssertionsRequired.java.)

-Hoss
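For what it's worth, the `.m2` paths in the log above suggest a Maven build; if the tests run under the Surefire plugin (an assumption -- the build tool isn't stated), assertions can be enabled for the forked test JVM like this (plugin version and the rest of the pom omitted):

```xml
<!-- pom.xml sketch: pass -ea to the forked test JVM so the
     Lucene/Solr test framework's assertion check passes -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>-ea</argLine>
  </configuration>
</plugin>
```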
Re: Getting started with solr 4.2 and cassandra
Thanks for the reply. So DSE is one of the options, and I am looking into that too.

Although, before diving into solr+cassandra integration (which comes out of the box with DSE), I am just trying to set up a Solr instance on my local machine without the bloat the example Solr instance has to offer. Any suggestions about that?

Thanks,
-Utkarsh

On Mon, Apr 1, 2013 at 4:00 PM, Jack Krupansky j...@basetechnology.com wrote:
You might want to check out DataStax Enterprise, which actually integrates Cassandra and Solr. You keep the data in Cassandra, but as data is added, updated, and deleted, the Solr index is automatically updated in parallel. You can add and update data and query using either the Cassandra API or the Solr API.

See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky

-----Original Message----- From: Utkarsh Sengar
Sent: Monday, April 01, 2013 6:34 PM
To: solr-user@lucene.apache.org
Subject: Getting started with solr 4.2 and cassandra

Hello,

I am evaluating Solr 4.2 and ElasticSearch (I am new to both) for a search API where the data sits in Cassandra. Getting started with ElasticSearch is pretty straightforward, and I was able to write an ES river (http://www.elasticsearch.org/guide/reference/river/) which pulls data from Cassandra and indexes it in ES within a day.

Now I am trying to implement something similar with Solr and compare the two. Getting started with solr/example (http://lucene.apache.org/solr/4_2_0/tutorial.html) was pretty easy, and the example Solr instance works. But the example folder contains a whole bunch of stuff which I am not sure I need: http://pastebin.com/Gv660mRT . I am sure I don't need 53 directories and 527 files.

So my questions are:
1. How can I create a bare-bones Solr app up and running with a minimum set of configuration? (I will build on it when needed, taking reference from /example.)
2. What is the best practice for running Solr in production? Is an approach like jetty+nginx recommended: http://sacharya.com/nginx-proxy-to-jetty-for-java-apps/ ?

Once I am done setting up a simple Solr instance:
3. What is the general practice to import data into Solr? For now, I am writing a Python script which will read data in bulk from Cassandra and throw it at Solr.

--
Thanks,
-Utkarsh

--
Thanks,
-Utkarsh
Solr Multiword Search
We have a catalog of media content which is ingested into Solr. We are trying to spellcheck the title of a catalog item, to make sure that the client is able to correctly predict and correct the (mis)typed text. The requirement is that the corrected text match a title in the catalog. I have been playing around with the spellcheck component and handler on Solr 4.2.

solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">mySpell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

<queryConverter name="queryConverter" class="com.foo.MultiWordSpellingQueryConverter"/>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">mySpell</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">10</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

schema.xml:

<types>
  <fieldType name="text_spell" class="solr.TextField" sortMissingLast="true" omitNorms="true" omitTermFreqAndPositions="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <field name="mySpell" type="text_spell" indexed="true" stored="true" multiValued="true"/>
</fields>

<copyField source="title" dest="mySpell"/>

Notice that I am using a custom QueryConverter, defined as follows:

/* MultiWordSpellingQueryConverter.java */
package com.foo;

import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Token;
import org.apache.solr.spelling.QueryConverter;

public class MultiWordSpellingQueryConverter extends QueryConverter {
    private static Logger log = Logger.getLogger(MultiWordSpellingQueryConverter.class);

    static {
        System.out.println("* Loading class MultiWordSpellingQueryConverter");
        log.fatal("* Loading class MultiWordSpellingQueryConverter");
    }

    /**
     * Converts the original query string to a collection of Lucene Tokens.
     *
     * @param original the original query string
     * @return a Collection of Lucene Tokens
     */
    public Collection<Token> convert(String original) {
        if (original == null) {
            return Collections.emptyList();
        }
        System.out.println("Original String : " + original);
        log.error("Original String : " + original);
        final Token token = new Token(original.toCharArray(), 0, original.length(), 0, original.length());
        return Arrays.asList(token);
    }
}

I have followed directions as per another thread: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tt3265257.html#a3281189 , because I feel this is what I really want. I have tried both placing the jar in the ${solr.home}/lib directory, and un-jarring solr.war, adding the jar built from the above Java code into the WEB-INF/lib directory, re-jarring it, and placing it in the web-server deploy directory. I cannot tell if this class is even being invoked at spellcheck time. I have the queryConverter tag defined in solrconfig.xml (refer to the definitions above).

Query:

http://localhost/solr/spell?q=((title:(charles%20and%20the%20chocolate%20factory)))&spellcheck.q=charles%20and%20the%20chocolat%20factory&spellcheck=true&spellcheck.collate=true

Of course I have spelt "charles" incorrectly. There in fact exists in the catalog a title with the name "Charlie and the chocolate factory", and the above query does not find it, nor collate well enough to correct the spelling. I believe the error distance (or edits) is about 2. "Charles" should be spelt
Re: Getting started with solr 4.2 and cassandra
The Solr example really is rather simple. Download, unzip, run, add data, query. It's really that simple.

Make sure you are looking at the Solr tutorial:
http://lucene.apache.org/solr/4_2_0/tutorial.html

Download from here:
http://lucene.apache.org/solr/tutorial.html

-- Jack Krupansky

-----Original Message----- From: Utkarsh Sengar
Sent: Monday, April 01, 2013 7:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Getting started with solr 4.2 and cassandra

Thanks for the reply. So DSE is one of the options, and I am looking into that too.

Although, before diving into solr+cassandra integration (which comes out of the box with DSE), I am just trying to set up a Solr instance on my local machine without the bloat the example Solr instance has to offer. Any suggestions about that?

Thanks,
-Utkarsh

On Mon, Apr 1, 2013 at 4:00 PM, Jack Krupansky j...@basetechnology.com wrote:
You might want to check out DataStax Enterprise, which actually integrates Cassandra and Solr. You keep the data in Cassandra, but as data is added, updated, and deleted, the Solr index is automatically updated in parallel. You can add and update data and query using either the Cassandra API or the Solr API.

See:
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise

-- Jack Krupansky

--
Thanks,
-Utkarsh
Re: Suggestions for Customizing Solr Admin Page
: I want to customize Solr Admin Page. I think that I will need more
: complicated things to manage my cloud. I will separate my Solr cluster into
: just indexing ones and just response ones. I will index my documents by
: categorical and I will index them at different collections.

A key design choice about the 4.x Solr Admin UI is that it is entirely powered by javascript accessing machine-parsable HTTP APIs under the covers -- so anything the Admin UI can do, you can also do in a custom UI by talking to Solr via HTTP and parsing the xml/json response.

If you have ideas for generic functionality that you think could benefit any SolrCloud user, I would suggest you implement that functionality as a patch against the existing UI, and submit it for inclusion in Solr...

https://wiki.apache.org/solr/HowToContribute

...if the functionality you have in mind is very specific to your use cases, you *might* find the admin-extra include capability suitable enough for adding links/buttons/info into the existing admin pages, using javascript to trigger (local) HTTP API calls, but if not, then implementing a separate application (in whatever language you choose) to talk to Solr via HTTP would be the best bet.

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/admin-extra.html
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/admin-extra.menu-top.html
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/admin-extra.menu-bottom.html

-Hoss
Re: Top 10 Terms in Index (by date)
I need the total number of occurrences across all documents for each term. Imagine this...

Post #1: I think, therefore I am like you
Reply #1: You think too much
Reply #2: I think that I think much as you

Each of those documents is put into 'content'. Pretending I don't have stop words, the top-term query (not considering dateCreated in this example) would result in something like...

think: 4
I: 4
you: 3
much: 2
...

Thus, a number-of-documents approach doesn't work, because if a word occurs more than once in a document it needs to be counted that many times. That seemed to rule out faceting like you mentioned, as well as the TermsComponent (which, as I understand it, also only counts documents). Thanks, Andy Pickler

On Mon, Apr 1, 2013 at 4:31 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: So you have one document per user comment? Why not use faceting plus filtering on the dateCreated field? That would count the number of documents for each term (so, in your case, if a term is used twice in one comment it would only count once). Is that what you are looking for? Tomás

On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler andy.pick...@gmail.com wrote: Our company has an application that is Facebook-like for usage by enterprise customers. We'd like to do a report of the top 10 terms entered by users over (some time period). With that in mind I'm using the DataImportHandler to put all the relevant data from our database into a Solr 'content' field:

<field name="content" type="text_general" indexed="true" stored="false" multiValued="false" required="true" termVectors="true"/>

Along with the content is the 'dateCreated' for that content:

<field name="dateCreated" type="tdate" indexed="true" stored="false" multiValued="false" required="true"/>

I'm struggling with the TermVectorComponent documentation to understand how I can put together a query that answers the 'report' mentioned above.
For each document I need each term counted however many times it is entered (content of "I think what I think" would report 'think' as used twice). Does anyone have any insight as to whether I'm headed in the right direction, and then what my query would be? Thanks, Andy Pickler
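For what it's worth, the aggregation Andy describes can be done client-side: pull per-document term frequencies (e.g. from the TermVectorComponent with `tv=true&tv.tf=true`) and sum them across documents. A self-contained sketch using the thread's three example documents, with naive whitespace tokenization standing in for Solr's analyzer:

```python
from collections import Counter

# The three example documents from the thread.
docs = [
    "I think, therefore I am like you",
    "You think too much",
    "I think that I think much as you",
]

totals = Counter()
for doc in docs:
    # Naive tokenization: strip commas, lowercase everything except "I",
    # no stop words -- just enough to mirror the counts in the example.
    for token in doc.replace(",", "").split():
        totals[token if token == "I" else token.lower()] += 1

for term, count in totals.most_common(4):
    print(term, count)
```

This reproduces think: 4, I: 4, you: 3, much: 2 from the example. In a real setup the inner loop would iterate over the tf values returned by Solr rather than re-tokenizing text, and the date filter would be applied in the Solr query before fetching term vectors.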
RE: [ANNOUNCE] Solr wiki editing change
I would also like to contribute to SolrCloud's wiki where possible. Please add me (TimVaillancourt) when you have a chance. Cheers, Tim

-Original Message- From: Trey Grainger [mailto:solrt...@gmail.com] Sent: Saturday, March 30, 2013 9:43 PM To: d...@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: [ANNOUNCE] Solr wiki editing change

Please add TreyGrainger to the contributors group. Thanks! -Trey

On Sun, Mar 24, 2013 at 11:18 PM, Steve Rowe sar...@gmail.com wrote: The wiki at http://wiki.apache.org/solr/ has come under attack by spammers more frequently of late, so the PMC has decided to lock it down in an attempt to reduce the work involved in tracking and removing spam. From now on, only people who appear on http://wiki.apache.org/solr/ContributorsGroup will be able to create/modify/delete wiki pages. Please request either on solr-user@lucene.apache.org or on d...@lucene.apache.org to have your wiki username added to the ContributorsGroup page - this is a one-time step. Steve

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [ANNOUNCE] Solr wiki editing change
On Apr 1, 2013, at 9:40 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote: I would also like to contribute to SolrCloud's wiki where possible. Please add myself (TimVaillancourt) when you have a chance. Added to solr wiki ContributorsGroup.
Re: Getting started with solr 4.2 and cassandra
Hi, Solr doesn't have anything like ES River. DIH (DataImportHandler) feels like the closest thing in Solr, though it's not quite the same thing. DIH pulls in data like a typical River does, but most people have external indexers that push data into Solr using one of its client libraries, such as SolrJ. Otis -- Solr ElasticSearch Support http://sematext.com/

On Mon, Apr 1, 2013 at 6:34 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Hello, I am evaluating solr 4.2 and ElasticSearch (I am new to both) for a search API, where the data sits in cassandra. Getting started with elasticsearch is pretty straightforward and I was able to write an ES river (http://www.elasticsearch.org/guide/reference/river/) which pulls data from cassandra and indexes it in ES, within a day. Now I am trying to implement something similar with solr and compare the two.

Getting started with solr/example (http://lucene.apache.org/solr/4_2_0/tutorial.html) was pretty easy and an example solr instance works. But the example folder contains a whole bunch of stuff which I am not sure if I need: http://pastebin.com/Gv660mRT . I am sure I don't need 53 directories and 527 files. So my questions are:

1. How can I get a bare-bones solr app up and running with a minimum set of configuration? (I will build on it when needed by taking reference from /example)
2. What is the best practice to run solr in production? Is an approach like this jetty+nginx setup recommended: http://sacharya.com/nginx-proxy-to-jetty-for-java-apps/ ?

Once I am done setting up a simple solr instance:

3. What is the general practice to import data into solr? For now, I am writing a python script which will read data in bulk from cassandra and throw it to solr.

-- Thanks, -Utkarsh
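For the "external indexer that pushes" approach Otis describes, the Python script only needs to turn each batch of Cassandra rows into a JSON document array and POST it to Solr's update handler. A minimal sketch (the `id`/`content` field names are placeholders for whatever your schema actually defines):

```python
import json

def rows_to_solr_json(rows):
    """Turn rows pulled from Cassandra (here: plain dicts) into the JSON
    body that Solr's update handler accepts: an array of documents."""
    return json.dumps([{"id": r["id"], "content": r["content"]} for r in rows])

payload = rows_to_solr_json([{"id": "1", "content": "hello solr"}])
print(payload)

# POSTing the payload (needs a running Solr 4.x instance, so it is
# commented out here):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8983/solr/update/json?commit=true",
#     data=payload.encode("utf-8"),
#     headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```

For bulk loads you would batch many rows per request and commit once at the end rather than per batch; committing on every request is a common performance pitfall.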
Re: solr4.1 No live SolrServers available to handle this request
thx for your reply. my solr.xml is like this:

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="doc"
         host="${host:cms1.test.com}" hostPort="${jetty.port:9090}"
         hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:3}"
         leaderVoteWait="${leaderVoteWait:2}">
    <core name="doc" instanceDir="doc/" loadOnStartup="true"
          transient="false" collection="docCollection" />
  </cores>
</solr>

i have changed the zkclienttimeout from 15s to 30s, but this exception still shows. and the load on the solrcloud servers is not too heavy; the load averages are 1.4, 1.5, 1. and these disconnects appear in the solrj logs, while the solrcloud itself is fine. -- View this message in context: http://lucene.472066.n3.nabble.com/solr4-1-No-live-SolrServers-available-to-handle-this-request-tp4052862p4053075.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need Help in Patching OPENNLP
Hi Erick, thank you so much for the help and support. As you have mentioned, I have set up svn, and while trying to connect using the checkout option I am getting this error:

C:\bin>svn co https://svn.apache.org/repos/asf/lucene/dev/
svn: E175002: Unable to connect to a repository at URL 'https://svn.apache.org/repos/asf/lucene/dev'
svn: E175002: OPTIONS of 'https://svn.apache.org/repos/asf/lucene/dev': could not connect to server (https://svn.apache.org)

Is this anything to do with the firewall setup? Please advise me on the further steps. Thanks, KRN -- View this message in context: http://lucene.472066.n3.nabble.com/Need-Help-in-Patching-OPENNLP-tp4052362p4053089.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need Help in Patching OPENNLP
On 2 April 2013 11:00, karthicrnair karthicrn...@gmail.com wrote: Hi Erick, thank you so much for the help and support. As you have mentioned, I have set up svn, and while trying to connect using the checkout option I am getting this error: C:\bin>svn co https://svn.apache.org/repos/asf/lucene/dev/

Please read http://wiki.apache.org/solr/HowToContribute#Getting_the_source_code carefully. You need to add a branch name to the SVN URL. You probably want something like: svn co http://svn.apache.org/repos/asf/lucene/dev/trunk Regards, Gora
Re: Need Help in Patching OPENNLP
On 2 April 2013 11:08, Gora Mohanty g...@mimirtech.com wrote: On 2 April 2013 11:00, karthicrnair karthicrn...@gmail.com wrote: Hi Erick, thank you so much for the help and support. As you have mentioned, I have set up svn, and while trying to connect using the checkout option I am getting this error: C:\bin>svn co https://svn.apache.org/repos/asf/lucene/dev/ Please read http://wiki.apache.org/solr/HowToContribute#Getting_the_source_code carefully. You need to add a branch name to the SVN URL. You probably want something like: svn co http://svn.apache.org/repos/asf/lucene/dev/trunk

Though "svn co https://svn.apache.org/repos/asf/lucene/dev/" also works just fine. Are you sure that there is no network issue at your end? Are you able to ping svn.apache.org? Regards, Gora
Re: Need Help in Patching OPENNLP
Thanks Gora!! When I tried the ping command, all my requests timed out. I am able to access the svn through my explorer, though. What could be the issue now? :( Thanks, krn -- View this message in context: http://lucene.472066.n3.nabble.com/Need-Help-in-Patching-OPENNLP-tp4052362p4053092.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need Help in Patching OPENNLP
On 2 April 2013 11:17, karthicrnair karthicrn...@gmail.com wrote: Thanks Gora!! when I tried with ping command all my request got timed out. am able to access the svn through my explorer though. What could be the issue now? :( Hard to tell. My guess would be that your network is blocking some things like ICMP. Not sure what Explorer you are referring to, but if you can access svn.apache.org, svn co should work. Regards, Gora