Timeout when calling Luke request handler after migrating from Solr 3.5 to 3.6.1

2012-11-19 Thread Jose Aguilar
Hi all,

As part of our business logic we query the Luke request handler to extract the 
fields in the index from our code using the following url:

http://server:8080/solr/admin/luke?wt=json&numTerms=0

This worked fine with Solr 3.5, but now with 3.6.1 this call never returns;
it hangs, and there is no error message in the server logs. Has anyone seen
this, or does anyone have an idea of what may be causing it?

The Luke request handler is configured by default; we didn't change its
configuration. If I go to solr/admin/stats.jsp, it shows:

name: /admin/luke
class: org.apache.solr.handler.admin.LukeRequestHandler
version: $Revision: 1242152 $
description: Lucene Index Browser. Inspired and modeled after Luke: 
http://www.getopt.org/luke/
stats: handlerStart : 1353373022984
requests : 0
errors : 0
timeouts : 0
totalTime : 0
avgTimePerRequest : NaN
avgRequestsPerSecond : 0.0

We are running Apache Tomcat 6.0.35 with JDK 1.7.0_03, in case that rings a 
bell. The index has about

Alternatively, our requirement is to get the list of fields in the index, 
including dynamic fields – is there any other way to obtain this at runtime? It 
is an application that runs on a separate process from Solr, and may even run 
on a separate box, thus the Luke call.
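
One way to avoid hand-building the URL for the runtime field-listing
requirement is to issue the Luke request through SolrJ. A minimal sketch,
assuming SolrJ 3.6.x on the classpath; note it exercises the same /admin/luke
handler, so it will hang the same way until the underlying problem is fixed:

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.request.LukeRequest;
  import org.apache.solr.client.solrj.response.LukeResponse;

  public class ListIndexFields {
      public static void main(String[] args) throws Exception {
          SolrServer server = new HttpSolrServer("http://server:8080/solr");
          LukeRequest luke = new LukeRequest(); // defaults to /admin/luke
          luke.setNumTerms(0);                  // skip per-field term stats
          LukeResponse rsp = luke.process(server);
          // field names actually present in the index, dynamic fields included
          for (String field : rsp.getFieldInfo().keySet()) {
              System.out.println(field);
          }
      }
  }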

Thank you for any help you can provide.

Jose.


configuring solr xml as a datasource

2012-11-19 Thread Leena Jawale
Hi,

I am new to Solr. I am trying to use an XML file as a data source for the
Solr search engine.
I have created a test.xml file (its XML markup was stripped in the archive;
the record's surviving field values are "leena1" and "101").
I have created a data-config.xml file (its contents were likewise stripped
in the archive).
And added the below snippet to solrconfig.xml:

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">C:\solr\conf\data-config.xml</str>
    </lst>
  </requestHandler>
But when I go to this link
http://localhost:8080/solr/dataimport?command=full-import
it shows Total Rows Fetched=0, Total Documents Processed=0.
How can I solve this problem? Please provide me the solution.
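
For reference, a minimal file-based XML import with the DataImportHandler
usually looks like the sketch below. The baseDir, file pattern, field names,
and XPaths are illustrative assumptions (the original configs did not survive
the archive); the forEach expression must match the actual element layout of
test.xml, and a mismatch there is the classic cause of "Total Rows Fetched=0":

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
      <!-- outer entity lists the files; it produces no documents itself -->
      <entity name="f" processor="FileListEntityProcessor"
              baseDir="C:\solr\xmldata" fileName=".*\.xml"
              rootEntity="false" dataSource="null">
        <!-- inner entity parses each file; one document per matched element -->
        <entity name="rec" processor="XPathEntityProcessor"
                url="${f.fileAbsolutePath}" forEach="/records/record">
          <field column="id"   xPath="/records/record/id" />
          <field column="name" xPath="/records/record/name" />
        </entity>
      </entity>
    </document>
  </dataConfig>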


Thanks & Regards,
Leena Jawale
Software Engineer Trainee
BFS BU
Phone No. - 9762658130
Email - leena.jaw...@lntinfotech.com





Re: Custom ranking solutions?

2012-11-19 Thread Floyd Wu
Hi Otis,

I'm doing some tests like this:

http://localhost:8983/solr/select/?fl=score,_l_unique_key&defType=func&q=product(abs(rankingField),abs(score))

and I get the following error response (markup stripped in the archive):

  can not use FieldCache on unindexed field: score
  status: 400


If I change score to rankingField like this:

http://localhost:8983/solr/select/?fl=score,_l_unique_key&defType=func&q=product(abs(rankingField),abs(rankingField))



(response markup stripped in the archive; unique key / score pairs:)

  211  2500.0
  223  4.0
  222  0.01001



It seems like score cannot be used inside a function query?
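
For what it's worth, score is not a real indexed field, so function queries
cannot read it from the FieldCache, which is exactly what the 400 error says.
A common workaround, sketched here with the field names from this thread, is
to multiply the relevance score by a function through the boost query parser
instead of referencing score directly:

  q={!boost b=abs(rankingField) v=$qq}&qq=_l_all:"測試"&fl=score,_l_unique_key

The boost parser runs the wrapped query ($qq) normally and multiplies each
document's score by the b function, which is the product-of-score-and-field
behaviour the thread is after.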

Floyd




2012/11/20 Otis Gospodnetic 

> Hi Floyd,
>
> Use &debugQuery=true and let's see it.:)
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html
> Search Analytics - http://sematext.com/search-analytics/index.html
>
>
>
>
> On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu  wrote:
>
> > Hi there,
> >
> > Before ExternalFileField was introduced, I changed document boost values
> > to achieve custom ranking. My client app would update each document's
> > boost value daily, and it seemed to work fine.
> > The actual ranking could be predicted from the boost value (the value is
> > calculated from clicks, recency, and rating).
> >
> > I'm now trying to use ExternalFileField to do some ranking; after some
> > tests, I did not get what I expected.
> >
> > I'm doing a sort like this
> >
> > sort=product(score,abs(rankingField))+desc
> > But the query result ranking won't change anyway.
> >
> > The external file is as follows:
> > doc1=3
> > doc2=5
> > doc3=9
> >
> > The original scores from the Solr results are as follows:
> > doc1=41.042
> > doc2=10.1256
> > doc3=8.2135
> >
> > Expected ranking
> > doc1
> > doc3
> > doc2
> >
> > What is wrong in my test? Please kindly help with this.
> >
> > Floyd
> >
>


Re: solr autocomplete requirement

2012-11-19 Thread Sujatha Arun
Anyone with suggestions on this?


On Mon, Nov 19, 2012 at 10:13 PM, Sujatha Arun  wrote:

> Hi,
>
> Our requirement for auto complete is slightly complicated; we need two
> types of auto complete:
>
> 1. Metadata auto complete
> 2. Full-text content auto complete
>
> In addition, the metadata fields are multi-valued, and we need to filter
> the results for both types of auto-complete.
>
> After trying different approaches like
>
> 1) Suggester - we cannot filter results
> 2) Terms component - we cannot filter
> 3) Facets on full-text content with tokenized fields - expensive
> 4) Same core with n-gram indexing, storing the results, and using the
> highlight component to fetch the snippet for autosuggest.
>
> The last approach, which we are leaning towards, has two drawbacks:
>
> One - it returns duplicate data, as some metadata is the same across
> documents.
> Two - words are getting truncated mid-word when results are returned
> with highlighting.
>
> Mitigations for the above two issues could be: remove duplicates after
> obtaining results in the application (the issue could be the additional
> time this takes), and use the fast vector highlighter, which can help
> with full-word snippets (though it could be heavy on the index size).
>
> Does anybody have any suggestions, or has anyone had similar requirements
> with a successful implementation?
>
> Another question: what would be the impact of serving the suggestions out
> of the same core as the one we are searching, while using the highlight
> component for fetching snippets?
>
> For our full-text search requirements, we are doing the highlighting
> outside Solr, in our application, and we would be storing and using the
> highlighting only for suggestions.
>
> Thanks
> Sujatha
>
>
>
>
>
>
>


Weird Behaviour on Solr 5x (SolrCloud)

2012-11-19 Thread deniz
Hi all, 

after Mark Miller made it clear to me that 5x supports cloud with
ramdir, I started playing with it and it seemed to work smoothly,
except for one weird behaviour.. here is the story:

Basically, I pulled the code and built Solr 5x, and replaced the war
file in the webapps dir of my current installation... then I started my
zookeeper servers..

after that I started the Solr instances with the params below:

java -Djetty.port=7574 -DzkHost=zkserver2:2182 -jar start.jar (running on a
remote machine)
java -Dbootstrap_conf=true -DzkHost=zkserver1:2181 -jar start.jar (running
on local)

after both of them were up, I indexed some docs, and both of the Solr
instances were updated successfully. After this point, I killed one of
the Solr instances (running on remote, not the leader) and then restarted it.
There were no errors and everything seemed normal in the logs...

however, when I checked the web interface for the one I restarted,
it showed 0 docs.. after that I ran q=*:* a few times...
and that's the point which surprises me... it randomly returned 0 results and
then it returned the correct numbers.. each time I run the same query, I
randomly get an empty result set... I have no idea why this is happening


here are the logs

for the one running on remote (which was restarted)

Nov 20, 2012 11:32:11 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select
params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382331589&start=0&q=*:*&isShard=true&fsv=true}
hits=0 status=0 QTime=0 
Nov 20, 2012 11:32:11 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0
status=0 QTime=7 
Nov 20, 2012 11:32:22 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select
params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382342238&start=0&q=*:*&isShard=true&fsv=true}
hits=0 status=0 QTime=0 
Nov 20, 2012 11:32:22 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0
status=0 QTime=7 
Nov 20, 2012 11:32:27 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select
params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382347438&start=0&q=*:*&isShard=true&fsv=true}
hits=0 status=0 QTime=0 
Nov 20, 2012 11:32:27 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0
status=0 QTime=14 
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select
params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=10.60.0.54:8983/solr/collection1/|remote:7574/solr/collection1/&NOW=1353382348255&start=0&q=*:*&isShard=true&fsv=true}
hits=0 status=0 QTime=1 
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=0
status=0 QTime=7 
Nov 20, 2012 11:32:28 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select params={wt=xml&q=*:*} hits=32
status=0 QTime=14 


and for the same query, here is the log, from my local (leader, not
restarted)

Nov 20, 2012 11:31:46 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select
params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382306472&start=0&q=*:*&isShard=true&fsv=true}
hits=32 status=0 QTime=0 
Nov 20, 2012 11:31:46 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select
params={df=text&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382306472&q=*:*&ids=SP2514N,GB18030TEST,apple,F8V7067-APL-KIT,adata,6H500F0,MA147LL/A,ati,IW-02,asus&distrib=false&isShard=true&wt=javabin&rows=10&version=2}
status=0 QTime=1 
Nov 20, 2012 11:32:00 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select
params={distrib=false&wt=javabin&rows=10&version=2&df=text&fl=id,score&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382320738&start=0&q=*:*&isShard=true&fsv=true}
hits=32 status=0 QTime=0 
Nov 20, 2012 11:32:00 AM org.apache.solr.core.SolrCore execute
INFO: [collection1] webapp=/solr path=/select
params={df=text&shard.url=localhost:8983/solr/collection1/|remoteserver:7574/solr/collection1/&NOW=1353382320738&q=*:*&ids=SP2514N,GB18030TEST,apple,F8V7067-APL-KIT,adata,6H500F0,MA147LL/A,ati,IW-02,asus&distrib=false&isShard=true&wt=javabin&rows=10&version=2}
status=0 QTime=1 
No

Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?

2012-11-19 Thread Floyd Wu
Hi Otis,

There is no error in the console nor in the log file. I'm using Solr 4.0.
The external file is named external_rankingField.txt and exists at
"C:\solr-4.0.0\example\solr\collection1\data\external_rankingField.txt"

The external file itself should be working, because when I issue a query
with "sort=sqrt(rankingField)+desc" or "sort=sqrt(rankingField)+asc",
the ordering changes accordingly.

By the way, I first tried the external field according to the document here:
http://lucidworks.lucidimagination.com/display/solr/Working+with+External+Files+and+Processes

"Format of the External File

The file itself is located in Solr's index directory, which by default is
$SOLR_HOME/data/index. The name of the file should be external_*fieldname*
or external_*fieldname*.*. For the example above, then, the file could be
named external_entryRankFile or external_entryRankFile.txt.
"

But actually the external file should be put in
$SOLR_HOME/data/

Floyd
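
For readers following along, the schema.xml side of this setup looks roughly
like the stock Solr 4.0 example (the type name and defVal here are
illustrative):

  <fieldType name="rankFileType" keyField="id" defVal="0"
             stored="false" indexed="false"
             class="solr.ExternalFileField" valType="pfloat" />
  <field name="rankingField" type="rankFileType" />

The external file lives in the core's data directory (as Floyd found), holds
one keyField=value pair per line (e.g. doc1=3), and is re-read when a new
searcher is opened.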




2012/11/20 Otis Gospodnetic 

> Hi,
>
> Do you see any errors?
> Which version of Solr?
> What does debugQuery=true say?
> Are you sure your file with ranks is being used? (remove it, put some junk
> in it, see if that gives an error)
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html
> Search Analytics - http://sematext.com/search-analytics/index.html
>
>
>
>
> On Mon, Nov 19, 2012 at 10:16 PM, Floyd Wu  wrote:
>
> > Thanks Otis,
> >
> > But the sort=product(score, rankingField) is not working in my test.
> > What could be wrong?
> >
> > Floyd
> >
> >
> > 2012/11/20 Otis Gospodnetic 
> >
> > > Hi,
> > >
> > > 3. yes, you can sort by function -
> > > http://search-lucene.com/?q=solr+sort+by+function
> > > 2. this will sort by score only when there is a tie in ranking (two
> docs
> > > have the same rank value)
> > > 1. the reverse of 2.
> > >
> > > Otis
> > > --
> > > Performance Monitoring - http://sematext.com/spm/index.html
> > > Search Analytics - http://sematext.com/search-analytics/index.html
> > >
> > >
> > >
> > >
> > > On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu  wrote:
> > >
> > > > Hi  there,
> > > >
> > > > I have a field (an externalFileField, called rankingField) whose
> > > > value (type=float) is calculated by a client app.
> > > >
> > > > In Solr's original scoring model, changing the boost value results in
> > > > a different ranking. So I think product(score,rankingField) may be
> > > > equivalent to the Solr scoring model.
> > > >
> > > > What I'm curious about is which will be better in practice, and what
> > > > the different meanings of these three solutions are:
> > > >
> > > > 1. sort=score+desc,ranking+desc
> > > > 2. sort=ranking+desc,score+desc
> > > > 3. sort=product(score,ranking) -->is this possible?
> > > >
> > > > I'd like to hear your thoughts.
> > > >
> > > > Many thanks
> > > >
> > > > Floyd
> > > >
> > >
> >
>


Re: Custom ranking solutions?

2012-11-19 Thread Floyd Wu
Hi Otis,
The debug information is as follows; it seems there is no "product()" step.


_l_all:"測試"
_l_all:"測試"
PhraseQuery(_l_all:"測 試")
_l_all:"測 試"


41.11747 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result
of: 41.11747 = fieldWeight in 0, product of: 4.1231055 = tf(freq=17.0),
with freq of: 17.0 = phraseFreq=17.0 1.4246359 = idf(), sum of: 0.71231794
= idf(docFreq=3, maxDocs=3) 0.71231794 = idf(docFreq=3, maxDocs=3) 7.0 =
fieldNorm(doc=0)


14.246359 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result
of: 14.246359 = fieldWeight in 0, product of: 1.0 = tf(freq=1.0), with freq
of: 1.0 = phraseFreq=1.0 1.4246359 = idf(), sum of: 0.71231794 =
idf(docFreq=3, maxDocs=3) 0.71231794 = idf(docFreq=3, maxDocs=3) 10.0 =
fieldNorm(doc=0)


10.073696 = (MATCH) weight(_l_all:"測 試" in 0) [DefaultSimilarity], result
of: 10.073696 = fieldWeight in 0, product of: 1.4142135 = tf(freq=2.0),
with freq of: 2.0 = phraseFreq=2.0 1.4246359 = idf(), sum of: 0.71231794 =
idf(docFreq=3, maxDocs=3) 0.71231794 = idf(docFreq=3, maxDocs=3) 5.0 =
fieldNorm(doc=0)


QParser: LuceneQParser

(timing section of the debugQuery output; the XML element names were stripped
in the archive. Read against the standard layout, the values are: total time
6.0 ms; prepare 0.0 ms with all components at 0.0; process 6.0 ms, of which
the query component took 3.0 ms and the debug component 3.0 ms)


2012/11/20 Otis Gospodnetic 

> Hi Floyd,
>
> Use &debugQuery=true and let's see it.:)
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html
> Search Analytics - http://sematext.com/search-analytics/index.html
>
>
>
>
> On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu  wrote:
>
> > Hi there,
> >
> > Before ExternalFileField was introduced, I changed document boost values
> > to achieve custom ranking. My client app would update each document's
> > boost value daily, and it seemed to work fine.
> > The actual ranking could be predicted from the boost value (the value is
> > calculated from clicks, recency, and rating).
> >
> > I'm now trying to use ExternalFileField to do some ranking; after some
> > tests, I did not get what I expected.
> >
> > I'm doing a sort like this
> >
> > sort=product(score,abs(rankingField))+desc
> > But the query result ranking won't change anyway.
> >
> > The external file is as follows:
> > doc1=3
> > doc2=5
> > doc3=9
> >
> > The original scores from the Solr results are as follows:
> > doc1=41.042
> > doc2=10.1256
> > doc3=8.2135
> >
> > Expected ranking
> > doc1
> > doc3
> > doc2
> >
> > What is wrong in my test? Please kindly help with this.
> >
> > Floyd
> >
>


Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?

2012-11-19 Thread Otis Gospodnetic
Hi,

Do you see any errors?
Which version of Solr?
What does debugQuery=true say?
Are you sure your file with ranks is being used? (remove it, put some junk
in it, see if that gives an error)

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Mon, Nov 19, 2012 at 10:16 PM, Floyd Wu  wrote:

> Thanks Otis,
>
> But the sort=product(score, rankingField) is not working in my test.
> What could be wrong?
>
> Floyd
>
>
> 2012/11/20 Otis Gospodnetic 
>
> > Hi,
> >
> > 3. yes, you can sort by function -
> > http://search-lucene.com/?q=solr+sort+by+function
> > 2. this will sort by score only when there is a tie in ranking (two docs
> > have the same rank value)
> > 1. the reverse of 2.
> >
> > Otis
> > --
> > Performance Monitoring - http://sematext.com/spm/index.html
> > Search Analytics - http://sematext.com/search-analytics/index.html
> >
> >
> >
> >
> > On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu  wrote:
> >
> > > Hi  there,
> > >
> > > I have a field (an externalFileField, called rankingField) whose
> > > value (type=float) is calculated by a client app.
> > >
> > > In Solr's original scoring model, changing the boost value results in a
> > > different ranking. So I think product(score,rankingField) may be
> > > equivalent to the Solr scoring model.
> > >
> > > What I'm curious about is which will be better in practice, and what
> > > the different meanings of these three solutions are:
> > >
> > > 1. sort=score+desc,ranking+desc
> > > 2. sort=ranking+desc,score+desc
> > > 3. sort=product(score,ranking) -->is this possible?
> > >
> > > I'd like to hear your thoughts.
> > >
> > > Many thanks
> > >
> > > Floyd
> > >
> >
>


Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?

2012-11-19 Thread Floyd Wu
Thanks Otis,

But the sort=product(score, rankingField) is not working in my test.
What could be wrong?

Floyd


2012/11/20 Otis Gospodnetic 

> Hi,
>
> 3. yes, you can sort by function -
> http://search-lucene.com/?q=solr+sort+by+function
> 2. this will sort by score only when there is a tie in ranking (two docs
> have the same rank value)
> 1. the reverse of 2.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html
> Search Analytics - http://sematext.com/search-analytics/index.html
>
>
>
>
> On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu  wrote:
>
> > Hi  there,
> >
> > I have a field(which is externalFileField, called rankingField) and that
> > value(type=float) is calculated by client app.
> >
> > For the solr original scoring model, affect boost value will result
> > different ranking. So I think product(score,rankingField) may equivalent
> to
> > solr scoring model.
> >
> > What I curious is which will be better in practice and the different
> > meanings on these three solutions?
> >
> > 1. sort=score+desc,ranking+desc
> > 2. sort=ranking+desc,score+desc
> > 3. sort=product(score,ranking) -->is this possible?
> >
> > I'd like to hear your thoughts.
> >
> > Many thanks
> >
> > Floyd
> >
>


Re: Best way to retrieve 20 specific documents

2012-11-19 Thread Otis Gospodnetic
I wanted to be explicit for the OP.

But wouldn't that depend on mm if you are using (e)dismax?

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Mon, Nov 19, 2012 at 6:37 PM, Upayavira  wrote:

> In fact, you shouldn't need OR:
>
> id:(123 456 789)
>
> will default to OR.
>
> Upayavira
>
> On Mon, Nov 19, 2012, at 10:45 PM, Shawn Heisey wrote:
> > On 11/19/2012 1:49 PM, Dotan Cohen wrote:
> > > On Mon, Nov 19, 2012 at 10:27 PM, Otis Gospodnetic
> > >  wrote:
> > >> Hi,
> > >>
> > >> How about id1 OR id2 OR id3? :)
> > > Thanks, Otis. This was my first inclination (id:123 OR 456), but it
> > > didn't work when I tried. At your instigation I then tried id:123 OR
> > > id:456. This does work. Thanks.
> >
> > You can also use this query format:
> >
> > id:(123 OR 456 OR 789)
> >
> > This does get expanded internally by the query parser to the format that
> > has the field name on every clause, but it is sometimes easier to write
> > code that produces the above form.
> >
> > Thanks,
> > Shawn
> >
>


Re: is it possible to save the search query?

2012-11-19 Thread Otis Gospodnetic
Hi,

Document ID would be a field in your document.  A unique field that you
specify when indexing.
You can collect it by telling Solr to return it in the search results by
including it in the &fl= parameter.

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Mon, Nov 19, 2012 at 9:31 PM, Romita Saha
wrote:

> Hi,
>
> Thanks for your guidance. I am unable to figure out what a doc ID is and
> how I can collect all the doc IDs.
>
> Thanks and regards,
> Romita Saha
>
>
>
> From:   Otis Gospodnetic 
> To: solr-user@lucene.apache.org,
> Date:   11/09/2012 12:33 AM
> Subject:Re: is it possible to save the search query?
>
>
>
> Hi,
>
> Aha, I think I understand.  Yes, you could collect all doc IDs from each
> query and find the differences.  There is nothing in Solr that can find
> those differences or that would store doc IDs of returned hits in the
> first
> place, so you would have to implement this yourself.  Sematext's Search
> Analytics service may be of help here in the sense that all data you
> need (queries, doc IDs, etc.) are collected, so it would be a matter of
> providing an API to get the data for off-line analysis.  But this data
> collection+diffing is also something you could implement yourself.  One
> thing to think about - what do you do when a query returns a large
> number of hits.  Do you really want/need to get IDs for all of them, or
> only a page at a time.
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha
> wrote:
>
> > Hi,
> >
> > The following is the example;
> > 1st query:
> >
> >
> >
>
> http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data
>
> > ^2
> > id&start=0&rows=11&fl=data,id
> >
> > Next query:
> >
> >
> >
>
> http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data
>
> > id^2&start=0&rows=11&fl=data,id
> >
> > In the 1st query the field 'data' is boosted by 2. However, maybe the
> > user was not satisfied with the response. Thus in the next query he
> > boosted the field 'id' by 2.
> >
> > I want to record both the queries and compare between the two, meaning,
> > what are the changes implemented on the 2nd query which are not present
> in
> > the previous one.
> >
> > Thanks and regards,
> > Romita Saha
> >
> >
> >
> > From:   Otis Gospodnetic 
> > To: solr-user@lucene.apache.org,
> > Date:   11/08/2012 01:35 PM
> > Subject:Re: is it possible to save the search query?
> >
> >
> >
> > Hi,
> >
> > Compare in what sense?  An example will help.
> >
> > Otis
> > --
> > Performance Monitoring - http://sematext.com/spm
> > On Nov 7, 2012 8:45 PM, "Romita Saha" 
> > wrote:
> >
> > > Hi All,
> > >
> > > Is it possible to record a search query in solr and then compare it
> with
> > > the previous search query?
> > >
> > > Thanks and regards,
> > > Romita Saha
> > >
> >
> >
>
>


Re: Custom ranking solutions?

2012-11-19 Thread Otis Gospodnetic
Hi Floyd,

Use &debugQuery=true and let's see it.:)

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Mon, Nov 19, 2012 at 9:29 PM, Floyd Wu  wrote:

> Hi there,
>
> Before ExternalFileField was introduced, I changed document boost values to
> achieve custom ranking. My client app would update each document's boost
> value daily, and it seemed to work fine.
> The actual ranking could be predicted from the boost value (the value is
> calculated from clicks, recency, and rating).
>
> I'm now trying to use ExternalFileField to do some ranking; after some
> tests, I did not get what I expected.
>
> I'm doing a sort like this
>
> sort=product(score,abs(rankingField))+desc
> But the query result ranking won't change anyway.
>
> The external file is as follows:
> doc1=3
> doc2=5
> doc3=9
>
> The original scores from the Solr results are as follows:
> doc1=41.042
> doc2=10.1256
> doc3=8.2135
>
> Expected ranking
> doc1
> doc3
> doc2
>
> What is wrong in my test? Please kindly help with this.
>
> Floyd
>


Re: Ranking by sorting score and rankingField better or by product(score, rankingField)?

2012-11-19 Thread Otis Gospodnetic
Hi,

3. yes, you can sort by function -
http://search-lucene.com/?q=solr+sort+by+function
2. this will sort by score only when there is a tie in ranking (two docs
have the same rank value)
1. the reverse of 2.

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Mon, Nov 19, 2012 at 9:40 PM, Floyd Wu  wrote:

> Hi  there,
>
> I have a field (an externalFileField, called rankingField) whose value
> (type=float) is calculated by a client app.
>
> In Solr's original scoring model, changing the boost value results in a
> different ranking. So I think product(score,rankingField) may be equivalent
> to the Solr scoring model.
>
> What I'm curious about is which will be better in practice, and what the
> different meanings of these three solutions are:
>
> 1. sort=score+desc,ranking+desc
> 2. sort=ranking+desc,score+desc
> 3. sort=product(score,ranking) -->is this possible?
>
> I'd like to hear your thoughts.
>
> Many thanks
>
> Floyd
>


Re: SolrCloud Error after leader restarts

2012-11-19 Thread deniz
Mark Miller-3 wrote
> On Nov 19, 2012, at 9:11 PM, deniz <denizdurmus87@> wrote:
> 
>> so in case I use ramdir with the 5x cloud, will it still not do the
>> recovery? I mean, will it not get the data from the leader and fill its
>> ramdir again?
> 
> Yes, in 5x RAM directory should be able to recover.
> 
> - Mark

thank you so much for your patience with me :) 



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985p4021209.html
Sent from the Solr - User mailing list archive at Nabble.com.


Ranking by sorting score and rankingField better or by product(score, rankingField)?

2012-11-19 Thread Floyd Wu
Hi  there,

I have a field (an externalFileField, called rankingField) whose value
(type=float) is calculated by a client app.

In Solr's original scoring model, changing the boost value results in a
different ranking. So I think product(score,rankingField) may be equivalent
to the Solr scoring model.

What I'm curious about is which will be better in practice, and what the
different meanings of these three solutions are:

1. sort=score+desc,ranking+desc
2. sort=ranking+desc,score+desc
3. sort=product(score,ranking) -->is this possible?

I'd like to hear your thoughts.

Many thanks

Floyd


Re: SolrCloud Error after leader restarts

2012-11-19 Thread Mark Miller

On Nov 19, 2012, at 9:11 PM, deniz  wrote:

> so in case I use ramdir with the 5x cloud, will it still not do the
> recovery? I mean, will it not get the data from the leader and fill its
> ramdir again?

Yes, in 5x RAM directory should be able to recover.

- Mark


Re: is it possible to save the search query?

2012-11-19 Thread Romita Saha
Hi,

Thanks for your guidance. I am unable to figure out what a doc ID is and
how I can collect all the doc IDs.

Thanks and regards,
Romita Saha



From:   Otis Gospodnetic 
To: solr-user@lucene.apache.org, 
Date:   11/09/2012 12:33 AM
Subject:Re: is it possible to save the search query?



Hi,

Aha, I think I understand.  Yes, you could collect all doc IDs from each
query and find the differences.  There is nothing in Solr that can find
those differences or that would store doc IDs of returned hits in the 
first
place, so you would have to implement this yourself.  Sematext's Search
Analytics service may be of help here in the sense that all data you
need (queries, doc IDs, etc.) are collected, so it would be a matter of
providing an API to get the data for off-line analysis.  But this data
collection+diffing is also something you could implement yourself.  One
thing to think about - what do you do when a query returns a large
number of hits.  Do you really want/need to get IDs for all of them, or
only a page at a time.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha 
wrote:

> Hi,
>
> The following is the example;
> 1st query:
>
>
> 
http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data

> ^2
> id&start=0&rows=11&fl=data,id
>
> Next query:
>
>
> 
http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data

> id^2&start=0&rows=11&fl=data,id
>
> In the 1st query the field 'data' is boosted by 2. However, maybe the
> user was not satisfied with the response. Thus in the next query he
> boosted the field 'id' by 2.
>
> I want to record both the queries and compare between the two, meaning,
> what are the changes implemented on the 2nd query which are not present 
in
> the previous one.
>
> Thanks and regards,
> Romita Saha
>
>
>
> From:   Otis Gospodnetic 
> To: solr-user@lucene.apache.org,
> Date:   11/08/2012 01:35 PM
> Subject:Re: is it possible to save the search query?
>
>
>
> Hi,
>
> Compare in what sense?  An example will help.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
> On Nov 7, 2012 8:45 PM, "Romita Saha" 
> wrote:
>
> > Hi All,
> >
> > Is it possible to record a search query in solr and then compare it 
with
> > the previous search query?
> >
> > Thanks and regards,
> > Romita Saha
> >
>
>



Re: SolrCloud Error after leader restarts

2012-11-19 Thread deniz
I know the facts about RAMDirectory, actually.. just running some perf tests
on our dev env right now..

so in case I use ramdir with the 5x cloud, will it still not do the recovery?
I mean, will it not get the data from the leader and fill its ramdir again?



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985p4021203.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: More Like this without a document?

2012-11-19 Thread Chris Hostetter

: If I want to use MoreLikeThis algorithm I need to add this documents in the
: index? The MoreLikeThis will work with soft commits? Is there a solution to
: do a MoreLikeThis without adding the document in the index?

you can feed the MoreLikeThisHandler a ContentStream (i.e. POST data, a
file upload, or a "stream.body" request param) of text instead of sending it
a query, and it will use that raw text to find "more like this"

http://wiki.apache.org/solr/MoreLikeThisHandler
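
A quick sketch of the stream.body variant, assuming a MoreLikeThisHandler
registered at /mlt and remote streaming enabled (enableRemoteStreaming="true"
on <requestParsers/> in solrconfig.xml); the field name is illustrative:

  http://localhost:8983/solr/mlt?stream.body=raw+text+to+match&mlt.fl=text&mlt.mintf=1&mlt.mindf=1

The handler analyzes the streamed text against the mlt.fl fields and returns
the most similar indexed documents, so the text itself never has to be added
to the index.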

-Hoss


Re: All-wildcard query performance

2012-11-19 Thread Shawn Heisey
> Hi,
>
> Our application sometimes generates queries with one of the constraints:
>  field:[* TO *]
>
> I expected this query performance to be the same as if we omitted the
> "field" constraint completely. However, I see the performance of the two
> queries to differ drastically (3ms without all-wildcard constraint,
> 200ms with it).
>
> Could someone explain the source of the difference, please?
>
> I am fixing the application not to generate such queries, obviously, but
> still would like to understand the logic here. We use Solr 3.6.1. Thanks.

That query does not mean all docs. It means something slightly different -
all documents for which "field" is present. If this field happens to exist
in every document, then it amounts to the same thing, but Solr still must
check every document.
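
One mitigation, if the constraint cannot be dropped entirely, is to move it
out of the main query and into a filter query:

  fq=field:[* TO *]

Solr caches the resulting DocSet in the filter cache, so only the first
request pays the full per-document existence check; later queries reuse the
cached set.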

Thanks,
Shawn







solr4 MULTIPOLYGON search syntax

2012-11-19 Thread jend
Does anybody have any info on how to properly construct a multipolygon
search?

I'm very interested in:

Polygon (search all documents within a shape)
Multipolygon (search all documents within 2+ shapes)
Multipolygon (search all documents within 2+ shapes but not within an area
inside a shape - if you can imagine a donut where you don't search within the
hole in the center)

I'm trying to search 2 shapes but get errors at the moment. Polygon searches
work just fine, so I have everything installed correctly, but 2 shapes in
one search as per below is not working. I can't find anything on the net to
help debug multipolygons.

My multipolygon query looks like this.
fq=geo:"Intersects(MULTIPOLYGON ((149.4023 -34.6072, 149.4023 -34.8690,
149.9022 -34.8690, 149.9022 -34.6072, 149.4023 -34.6072)), ((151.506958
-33.458943, 150.551147 -33.60547, 151.00708 -34.257216, 151.627808
-33.861293, 151.506958 -33.458943)))"

And I get this error:

  ERROR 500: error reading WKT

But a polygon search works fine.
fq=geo:"Intersects(POLYGON((149.4023 -34.6072, 149.4023 -34.8690, 149.9022
-34.8690, 149.9022 -34.6072, 149.4023 -34.6072)))" 
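
For what it's worth, WKT wraps each polygon of a MULTIPOLYGON in its own
extra set of parentheses (one level for the polygon, one for its outer ring),
so a two-shape query over the coordinates from the post would read:

  fq=geo:"Intersects(MULTIPOLYGON (((149.4023 -34.6072, 149.4023 -34.8690,
  149.9022 -34.8690, 149.9022 -34.6072, 149.4023 -34.6072)), ((151.506958
  -33.458943, 150.551147 -33.60547, 151.00708 -34.257216, 151.627808
  -33.861293, 151.506958 -33.458943))))"

The donut case is expressed at the POLYGON level by listing the hole as a
second ring: POLYGON((outer ring), (hole ring)).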




--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr4-MULTIPOLYGON-search-syntax-tp4021199.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Error after leader restarts

2012-11-19 Thread Mark Miller
It's generally not a good choice to use ram directory.

4x SolrCloud does not work with it, no - 5x does. But in any case, ram dir is
not persistent, so when you restart Solr you will lose the data.

MMap is generally the right dir to use.

- Mark

On Nov 19, 2012, at 6:52 PM, deniz  wrote:

> yea, i am using ram.
> 
> solrcloud is not working with ram directory? 
> 
> 
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-Error-after-leader-restarts-tp4020985p4021194.html
> Sent from the Solr - User mailing list archive at Nabble.com.



All-wildcard query performance

2012-11-19 Thread Aleksey Vorona

Hi,

Our application sometimes generates queries with one of the constraints:
field:[* TO *]

I expected this query performance to be the same as if we omitted the 
"field" constraint completely. However, I see the performance of the two 
queries to differ drastically (3ms without all-wildcard constraint, 
200ms with it).


Could someone explain the source of the difference, please?

I am fixing the application not to generate such queries, obviously, but 
still would like to understand the logic here. We use Solr 3.6.1. Thanks.


-- Aleksey


Re: Best way to retrieve 20 specific documents

2012-11-19 Thread Upayavira
In fact, you shouldn't need OR:

id:(123 456 789) 

will default to OR.

Upayavira

On Mon, Nov 19, 2012, at 10:45 PM, Shawn Heisey wrote:
> On 11/19/2012 1:49 PM, Dotan Cohen wrote:
> > On Mon, Nov 19, 2012 at 10:27 PM, Otis Gospodnetic
> >  wrote:
> >> Hi,
> >>
> >> How about id1 OR id2 OR id3? :)
> > Thanks, Otis. This was my first inclination (id:123 OR 456), but it
> > didn't work when I tried. At your instigation I then tried id:123 OR
> > id:456. This does work. Thanks.
> 
> You can also use this query format:
> 
> id:(123 OR 456 OR 789)
> 
> This does get expanded internally by the query parser to the format that 
> has the field name on every clause, but it is sometimes easier to write 
> code that produces the above form.
> 
> Thanks,
> Shawn
> 


Re: Preventing accepting queries while custom QueryComponent starts up?

2012-11-19 Thread Chris Hostetter

: I have several custom QueryComponents that have high one-time startup costs
: (hashing things in the index, caching things from a RDBMS, etc...)

you need to provide more details about how your custom components work -- 
in particular: where in the lifecycle of your components is this
high-startup cost happening?

: Is there a way to prevent solr from accepting connections before all
: QueryComponents are "ready"?

Define "ready" ? ... things that happen in the init() and inform(SolrCore) 
methods will completley prevent the SolrCore from being available for 
queries.

Likewise: if you are using "firstSearcher" warming queries, then the 
"useColdSearcher" option in solrconfig.xml can be used to control wether 
or not external requests will "block" until the searcher is available or 
not -- however this doesn't prevent the servlet container from "accepting" 
the HTTP connection.  but as mentioned, this is where things like the 
PingRequestHandler and the enable/disable commands can be used to take 
servers in and out of rotation with your load balancer -- assuming that 
your load balanver can be configured to monitor the ping URL.   
Alternatively you can just use native features of your load balancer to 
control this independent of solr (but the ping handler is a nice way of 
letting one set of dev/ops folks own the solr servers and control their 
availability even if they don't have the ability to control the load 
blaancer itself)
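
For reference, the ping-based approach is wired up like this in the Solr 4.x
example configs (3.x used a <healthcheck> element under the <admin> section
instead); the file name is the stock example value:

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">solrpingquery</str>
    </lst>
    <!-- when set, ping fails unless this file exists in the data dir -->
    <str name="healthcheckFile">server-enabled.txt</str>
  </requestHandler>

The load balancer polls /admin/ping, and /admin/ping?action=disable deletes
the file, taking the node out of rotation without stopping Solr.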


-Hoss


Re: Execute an independent query from the main query

2012-11-19 Thread Indika Tantrigoda
Hi Otis,

Yes, that seems like one solution; however, I have multiple opening and
closing hours within the same day. Therefore it might become somewhat
complicated to manage the index. For now I shifted the business logic to
the client and a second query is made to get the additional data. Thanks
for the suggestion.

Indika

On 20 November 2012 02:50, Otis Gospodnetic wrote:

> Hi Indika,
>
> So my suggestion was to maybe consider changing the index structure and
> pull open/close times into 1 or more fields in the main record, so you
> don't have this problem all together.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html
> Search Analytics - http://sematext.com/search-analytics/index.html
>
>
>
>
> On Sun, Nov 18, 2012 at 10:39 PM, Indika Tantrigoda  >wrote:
>
> > Hi Otis,
> >
> > Actually I maintain a separate document for each open/close time along
> with
> > the date (i.e. Sunday =1, Monday =2). I was thinking if it would be
> > possible to query Solr asking, give the next day's (can be current_day
> +1)
> > minimum opening time as a response field.
> >
> > Thanks,
> > Indika
> >
> > On 19 November 2012 04:50, Otis Gospodnetic  > >wrote:
> >
> > > Hi,
> > >
> > > Maybe your index needs to have a separate field for each day open/close
> > > time. No join or extra query needed then.
> > >
> > > Otis
> > > --
> > > Performance Monitoring - http://sematext.com/spm
> > > On Nov 18, 2012 5:35 PM, "Indika Tantrigoda" 
> wrote:
> > >
> > > > Thanks for the response.
> > > >
> > > > Erick,
> > > > My use case is related to restaurant opening hours, In the same
> request
> > > to
> > > > Solr I'd like to get the time when the restaurant opens the next
> > > > day, preferably part of the fields returned, and this needs to be
> > > > independent of the main queries search params.
> > > >
> > > > Yes, the Join wouldn't be suitable in this use case.
> > > >
> > > > Luis,
> > > > I had thought of having the logic in the client side, but before
> that I
> > > > wanted to see if I could get the result from Solr itself. I
> > > > am currently using SolrJ along with Spring.
> > > >
> > > > Thanks,
> > > > Indika
> > > >
> > > > On 18 November 2012 21:49, Luis Cappa Banda 
> > wrote:
> > > >
> > > > > Hello!
> > > > >
> > > > > When queries become more and more complex and you need to apply one
> > > > second
> > > > > query with the resultant docs from the first one, or re-sort
> results,
> > > or
> > > > > maybe add some promotional or special docs to the response, I
> > recommend
> > > > to
> > > > > develop a Web App module that implements that complex business
> logic
> > > and
> > > > > dispatches queries from your Client App to your Solr back-end. That
> > > > module,
> > > > > let's call Search Engine, lets you play with all those special use
> > > cases.
> > > > > If you are familiar with Java I suggest you to have a look at the
> > > > > combination between SolrJ and Spring framework or Jersey.
> > > > >
> > > > > Regards,
> > > > >
> > > > > - Luis Cappa.
> > > > > El 18/11/2012 15:15, "Indika Tantrigoda" 
> > > escribió:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I would like to get results of an query that is different from
> the
> > > main
> > > > > > query as a new field. This query needs to be independent from any
> > > > filter
> > > > > > queries applied to the main query. I was trying to achieve this
> by
> > > > > > fl=_external_query_result:query($myQuery), however that result
> > seems
> > > to
> > > > > be
> > > > > > governed by any filter queries applied to the main query ? Is it
> > > > possible
> > > > > > to have a completely separate query in the fl list and return its
> > > > result
> > > > > > along with the results (per results), or would I need to create a
> > > > > separate
> > > > > > query on the client side to get the results of the independent
> > query
> > > > > (based
> > > > > > on the results from the first query) ?
> > > > > >
> > > > > > Thanks in advance,
> > > > > > Indika
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Best way to retrieve 20 specific documents

2012-11-19 Thread Shawn Heisey

On 11/19/2012 1:49 PM, Dotan Cohen wrote:
> On Mon, Nov 19, 2012 at 10:27 PM, Otis Gospodnetic
>  wrote:
>> Hi,
>>
>> How about id1 OR id2 OR id3? :)
> Thanks, Otis. This was my first inclination (id:123 OR 456), but it
> didn't work when I tried. At your instigation I then tried id:123 OR
> id:456. This does work. Thanks.


You can also use this query format:

id:(123 OR 456 OR 789)

This does get expanded internally by the query parser to the format that 
has the field name on every clause, but it is sometimes easier to write 
code that produces the above form.


Thanks,
Shawn



Re: Order by hl.snippets count

2012-11-19 Thread Koji Sekiguchi

(12/11/20 1:50), Gabriel Croitoru wrote:

Hello,
I'm using  Solr 1.3 with http://wiki.apache.org/solr/HighlightingParameters 
options.
The client just asked us to change the order from the default score to the 
number of hl.snippets per
document.

Is this possible from the Solr configuration (without implementing a custom
scoring algorithm)?


I don't think it is possible.

koji
--
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html


Re: solr cloud shards and servers issue

2012-11-19 Thread Tomás Fernández Löbbe
Maybe it would be better if Solr checked the live nodes and not all the
existing nodes in zk. If a server dies and you need to start a new one, it
would go straight to the correct shard without one needing to specify it
manually. Of course, the problem could be if a server goes down for a
minute and then comes back up, maybe a new node was added to the shard in
the interim, but I still think it would be better this way.

Tomás


On Mon, Nov 19, 2012 at 1:51 PM, Mark Miller  wrote:

>
> On Nov 19, 2012, at 11:24 AM, joe.cohe...@gmail.com wrote:
>
> > Hi
> > I have the following scenario:
> > I have 1 collection across 10 servers. Num of shards: 10.
> > Each server has 2 solr instances running. replication is 2.
> >
> > I want to move one of the instances to another server. meaning, kill the
> > solr process in server X and start a new solr process in server Y
> instead.
> > When I kill the solr process in server X, I can still see that instance
> in
> > the solr-cloud-graph (marked differently).
> > When I run the instance on server Y, it get attahced to another shard,
> > instead of getting into the shard that is now actually missing an
> instance.
> >
> > 1. Any way to tell solr/zookeeper  - "Forget about that instance"?
>
> Unload the SolrCores involved.
>
> > 2. when running a new solr instance - any way to tell solr/zookeper -
> "add
> > this instance to shard X"?
>
> Specify a shardId when creating the core or configuring it in solr.xml and
> make it match the shard you want to add to.
>
> - Mark
>
>
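
For reference, both steps can be done over the CoreAdmin HTTP API; host and
core names below are hypothetical (the UNLOAD has to be issued while the old
process is still alive; otherwise remove the core from solr.xml before
restarting):

  # take the dead instance's core out of the cluster state
  http://serverX:8080/solr/admin/cores?action=UNLOAD&core=collection1

  # create the replacement core on server Y, pinned to the missing shard
  http://serverY:8080/solr/admin/cores?action=CREATE&name=collection1&collection=collection1&shard=shard3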


Re: Cacti monitoring of Solr and Tomcat

2012-11-19 Thread Chris Hostetter

: Is anyone using Cacti to track trends over time in Solr and Tomcat 
: metrics?  We have Nagios set up for alerts, but want to track trends 
: over time.

A key thing to remember is that all of the "stats" you can get from solr 
via HTTP are also available via JMX...

http://wiki.apache.org/solr/SolrJmx

...so anytime you have a favorite monitoring tool WizWat and you're
wondering if anyone has tips on using WizWat to monitor Solr, start
by checking if WizWat has any docs on monitoring apps using JMX.
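
Concretely, exposing those stats over JMX is one element in solrconfig.xml
plus the usual JVM flags on the servlet container; a minimal sketch (the port
is arbitrary, and the no-auth/no-ssl flags should only be used on a trusted
network):

  <!-- solrconfig.xml: publish Solr MBeans to the JVM's MBean server -->
  <jmx />

  # e.g. in Tomcat's CATALINA_OPTS or on Jetty's command line
  -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=9010
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false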


-Hoss


Re: CloudSolrServer or load-balancer for indexing

2012-11-19 Thread Upayavira
A single zookeeper node could be a single point of failure. It is
recommended that you have at least three zookeeper nodes running as
an ensemble.

Zookeeper has a simple rule - over half of your nodes must be available
to achieve quorum and thus be functioning. This is to avoid
'split-brain'. Thus, with three servers, you could handle the loss of
one zookeeper node. Five would allow the loss of two nodes.

More to the point, you're pushing the static configuration from being a
list of solr nodes, to being a list of Zookeeper nodes. The expectation
is clearly that you'll need to scale your Zookeeper nodes far less often
than you'd need to do it with Solr.

Upayavira

On Mon, Nov 19, 2012, at 09:39 PM, Marcin Rzewucki wrote:
> OK, got it. Thanks.
> 
> On 19 November 2012 15:00, Mark Miller  wrote:
> 
> > Nodes stop accepting updates if they cannot talk to Zookeeper, so the
> > external load balancer is no advantage there.
> >
> > CloudSolrServer will be smart about knowing who the leaders are,
> > eventually will do hashing, will auto add/remove nodes from rotation based
> > on the cluster state in Zookeeper, and is probably out of the box more
> > intelligent about retrying on some responses (for example responses that
> > are returned on shutdown or startup).
> >
> > - Mark
> >
> > On Nov 19, 2012, at 6:54 AM, Marcin Rzewucki  wrote:
> >
> > > Hi,
> > >
> > > As far as I know CloudSolrServer is recommended to be used for indexing
> > to
> > > SolrCloud. I wonder what are advantages of this approach over external
> > > load-balancer ? Let's say I have 4 nodes SolrCloud (2 shards + replicas)
> > +
> > > 1 server running ZooKeeper. I can use CloudSolrServer for indexing or use
> > > load-balancer and send updates to any existing node. In former case it
> > > seems that ZooKeeper is a single point of failure - indexing is not
> > > possible if it is down. In latter case I can still indexing data even if
> > > some nodes are down (no data outage). What is better for reliable
> > indexing
> > > - CloudSolrServer, load-balancer or you know some different methods worth
> > > to consider ?
> > >
> > > Regards.
> >
> >


Re: CloudSolrServer or load-balancer for indexing

2012-11-19 Thread Marcin Rzewucki
OK, got it. Thanks.

On 19 November 2012 15:00, Mark Miller  wrote:

> Nodes stop accepting updates if they cannot talk to Zookeeper, so the
> external load balancer is no advantage there.
>
> CloudSolrServer will be smart about knowing who the leaders are,
> eventually will do hashing, will auto add/remove nodes from rotation based
> on the cluster state in Zookeeper, and is probably out of the box more
> intelligent about retrying on some responses (for example responses that
> are returned on shutdown or startup).
>
> - Mark
>
> On Nov 19, 2012, at 6:54 AM, Marcin Rzewucki  wrote:
>
> > Hi,
> >
> > As far as I know CloudSolrServer is recommended to be used for indexing
> to
> > SolrCloud. I wonder what are advantages of this approach over external
> > load-balancer ? Let's say I have 4 nodes SolrCloud (2 shards + replicas)
> +
> > 1 server running ZooKeeper. I can use CloudSolrServer for indexing or use
> > load-balancer and send updates to any existing node. In former case it
> > seems that ZooKeeper is a single point of failure - indexing is not
> > possible if it is down. In latter case I can still indexing data even if
> > some nodes are down (no data outage). What is better for reliable
> indexing
> > - CloudSolrServer, load-balancer or you know some different methods worth
> > to consider ?
> >
> > Regards.
>
>
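
For completeness, pointing an indexing client at the ZooKeeper ensemble
rather than at any single Solr node takes only a few lines with SolrJ; a
sketch assuming SolrJ 4.0 and a hypothetical three-node ensemble:

  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class CloudIndexer {
      public static void main(String[] args) throws Exception {
          // reads cluster state from ZK and routes updates to live nodes
          CloudSolrServer server =
              new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
          server.setDefaultCollection("collection1");

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc-1");
          server.add(doc);
          server.commit();
          server.shutdown();
      }
  }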


Re: solr cloud shards and servers issue

2012-11-19 Thread Otis Gospodnetic
Joe,

Can you remove it from the config and have it gone when you restart Solr?
Or restart Solr and unload as described on
http://wiki.apache.org/solr/CoreAdmin ?

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Mon, Nov 19, 2012 at 11:57 AM, joe.cohe...@gmail.com <
joe.cohe...@gmail.com> wrote:

> How can I unload a solrCore after i killed the running process?
>
>
Mark Miller-3 wrote
> On Nov 19, 2012, at 11:24 AM, joe.cohen.m@ wrote:
> >
> >> Hi
> >> I have the following scenario:
> >> I have 1 collection across 10 servers. Num of shards: 10.
> >> Each server has 2 solr instances running. replication is 2.
> >>
> >> I want to move one of the instances to another server. meaning, kill the
> >> solr process in server X and start a new solr process in server Y
> >> instead.
> >> When I kill the solr process in server X, I can still see that instance
> >> in
> >> the solr-cloud-graph (marked differently).
> >> When I run the instance on server Y, it get attahced to another shard,
> >> instead of getting into the shard that is now actually missing an
> >> instance.
> >>
> >> 1. Any way to tell solr/zookeeper  - "Forget about that instance"?
> >
> > Unload the SolrCores involved.
> >
> >> 2. when running a new solr instance - any way to tell solr/zookeper -
> >> "add
> >> this instance to shard X"?
> >
> > Specify a shardId when creating the core or configuring it in solr.xml
> and
> > make it match the shard you want to add to.
> >
> > - Mark
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-cloud-shards-and-servers-issue-tp4021101p402.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Odd behaviour for case insensitive searches

2012-11-19 Thread shemszot
Hello Everyone,

I've been having issues with odd SOLR behavior when searching for case
insensitive data.  Let's take a vanilla SOLR config (from the example). 
Then I uploaded the default solr.xml document with a slight modification to
the field with name 'name'.  I added "Thomas NOSQL.



  SOLR1000
  Solr, the Enterprise Search Server Thomas NOSQL




Then when I search for
nosql~

I get the record returned in the search.

However, when I search for NOSQL~, no records are returned.

You can see my solr admin interface here:

http://skatingboutique.com [PORT 8080] /solr/#/tracks

Why is this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Odd-behaviour-for-case-insensitive-searches-tp4021171.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cacti monitoring of Solr and Tomcat

2012-11-19 Thread Walter Underwood
We (Chegg) are using New Relic, even for the dev systems. It is pretty good, 
but only reports averages, when we need median and 90th percentile.

Our next step is putting something together with the Metrics server from Coda 
Hale (http://metrics.codahale.com/) and Graphite 
(http://graphite.wikidot.com/). This looks far more capable than New Relic, but 
more work.

wunder

On Nov 19, 2012, at 12:36 PM, Andy Lester wrote:

> 
> On Nov 19, 2012, at 1:46 PM, Otis Gospodnetic  
> wrote:
> 
>> My favourite topic ;)  See my sig below for SPM for Solr. At my last
>> company we used Cacti but it felt very 1990s almost. Some ppl use zabbix,
>> some graphite, some newrelic, some SPM, some nothing!
> 
> 
> SPM looks mighty tasty, but we must have it in-house on our own servers, for 
> monitoring internal dev systems, and we'd like it to be open source.
> 
> We already have Cacti up and running, but it's possible we could use 
> something else.
> 
> --
> Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
> 






Re: Solr Delta Import Handler not working

2012-11-19 Thread Lance Norskog
|  dataSource="null"

I think this should not be here. The dataSource should default to the
<dataSource> listing. And 'rootEntity=true' should be in the
XPathEntityProcessor block, because you are adding each file as one document.

- Original Message -
| From: "Spadez" 
| To: solr-user@lucene.apache.org
| Sent: Sunday, November 18, 2012 7:34:34 AM
| Subject: Re: Solr Delta Import Handler not working
| 
| Update! Thank you to Lance for the help. Based on your suggestion I
| have
| fixed up a few things.
| 
| *My Dataconfig now has the filename pattern fixed and root
| entity=true*
| (the data-config.xml markup was stripped in the archive)
| 
| *My data.xml has a corrected date format with "T":*
| (the data.xml markup was stripped in the archive; the record's surviving
| field values are:)
|
| 123
| Delta Import 2
| This is my long description
| This is
| Google
| England
| 2007-12-31T22:29:59
| Google
| www.google.com
| 45.17614,45.17614
| 
| 
| 
| --
| View this message in context:
| 
http://lucene.472066.n3.nabble.com/Solr-Delta-Import-Handler-not-working-tp4020897p4020925.html
| Sent from the Solr - User mailing list archive at Nabble.com.
| 


Re: Cacti monitoring of Solr and Tomcat

2012-11-19 Thread Andy Lester

On Nov 19, 2012, at 1:46 PM, Otis Gospodnetic  
wrote:

> My favourite topic ;)  See my sig below for SPM for Solr. At my last
> company we used Cacti but it felt very 1990s almost. Some ppl use zabbix,
> some graphite, some newrelic, some SPM, some nothing!


SPM looks mighty tasty, but we must have it in-house on our own servers, for 
monitoring internal dev systems, and we'd like it to be open source.

We already have Cacti up and running, but it's possible we could use something 
else.

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



Re: Best way to retrieve 20 specific documents

2012-11-19 Thread Tomás Fernández Löbbe
If you are in Solr 4 you could use realtime get and list the ids that you
need. For example:
http://host:port/solr/mycore/get?ids=my_id_1,my_id_2...

See http://lucidworks.lucidimagination.com/display/solr/RealTime+Get
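
Realtime get needs the update log enabled and the /get handler registered;
both ship in the stock Solr 4.0 example solrconfig.xml and look like this:

  <!-- inside <updateHandler> -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <requestHandler name="/get" class="solr.RealTimeGetHandler">
    <lst name="defaults">
      <str name="omitHeader">true</str>
    </lst>
  </requestHandler>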

Tomás


On Mon, Nov 19, 2012 at 5:27 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> How about id1 OR id2 OR id3? :)
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html
> Search Analytics - http://sematext.com/search-analytics/index.html
>
>
>
>
> On Mon, Nov 19, 2012 at 2:40 PM, Dotan Cohen  wrote:
>
> > Suppose that an application needs to retrieve about 20-30 solr
> > documents by id. The application could simply run 20 queries to
> > retrieve them, but is there a better way? The id field is stored and
> > indexed, of course. It is of type solr.StrField, and is configured as
> > the uniqueKey.
> >
> > Thank you for any insight.
> >
> > --
> > Dotan Cohen
> >
> > http://gibberish.co.il
> > http://what-is-what.com
> >
>


Re: Per user document exclusions

2012-11-19 Thread Otis Gospodnetic
Hi Christian,

Since you didn't explicitly mention it, I'm not sure if you are aware of it
- ManifoldCF has ACL support built in.  This may be what you are after.

Otis
--
Solr Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html




On Mon, Nov 19, 2012 at 12:05 AM, Christian Jensen
wrote:

> Hi,
>
> We have a need to allow each user to 'exclude' individual documents in the
> results. We can easily do this now within the RDBMS using a FTS index and a
> query with 'OUTER LEFT JOIN WHERE NULL' type of thing.
>
> Can Solr do this somehow? Heavy customization is not a problem - I would
> bet this has already been done. I would like to avoid multiple trips back
> and forth from either the DB or SOLR if possible.
>
> Thanks!
> Christian
>
> --
>
> *Christian Jensen*
> 724 Ioco Rd
> Port Moody, BC V3H 2W8
> +1 (778) 996-4283
> christ...@jensenbox.com
>


Re: Solr4.0 / SolrCloud queries

2012-11-19 Thread shreejay
Hi all , 

I have managed to successfully index around 6 million documents, but while
indexing (and even now after the indexing has stopped), I am running into a
bunch of errors. 

The most common error I see is 
/" null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Server refused connection
at: http://ABC:8983/solr/xyzabc"/

I have made sure that the servers are able to communicate with each other
using the same names. 

Another error I keep getting is that the leader stops recovering and goes
"red" / recovery failed.
/"Error while trying to recover.
core=ABC123:org.apache.solr.common.SolrException: We are not the leader"/


The servers intermittently go offline taking down one of the shards and in
turn stopping all search queries. 

The configuration I have 

Shard1:
Server1 -  Memory - 22GB , JVM - 8gb 
Server2 - Memory - 22GB , JVM - 10gb  (This one is on "recovery failed"
status, but still acting as a leader). 

Shard2:
Server1 -  Memory - 22GB , JVM - 8 GB (This one is on "recovery failed"
status, but still acting as a leader). 
Server2 - Memory -  22 GB, JVM - 8 GB

Shard3 
Server1 - Memory -  22 GB, JVM - 10 GB
Server2 - Memory -  22 GB, JVM - 8 GB

While typing this post I did a "Reload" from the Core Admin page, and both
servers (Shard1-Server2 and Shard2-Server1) came back up again. 

Has anyone else encountered these issues? Any steps to prevent these? 

Thanks. 


--Shreejay








Re: Per user document exclusions

2012-11-19 Thread SUJIT PAL
Hi Christian,

Since customization is not a problem in your case, how about writing out the 
userId and excluded document ids to the database when a document is excluded, 
and then, for each query from the user (possibly identified by a userid 
parameter), looking up the database by userid, constructing a NOT filter out 
of the excluded docIds, and sending it to Solr as the fq?

We are using a variant of this approach to allow database style wildcard search 
on document titles.
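A minimal sketch of that filter construction in SolrJ (untested; the "id"
field name is an assumption, and the excluded-id list would come from your
own per-user database lookup):

import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class ExclusionFilter {
    // Builds e.g. fq=-id:(doc1 OR doc2 OR doc3) from a per-user exclusion list.
    public static SolrQuery withExclusions(String userQuery, List<String> excludedIds) {
        SolrQuery q = new SolrQuery(userQuery);
        if (excludedIds != null && !excludedIds.isEmpty()) {
            StringBuilder fq = new StringBuilder("-id:(");
            for (int i = 0; i < excludedIds.size(); i++) {
                if (i > 0) fq.append(" OR ");
                fq.append(ClientUtils.escapeQueryChars(excludedIds.get(i)));
            }
            q.addFilterQuery(fq.append(')').toString());
        }
        return q;
    }
}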

-sujit
 
On Nov 18, 2012, at 9:05 PM, Christian Jensen wrote:

> Hi,
> 
> We have a need to allow each user to 'exclude' individual documents in the
> results. We can easily do this now within the RDBMS using a FTS index and a
> query with 'OUTER LEFT JOIN WHERE NULL' type of thing.
> 
> Can Solr do this somehow? Heavy customization is not a problem - I would
> bet this has already been done. I would like to avoid multiple trips back
> and forth from either the DB or SOLR if possible.
> 
> Thanks!
> Christian
> 
> -- 
> 
> *Christian Jensen*
> 724 Ioco Rd
> Port Moody, BC V3H 2W8
> +1 (778) 996-4283
> christ...@jensenbox.com



Re: Can Solr v1.4 and v4.0 co-exist in Tomcat?

2012-11-19 Thread James Jory
Hi Ken-

We've been running 1.3 and 4.0 as separate web apps within the same Tomcat 
instance for the last 3 weeks with no issues. The only challenge for us was 
refactoring our app client code to use SolrJ 4.0 to access both the 1.3 and 
4.0 backends. The calls to the 1.3 backend use the XML response format while 
the 4.0 backend uses the Java binary format.
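For what it's worth, a minimal sketch of that client setup (the webapp URLs
are placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class DualBackends {
    public static void main(String[] args) {
        // Old 1.3 core: force the XML parser, since the javabin format is
        // not compatible across the 1.x/4.x boundary.
        HttpSolrServer legacy = new HttpSolrServer("http://host:8080/solr13");
        legacy.setParser(new XMLResponseParser());
        // New 4.0 core: the default javabin parser works fine.
        HttpSolrServer current = new HttpSolrServer("http://host:8080/solr40");
    }
}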

-James

On Nov 19, 2012, at 11:40 AM, kfdroid  wrote:

> I have an existing v1.4 implementation of Solr that supports 2 lines of
> business. For a third line of business the need to do Geo searching requires
> using Solr 4.0. I'd like to minimize the impact to the existing lines of
> business (let them upgrade at their own pace), however I want to share
> hardware if possible. 
> 
> Can I have Solr 4.0 and Solr 1.4 co-exist in the same Tomcat instance? If
> so, are there any potential side-effects to the existing Solr implementation
> I should be aware of?
> 
> Thanks,
> Ken
> 
> 
> 



RE: How do I best detect when my DIH load is done?

2012-11-19 Thread Dyer, James
I'm not sure.  But there are at least a few jira issues open with differing 
ideas on how to improve this.  For instance,

SOLR-1554
SOLR-2728
SOLR-2729

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: geeky2 [mailto:gee...@hotmail.com] 
Sent: Monday, November 19, 2012 1:52 PM
To: solr-user@lucene.apache.org
Subject: RE: How do I best detect when my DIH load is done?

James,

was it you (I cannot remember) who replied to one of my queries on this
subject and mentioned that there was consideration being given to "cleaning"
up the response codes to remove ambiguity?









Inserting many documents and update relations

2012-11-19 Thread uwe72
Hi there,

I have a question of principle.

We have around 5 million Lucene documents. 

At the beginning we have around 4000 XML files which we transform into
SolrInputDocuments using SolrJ and add to the index.

A document is also related to other documents, so while adding a document we
have to run some queries (at least one) to identify whether related
documents are already in the index, in order to create the association to the
related document. The related document also has a "backlink", so we have to
update the related document as well (load, update, delete and re-add).

We are using Solr 3.6.1.

The performance is quite slow because of these queries and modifications of
already existing documents in the index.

Are there some configuration issues what we can do, or anything else?

Thanks a lot in advance.







RE: How do I best detect when my DIH load is done?

2012-11-19 Thread geeky2
James,

was it you (I cannot remember) who replied to one of my queries on this
subject and mentioned that there was consideration being given to "cleaning"
up the response codes to remove ambiguity?







Re: Cacti monitoring of Solr and Tomcat

2012-11-19 Thread Otis Gospodnetic
Hi Andy,

My favourite topic ;)  See my sig below for SPM for Solr. At my last
company we used Cacti but it felt very 1990s almost. Some ppl use zabbix,
some graphite, some newrelic, some SPM, some nothing!

Otis
--
Solr Performance Monitoring - http://sematext.com/spm
On Nov 19, 2012 2:18 PM, "Andy Lester"  wrote:

> Is anyone using Cacti to track trends over time in Solr and Tomcat
> metrics?  We have Nagios set up for alerts, but want to track trends over
> time.
>
> I've found a couple of examples online, but none have worked completely
> for me.  I'm looking at this one next:
> http://forums.cacti.net/viewtopic.php?f=12&t=19744&start=15  It looks
> promising although it doesn't monitor Solr itself.
>
> Suggestions?
>
> Thanks,
> Andy
>
> --
> Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance
>
>


Re: How do I best detect when my DIH load is done?

2012-11-19 Thread geeky2
Hello Andy,

I had a similar question on this some time ago.

http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-td3987110.html#a3987123

http://lucene.472066.n3.nabble.com/need-input-lessons-learned-or-best-practices-for-data-imports-td3801327.html#a3803658

I ended up writing my own shell-based polling application that runs from our
*nx batch server that handles all of our Control-M work.  

+1 on the idea of making this a more formal part of the API.

Let me know if you want concrete example code.







Can Solr v1.4 and v4.0 co-exist in Tomcat?

2012-11-19 Thread kfdroid
I have an existing v1.4 implementation of Solr that supports 2 lines of
business. For a third line of business the need to do Geo searching requires
using Solr 4.0. I'd like to minimize the impact to the existing lines of
business (let them upgrade at their own pace), however I want to share
hardware if possible. 

Can I have Solr 4.0 and Solr 1.4 co-exist in the same Tomcat instance? If
so, are there any potential side-effects to the existing Solr implementation
I should be aware of?

Thanks,
Ken





Re: How do I best detect when my DIH load is done?

2012-11-19 Thread Shawn Heisey

On 11/19/2012 11:52 AM, Dyer, James wrote:

Andy,

I use an approach similar to yours.  There may be something better, however.  You might 
be able to write an "onImportEnd" listener to tell you when it ends.

See http://wiki.apache.org/solr/DataImportHandler#EventListeners for a little 
documentation

See also https://issues.apache.org/jira/browse/SOLR-938 and 
https://issues.apache.org/jira/browse/SOLR-1081 for the background on this 
feature.

If you do end up using this let us know how it works and if there is anything 
you could see to improve it.


I think it would be a good idea to provide a SolrJ API out of the box 
(similar to CoreAdminRequest) for querying the status URL on Solr and 
obtaining the following information:


1) Determining import status
-a) never started (idle)
-b) finished successful (idle)
-c) finished with error, canceled, etc. (idle)
-d) in progress. (busy)
2) Determining how many documents have been added.
3) Determining how long the import took or has taken so far.
4) Any other commonly gathered information.

There may be some reluctance to do this simply because DIH is a contrib 
module.  Perhaps there could be a contrib module for SolrJ?


Thanks,
Shawn



Re: Search using the result returned from the spell checking component

2012-11-19 Thread Roni
And performance-wise: is asking for 0 rows the same as asking for 100 rows?

On Mon, Nov 19, 2012 at 9:22 PM, Walter Underwood [via Lucene] <
ml-node+s472066n4021143...@n3.nabble.com> wrote:

> You can even request zero rows. That will still return the number of
> matches.  --wunder
>
> On Nov 19, 2012, at 11:12 AM, Roni wrote:
>
> > Thank you.
> >
> > I was wondering - what if I make a first request, and ask it to return
> only
> > 1 result - will it still return the spell suggestions while avoiding the
> > overhead of returning all relevant results?
> >
> > Then I could make a second request to get all the results I need.
> >
> > Would that work?
>
>
>
>
>





Re: Search using the result returned from the spell checking component

2012-11-19 Thread Walter Underwood
You can even request zero rows. That will still return the number of matches.  
--wunder
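For example (core URL assumed):

http://localhost:8983/solr/select?q=jaca&rows=0&spellcheck=true

returns numFound and the spelling suggestions without fetching any documents.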

On Nov 19, 2012, at 11:12 AM, Roni wrote:

> Thank you.
> 
> I was wondering - what if I make a first request, and ask it to return only
> 1 result - will it still return the spell suggestions while avoiding the
> overhead of returning all relevant results?
> 
> Then I could make a second request to get all the results I need.
> 
> Would that work?





Cacti monitoring of Solr and Tomcat

2012-11-19 Thread Andy Lester
Is anyone using Cacti to track trends over time in Solr and Tomcat metrics?  We 
have Nagios set up for alerts, but want to track trends over time.

I've found a couple of examples online, but none have worked completely for me. 
 I'm looking at this one next: 
http://forums.cacti.net/viewtopic.php?f=12&t=19744&start=15  It looks promising 
although it doesn't monitor Solr itself.

Suggestions?

Thanks,
Andy

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



RE: Search using the result returned from the spell checking component

2012-11-19 Thread Roni
Thank you.

I was wondering - what if I make a first request, and ask it to return only
1 result - will it still return the spell suggestions while avoiding the
overhead of returning all relevant results?

Then I could make a second request to get all the results I need.

Would that work?





RE: Search using the result returned from the spell checking component

2012-11-19 Thread Dyer, James
What you want isn't supported.  You always will need to issue that second 
request.  This would be a nice feature to add though.
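In the meantime, a sketch of the two-request pattern in SolrJ (untested; it
assumes the spellcheck component, with collation enabled, is attached to the
default /select handler):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SearchWithCorrection {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("jaca");
        q.set("spellcheck", true);
        q.set("spellcheck.collate", true);
        QueryResponse rsp = server.query(q);
        // Nothing matched: retry once with the collated suggestion, e.g. "java".
        if (rsp.getResults().getNumFound() == 0
                && rsp.getSpellCheckResponse() != null
                && rsp.getSpellCheckResponse().getCollatedResult() != null) {
            rsp = server.query(new SolrQuery(rsp.getSpellCheckResponse().getCollatedResult()));
        }
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}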

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Roni [mailto:r...@socialarray.com] 
Sent: Monday, November 19, 2012 12:54 PM
To: solr-user@lucene.apache.org
Subject: Search using the result returned from the spell checking component

Hi,

I've successfully configured the spell check component and it works well.

I couldn't find an answer to my question so any help would be much
appreciated: 

Can I send a single request to Solr, and make it so that if any part of the
query was misspelled, then the search would be performed using the first
spell suggestion that returns?

I want to make only one request, i.e. submit a query only once, if that is
possible.

For example: if a user searched for "jaca" then the search would be
performed only once - for "java".

Thanks in advance for any answer or a link to a relevant resource (I
couldn't find any).

  







Search using the result returned from the spell checking component

2012-11-19 Thread Roni
Hi,

I've successfully configured the spell check component and it works well.

I couldn't find an answer to my question so any help would be much
appreciated: 

Can I send a single request to Solr, and make it so that if any part of the
query was misspelled, then the search would be performed using the first
spell suggestion that returns?

I want to make only one request, i.e. submit a query only once, if that is
possible.

For example: if a user searched for "jaca" then the search would be
performed only once - for "java".

Thanks in advance for any answer or a link to a relevant resource (I
couldn't find any).

  





RE: How do I best detect when my DIH load is done?

2012-11-19 Thread Dyer, James
Andy,

I use an approach similar to yours.  There may be something better, however.  
You might be able to write an "onImportEnd" listener to tell you when it ends.  

See http://wiki.apache.org/solr/DataImportHandler#EventListeners for a little 
documentation

See also https://issues.apache.org/jira/browse/SOLR-938 and 
https://issues.apache.org/jira/browse/SOLR-1081 for the background on this 
feature.

If you do end up using this let us know how it works and if there is anything 
you could see to improve it.
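A rough sketch of such a listener (untested; the class and package names are
made up, and it would be registered on the <document> element of
data-config.xml as onImportEnd="com.example.dih.ImportEndListener"):

package com.example.dih;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

public class ImportEndListener implements EventListener {
    @Override
    public void onEvent(Context ctx) {
        // Fires once when the import finishes; hook in whatever
        // notification mechanism you need here.
        System.out.println("DIH import finished");
    }
}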

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Andy Lester [mailto:a...@petdance.com] 
Sent: Monday, November 19, 2012 10:29 AM
To: solr-user@lucene.apache.org
Subject: How do I best detect when my DIH load is done?

A little while back, I needed a way to tell if my DIH load was done, so I made 
up a little Ruby program to query /dih?command=status .  The program is here: 
http://petdance.com/2012/07/a-little-ruby-program-to-monitor-solr-dih-imports/

Is this the best way to do it?  Is there some other tool or interface that I 
should be using instead?

Thanks,
xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



RE: Architecture Question

2012-11-19 Thread Buttler, David
If you just want to store the data, you can dump it into HDFS sequence files.  
While HBase is really nice if you want to process and serve data real-time, it 
adds overhead to use it as pure storage.
Dave

-Original Message-
From: Cool Techi [mailto:cooltec...@outlook.com] 
Sent: Friday, November 16, 2012 8:26 PM
To: solr-user@lucene.apache.org
Subject: RE: Architecture Question

Hi Otis,

Thanks for your reply. I just wanted to check which NoSQL store would be best 
suited for holding the data while using the least amount of memory, since for 
most of my work Solr would be sufficient and I want to store the data just in 
case we want to reindex, and as a backup.

Regards,
Ayush

> Date: Fri, 16 Nov 2012 15:47:40 -0500
> Subject: Re: Architecture Question
> From: otis.gospodne...@gmail.com
> To: solr-user@lucene.apache.org
> 
> Hello,
> 
> 
> 
> > I am not sure if this is the right forum for this question, but it would
> > be great if I could be pointed in the right direction. We have been using a
> > combination of MySql and Solr for all our company full text and query
> > needs.  But as our customers have grow so has the amount of data and MySql
> > is just not proving to be a right option for storing/querying.
> >
> > I have been looking at Solr Cloud and it looks really impressive, but and
> > not sure if we should give away our storage system. So, I have been
> > exploring DataStax but a commercial option is out of question. So we were
> > thinking of using hbase to store the data and at the same time index the
> > data into Solr cloud, but for many reasons this design doesn't seem
> > convincing (Also seen basic of Lilly).
> >
> > 1) Would it be recommended to just user Solr cloud with multiple
> > replication or hbase-solr seems like good option
> >
> 
> If you trust SolrCloud with replication and keep all your fields stored
> then you could live without an external DB.  At this point I personally
> would still want an external DB.  Whether HBase is the right DB for the job
> I can't tell because I don't know anything about your data, volume, access
> patterns, etc.  I can tell you that HBase does scale well - we have tables
> with many billions of rows stored in them, for instance.
> 
> 
> > 2) How much strain would be to keep both Solr Shard and Hbase node on the
> > same machine
> >
> 
> HBase loves memory.  So does Solr.  They both dislike disk IO (who
> doesn't!).  Solr can use a lot of CPU for indexing/searching, depending on
> the volume.  HBase RegionServers can use a lot of CPU if you run MapReduce
> on data in HBase.
> 
> 
> > 3) if there a calculation on what kind of machine configuration would I
> > need to store 500-1000 million records. Most of these with be social data
> > (Twitter/facebook/blogs etc) and how many shards.
> >
> 
> No recipe here, unfortunately.  You'd have to experiment and test, do load
> and performance testing, etc.  If you need help with Solr + HBase, we
> happen to have a lot of experience with both and have even used them
> together for some of our clients.
> 
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html
> Search Analytics - http://sematext.com/search-analytics/index.html
  


RE: inconsistent number of results returned in solr cloud

2012-11-19 Thread Buttler, David
Answers inline below

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, November 17, 2012 6:40 AM
To: solr-user@lucene.apache.org
Subject: Re: inconsistent number of results returned in solr cloud

Hmmm, first an aside. If by "commit after every batch of documents " you
mean after every call to server.add(doclist), there's no real need to do
that unless you're striving for really low latency. the usual
recommendation is to use commitWithin when adding and commit only at the
very end of the run. This shouldn't actually be germane to your issue, just
an FYI.

DB> Good point.  The code for committing docs to solr is fairly old.  I will 
update it since I don't have a latency requirement.

So you're saying that the inconsistency is permanent? By that I mean it
keeps coming back inconsistently for minutes/hours/days?

DB> Yes, it is permanent.  I have collections that have been up for weeks, and 
are still returning inconsistent results, and I haven't been adding any 
additional documents.
DB> Related to this, I seem to have a discrepancy between the number of 
documents I think I am sending to solr, and the number of documents it is 
reporting.  I have tried reducing the number of shards for one of my small 
collections, so I deleted all references to this collection, and reloaded it. 
I think I have 260 documents submitted (counted from a hadoop job).  Solr 
returns a count of ~430 (it varies), and the first returned document is not 
consistent.

I guess if I were trying to test this I'd need to know how you added
subsequent collections. In particular what you did re: zookeeper as you
added each collection.

DB> These are my steps
DB> 1. Create the collection via the HTTP API: 
DB> http://<host>:<port>/solr/admin/collections?action=CREATE&name=<collection>&numShards=6&collection.configName=<configName>
DB> 2. Relaunch one of my JVM processes, bootstrapping the collection: 
DB> java -Xmx16g -Dcollection.configName=<configName> -Djetty.port=<port> -DzkHost=<zkHost> -Dsolr.solr.home=<solrHome> -DnumShards=6 -Dbootstrap_confdir=conf -jar start.jar
DB> 3. Load data

DB> Let me know if something is unclear.  I can run through the process again 
and document it more carefully.
DB>
DB> Thanks for looking at it,
DB> Dave

Best
Erick


On Fri, Nov 16, 2012 at 2:55 PM, Buttler, David  wrote:

> My typical way of adding documents is through SolrJ, where I commit after
> every batch of documents (where the batch size is configurable)
>
> I have now tried committing several times, from the command line (curl)
> with and without openSearcher=true.  It does not affect anything.
>
> Dave
>
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Friday, November 16, 2012 11:04 AM
> To: solr-user@lucene.apache.org
> Subject: Re: inconsistent number of results returned in solr cloud
>
> How did you do the final commit? Can you try a lone commit (with
> openSearcher=true) and see if that affects things?
>
> Trying to determine if this is a known issue or not.
>
> - Mark
>
> On Nov 16, 2012, at 1:34 PM, "Buttler, David"  wrote:
>
> > Hi all,
> > I buried an issue in my last post, so let me pop it up.
> >
> > I have a cluster with 10 collections on it.  The first collection I
> loaded works perfectly.  But every subsequent collection returns an
> inconsistent number of results for each query.  The queries can be simply
> *:*, or more complex facet queries.  If I go to individual cores and issue
> the query, with distrib=false, I get a consistent number of results.  I am
> wondering if there is some delay in returning results from my shards, and
> the queried node just times out and displays the number of results that it
> has received so far.  If there is such a timeout, it must be very small, as
> my QTime is around 11 ms.
> >
> > Dave
>
>


Re: solr cloud shards and servers issue

2012-11-19 Thread joe.cohe...@gmail.com
How can I unload a SolrCore after I have killed the running process?


Mark Miller-3 wrote
> On Nov 19, 2012, at 11:24 AM, joe.cohen.m@ wrote:
> 
>> Hi
>> I have the following scenario:
>> I have 1 collection across 10 servers. Num of shards: 10.
>> Each server has 2 solr instances running. replication is 2.
>> 
>> I want to move one of the instances to another server. meaning, kill the
>> solr process in server X and start a new solr process in server Y
>> instead.
>> When I kill the solr process in server X, I can still see that instance
>> in
>> the solr-cloud-graph (marked differently).
>> When I run the instance on server Y, it gets attached to another shard,
>> instead of getting into the shard that is now actually missing an
>> instance.
>> 
>> 1. Any way to tell solr/zookeeper  - "Forget about that instance"?
> 
> Unload the SolrCores involved.
> 
>> 2. when running a new solr instance - any way to tell solr/zookeper -
>> "add
>> this instance to shard X"?
> 
> Specify a shardId when creating the core or configuring it in solr.xml and
> make it match the shard you want to add to.
> 
> - Mark







Order by hl.snippets count

2012-11-19 Thread Gabriel Croitoru

Hello,
I'm using  Solr 1.3 with 
http://wiki.apache.org/solr/HighlightingParameters options.
The client just asked us to change the order from the default score to 
the number of hl.snippets per document.


Is this possible from Solr configuration (without implementing a 
custom scoring algorithm)?


Thanks,
--
*Gabriel-Cristian CROITORU*

Senior Software Engineer
www.zitec.com
Tel. +40 (0)31 71 00 114

We are hiring! www.zitec.com/join-zitec


Re: solr cloud shards and servers issue

2012-11-19 Thread Mark Miller

On Nov 19, 2012, at 11:24 AM, joe.cohe...@gmail.com wrote:

> Hi
> I have the following scenario:
> I have 1 collection across 10 servers. Num of shards: 10.
> Each server has 2 solr instances running. replication is 2.
> 
> I want to move one of the instances to another server. meaning, kill the
> solr process in server X and start a new solr process in server Y instead.
> When I kill the solr process in server X, I can still see that instance in
> the solr-cloud-graph (marked differently).
> When I run the instance on server Y, it gets attached to another shard,
> instead of getting into the shard that is now actually missing an instance.
> 
> 1. Any way to tell solr/zookeeper  - "Forget about that instance"?

Unload the SolrCores involved.
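For example (host, port and core name are placeholders):

http://host:port/solr/admin/cores?action=UNLOAD&core=mycore

issued against each instance that still references the core.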

> 2. when running a new solr instance - any way to tell solr/zookeper - "add
> this instance to shard X"?

Specify a shardId when creating the core or configuring it in solr.xml and make 
it match the shard you want to add to.
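For example, via the CoreAdmin API (names are placeholders; the shard
parameter sets the shardId the new core joins):

http://host:port/solr/admin/cores?action=CREATE&name=core1&collection=collection1&shard=shard3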

- Mark



How do I best detect when my DIH load is done?

2012-11-19 Thread Andy Lester
A little while back, I needed a way to tell if my DIH load was done, so I made 
up a little Ruby program to query /dih?command=status .  The program is here: 
http://petdance.com/2012/07/a-little-ruby-program-to-monitor-solr-dih-imports/

Is this the best way to do it?  Is there some other tool or interface that I 
should be using instead?
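For comparison, the same polling loop as a minimal Java sketch (untested; it
assumes the handler is registered at /dataimport and that the status field
flips from "busy" to "idle" once the import is done):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class DihStatusPoller {
    public static void main(String[] args) throws Exception {
        URL status = new URL("http://localhost:8983/solr/dataimport?command=status&wt=json");
        while (true) {
            StringBuilder body = new StringBuilder();
            BufferedReader in = new BufferedReader(new InputStreamReader(status.openStream()));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line);
                }
            } finally {
                in.close();
            }
            // DIH reports "busy" while a command is running, "idle" otherwise.
            if (body.indexOf("busy") < 0) {
                break;
            }
            Thread.sleep(5000);
        }
        System.out.println("import no longer running");
    }
}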

Thanks,
xoa

--
Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance



solr cloud shards and servers issue

2012-11-19 Thread joe.cohe...@gmail.com
Hi
I have the following scenario:
I have 1 collection across 10 servers. Num of shards: 10.
Each server has 2 solr instances running. replication is 2.

I want to move one of the instances to another server. meaning, kill the
solr process in server X and start a new solr process in server Y instead.
When I kill the solr process in server X, I can still see that instance in
the solr-cloud-graph (marked differently).
When I run the instance on server Y, it gets attached to another shard,
instead of getting into the shard that is now actually missing an instance.

1. Any way to tell solr/zookeeper  - "Forget about that instance"?
2. when running a new solr instance - any way to tell solr/zookeper - "add
this instance to shard X"?

thanks.





Re: Custom Solr indexer/searcher

2012-11-19 Thread Smiley, David W.
FWIW I helped someone a few days ago about a similar problem and similarly 
advised modifying SpatialPrefixTree:
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tt4020445.html

IMO GeoHashField should be deprecated because it adds no value.

~ David

On Nov 16, 2012, at 1:49 PM, Scott Smith wrote:

> Thanks for the suggestions.  I'll take a look at these things.
> 
> -Original Message-
> From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
> Sent: Thursday, November 15, 2012 11:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Custom Solr indexer/searcher
> 
> Scott,
> It sounds like you need to look into a few samples of similar things in 
> Lucene. Off the top of my head: FuzzyQuery from 4.0, which finds terms 
> similar to the given one in an FST for query expansion. Generic query 
> expansion is done via MultiTermQuery. Index-time term expansion is shown in 
> TrieField and, by the way, NumericRangeQuery (which should match your goal 
> well). All these are single-dimension samples, but AFAIK a KD-tree is 
> multidimensional; look into GeoHashField, which puts two-dimensional points 
> into single terms with the ability to build ranges on them - see 
> GeoHashField.createSpatialQuery().
> 
> Happy hacking!
> 
> 
> On Fri, Nov 16, 2012 at 10:34 AM, John Whelan  wrote:
> 
>> Scott,
>> 
>> I probably have no idea as to what I'm saying, but if you're looking 
>> for finding results in a N-dimensional space, you might look at 
>> creating a field of type 'point'. Point-type fields have a dimension 
>> attribute; I believe that it can be set to a large integer value.
>> 
>> Barring that, there is also a 'dist()' function that can be used to 
>> work with multiple numeric fields in order sort results based on 
>> closeness to a desired coordinate. The 'dist function takes a 
>> parameter to specify the means of calculating the distance. (For example, 2 
>> -> 'Euclidean distance'.
>> I don't know the other options.)
>> 
>> In the worst case, my response is worthless, but pops your question 
>> back up in the e-mails...
>> 
>> Regards,
>> John
>> 
> 
> 
> 
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> 
> 



SolrCloud and external file fields

2012-11-19 Thread Simone Gianni
Hi all,
I'm planning to move a quite big Solr index to SolrCloud. However, in this
index, an external file field is used for popularity ranking.

Does SolrCloud supports external file fields? How does it cope with
sharding and replication? Where should the external file be placed now that
the index folder is not local but in the cloud?

Are there otherwise other best practices to deal with the use cases
external file fields were used for, like popularity/ranking, in SolrCloud?
Custom ValueSources going to something external?

Thanks in advance,
Simone


Re: CloudSolrServer or load-balancer for indexing

2012-11-19 Thread Mark Miller
Nodes stop accepting updates if they cannot talk to Zookeeper, so the external 
load balancer is no advantage there.

CloudSolrServer will be smart about knowing who the leaders are, eventually 
will do hashing, will auto add/remove nodes from rotation based on the cluster 
state in Zookeeper, and is probably out of the box more intelligent about 
retrying on some responses (for example responses that are returned on shutdown 
or startup).
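For reference, a minimal CloudSolrServer sketch (untested; the ZooKeeper
ensemble address and collection name are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexer {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);      // routed using the cluster state read from ZooKeeper
        server.commit();
        server.shutdown();
    }
}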

- Mark

On Nov 19, 2012, at 6:54 AM, Marcin Rzewucki  wrote:

> Hi,
> 
> As far as I know CloudSolrServer is recommended to be used for indexing to
> SolrCloud. I wonder what are advantages of this approach over external
> load-balancer ? Let's say I have 4 nodes SolrCloud (2 shards + replicas) +
> 1 server running ZooKeeper. I can use CloudSolrServer for indexing or use
> load-balancer and send updates to any existing node. In the former case it
> seems that ZooKeeper is a single point of failure - indexing is not
> possible if it is down. In the latter case I can still index data even if
> some nodes are down (no data outage). What is better for reliable indexing
> - CloudSolrServer, load-balancer or you know some different methods worth
> to consider ?
> 
> Regards.



Re: SolrCloud Error after leader restarts

2012-11-19 Thread Mark Miller
You're using a RAM dir?

Sent from my iPhone

On Nov 19, 2012, at 1:21 AM, deniz  wrote:

> Hello,
> 
> for test purposes, I am running two ZooKeepers on ports 2181 and 2182, and I
> have two Solr instances running on different machines...
> 
> For the one which is running on my local and acts as leader:
> java -Dbootstrap_conf=true -DzkHost=localhost:2181 -jar start.jar
> 
> and for the one which acts as follower, on a remote machine:
> java -Djetty.port=7574 -DzkHost=:2182 -jar start.jar
> 
> until this point everything is smooth and I can see the configs on both
> zookeeper hosts when I connect with zkCli.sh. 
> 
> just to see what happens and check the recovery stuff, I killed the Solr
> which is running on my local machine and tried to index some files using the
> follower, which failed... this is normal as writes are routed to the
> leader...
> 
> the point that I don't understand is here:
> 
> when i restart the leader with the same command on terminal, after normal
> logs, it start showing this 
> 
> 
> Nov 19, 2012 2:15:18 PM org.apache.solr.common.SolrException log
> SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index fetch
> failed : 
>at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:400)
>at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
>at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:151)
>at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:405)
>at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
> Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file
> found in org.apache.lucene.store.RAMDirectory@1e75e89
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@128e909: files: []
>at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
>at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
>at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
>at org.apache.lucene.index.IndexWriter.(IndexWriter.java:639)
>at org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:75)
>at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:62)
>at
> org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:191)
>at
> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:77)
>at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:354)
>... 4 more
> 
> Nov 19, 2012 2:15:18 PM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to recover:org.apache.solr.common.SolrException:
> Replication for recovery failed.
>at
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:154)
>at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:405)
>at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
> 
> 
> it fails to recover after shutdown... why does this happen? 
> 
> 
> 
> 
> 
> 
> -
> Smart, but doesn't study... Would succeed if he did...


CloudSolrServer or load-balancer for indexing

2012-11-19 Thread Marcin Rzewucki
Hi,

As far as I know CloudSolrServer is recommended to be used for indexing to
SolrCloud. I wonder what are advantages of this approach over external
load-balancer ? Let's say I have 4 nodes SolrCloud (2 shards + replicas) +
1 server running ZooKeeper. I can use CloudSolrServer for indexing or use
load-balancer and send updates to any existing node. In the former case it
seems that ZooKeeper is a single point of failure - indexing is not
possible if it is down. In the latter case I can still index data even if
some nodes are down (no data outage). What is better for reliable indexing
- CloudSolrServer, load-balancer or you know some different methods worth
to consider ?

Regards.


Re: Reduce QueryComponent prepare time

2012-11-19 Thread Mikhail Khludnev
Markus,

It's hard to suggest anything until you provide a profiler snapshot showing
what the prepare step spends its time on. As far as I know, prepare mostly
parses queries; we, for example, have some really heavy query parsers, but I
don't think that's common.


On Mon, Nov 19, 2012 at 3:08 PM, Markus Jelsma
wrote:

> I'd also like to know which parts of the entire query constitute the
> prepare time and if it would matter significantly if we extend the edismax
> plugin and hardcode the parameters we pass into (reusable) objects.
>
> Thanks,
> Markus
>
> -Original message-
> > From:Markus Jelsma 
> > Sent: Fri 16-Nov-2012 15:57
> > To: solr-user@lucene.apache.org
> > Subject: Reduce QueryComponent prepare time
> >
> > Hi,
> >
> > We're seeing high prepare times for the QueryComponent, obviously due to
> the vast amount of field and queries. It's common to have a prepare time of
> 70-80ms while the process times drop significantly due to warmed searchers,
> OS cache etc. The prepare time is a recurring issue and I'd appreciate it if
> people here could share some thoughts or hints.
> >
> > We're using a recent check out on a 10 node test cluster with SSD's
> (although this is no IO issue) and edismax on about a hundred different
> fields, this includes phrase searches over most of those fields and
> SpanFirst queries on about 25 fields.  We'd like to see how we can avoid
> doing the same prepare procedure over and over again ;)
> >
> > Thanks,
> > Markus
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


configuring data source in apache tomcat

2012-11-19 Thread Leena Jawale
Hi,

I have configured Apache Solr with Tomcat; for that I have deployed the .war
file in Tomcat. I have created the Solr home directory at C:\solr. After
starting Tomcat, the solr.war file gets extracted and a folder is created in
webapps. In its WEB-INF/web.xml I had written:

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>C:\solr</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

After this, the Solr admin is working.

Now I want to configure an XML data source. How can I do that?

Thanks & Regards,
Leena Jawale





RE: Reduce QueryComponent prepare time

2012-11-19 Thread Markus Jelsma
I'd also like to know which parts of the entire query constitute the prepare 
time and if it would matter significantly if we extend the edismax plugin and 
hardcode the parameters we pass into (reusable) objects.

Thanks,
Markus
 
-Original message-
> From:Markus Jelsma 
> Sent: Fri 16-Nov-2012 15:57
> To: solr-user@lucene.apache.org
> Subject: Reduce QueryComponent prepare time
> 
> Hi,
> 
> We're seeing high prepare times for the QueryComponent, obviously due to the 
> vast amount of field and queries. It's common to have a prepare time of 
> 70-80ms while the process times drop significantly due to warmed searchers, 
> OS cache etc. The prepare time is a recurring issue and I'd appreciate it if 
> people here could share some thoughts or hints.
> 
> We're using a recent check out on a 10 node test cluster with SSD's (although 
> this is no IO issue) and edismax on about a hundred different fields, this 
> includes phrase searches over most of those fields and SpanFirst queries on 
> about 25 fields.  We'd like to see how we can avoid doing the same prepare 
> procedure over and over again ;)
> 
> Thanks,
> Markus
> 


Re: error opening index solr 4.0 with lukeall-4.0.0-ALPHA.jar

2012-11-19 Thread Bernd Fehling
I just downloaded, compiled and opened an optimized Solr 4.0 index
in read-only mode without problems.
I could browse through the docs, search with different analyzers, ...
Looks good.


On 19.11.2012 08:49, Toke Eskildsen wrote:
> On Mon, 2012-11-19 at 08:10 +0100, Bernd Fehling wrote:
>> I think there is already a BETA available:
>> http://luke.googlecode.com/svn/trunk/
> 
>> You might try that one.
> 
> That doesn't work either for Lucene 4.0.0 indexes, same for source
> trunk. I did have some luck with downloading the source and changing the
> dependencies to Lucene 4.0.0 final (4 or 5 JARs, AFAIR). It threw a
> non-fatal exception upon index open, something about subReaders not
> being accessible through the method it used (sorry for being vague, it
> was on my home machine and some days ago), so I'm guessing that not all
> functionality works. It was possible to inspect some documents and that
> was what I needed at the time.
>