Default query operator OR won't work in some cases

2013-08-26 Thread smanad
Hi, 

I have some documents with the keyword "egg", some with "salad", and some
with "egg salad".
When I search for "egg salad", I expect to see egg results + salad results,
but I don't see them.
The "egg" and "salad" queries individually work fine.
I am using WhitespaceTokenizer.

Not sure if I am missing something.
Thanks, 
-Manasi 
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: custom names for replicas in solrcloud

2013-08-26 Thread smanad
Is coreNodeName exposed via the Collections API?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-names-for-replicas-in-solrcloud-tp4086205p4086628.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Default query operator OR won't work in some cases

2013-08-26 Thread smanad
Here is the keywords field for 3 docs:

Simply Asia products,Simply Asia,Sesame Chicken Egg Drop Soup,Soy Ginger
Shrimp and Noodle Salad,Sesame Teriyaki Noodle Bowl

Eggs,AllWhites,Better'n Eggs,Foods,AllWhites or Better'n Eggs

DOLE Salad Blend Salad Kit,Salad Kit,Salad,DOLE,produce

Here is my debug query:
<str name="parsedquery">(+((DisjunctionMaxQuery((keywords:egg^2.0)~0.1)
DisjunctionMaxQuery((keywords:salad^2.0)~0.1))~2)
DisjunctionMaxQuery((keywords:"egg salad")~0.1))/no_coord</str>
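One reading of that output: the `~2` after the pair of term clauses is Lucene's minimum-should-match, i.e. both "egg" and "salad" are being required, which usually comes from an `mm` parameter on a dismax/edismax request handler rather than from the default operator. A sketch of the kind of solrconfig.xml defaults that would produce (and undo) this, with the handler name assumed:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- mm=2 (or 100%) requires every term of a two-term query;
         mm=1 restores OR-like behavior -->
    <str name="mm">1</str>
  </lst>
</requestHandler>
```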

Here is my fieldType definition for keywords:
<fieldType name="text_general" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
        types="word-delim-types.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
        types="word-delim-types.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624p4086723.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Default query operator OR won't work in some cases

2013-08-26 Thread smanad
I am not searching for a phrase query, so I am not sure why one shows up in
the parsed query.
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="indent">true</str>
    <str name="q">egg salad</str>
    <str name="_">1377569284170</str>
    <str name="wt">xml</str>
  </lst>
</lst>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624p4086732.html
Sent from the Solr - User mailing list archive at Nabble.com.


custom names for replicas in solrcloud

2013-08-22 Thread smanad
Hi, 

I am using Solr 4.3 with 3 Solr hosts and an external ZooKeeper ensemble of
3 servers, and just 1 shard currently.

When I create collections using the Collections API, it creates them with
names like collection1_shard1_replica1, collection1_shard1_replica2,
collection1_shard1_replica3.
Is there any way to pass a custom name? Or can I have all the replicas use
the same name?
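For what it's worth, in Solr 4.x the CoreAdmin CREATE action lets you pick the core name yourself when adding a replica to an existing collection (the Collections API chooses the names for you). The host and names below are placeholders:

```
curl "http://host2:8983/solr/admin/cores?action=CREATE&name=myReplicaName&collection=collection1&shard=shard1"
```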

Any pointers will be much appreciated. 
Thanks, 
-Manasi 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-names-for-replicas-in-solrcloud-tp4086205.html
Sent from the Solr - User mailing list archive at Nabble.com.


entity classification solr

2013-08-06 Thread smanad
I have the following situation using Solr 4.3.
My documents contain entities, for example "peanut butter". I have a list
of such entities: items that go together and are not to be treated as two
individual words. During indexing, I want Solr to recognize this and treat
"peanut butter" as one entity. For example, if someone searches for

peanut

then documents that have the word "peanut" should rank higher than documents
that have the phrase "peanut butter". However, if someone searches for

peanut butter

then documents that have "peanut butter" should show up higher than ones
that have just "peanut". Is there a config setting somewhere through which
the entity list can be specified in a file so that Solr handles this?

Should I be using KeepWordFilterFactory for this? 
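One common sketch for this, assuming a synonyms file is wired into the analysis chain: collapse the entity to a single token at index and query time, so "peanut butter" matches as one unit while a bare "peanut" does not. An illustrative synonyms.txt entry:

```
peanut butter => peanut_butter
```

With edismax, a phrase-fields boost (e.g. pf=keywords^10, field name assumed) is a lighter-weight alternative: documents containing the terms as an adjacent phrase are ranked higher without any index changes.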

Any pointers will be much appreciated.
Thanks, 
-Manasi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/entity-classification-solr-tp4082923.html
Sent from the Solr - User mailing list archive at Nabble.com.


no servers hosting shard

2013-07-31 Thread smanad
I have set up SolrCloud, and when I try to access documents I get this error:

<lst name="error"><str name="msg">no servers hosting shard:</str><int
name="code">503</int></lst>

However, if I add the shards=shard1 param, it works.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/no-servers-hosting-shard-tp4081783.html
Sent from the Solr - User mailing list archive at Nabble.com.


debian package for solr with jetty

2013-07-31 Thread smanad
Hi, 

I am trying to create a debian package for Solr 4.3 (default installation
with Jetty).
Is there anything already available?

Also, I need 3 different cores, so I plan to create corresponding packages
for each of them that create the Solr core using the CoreAdmin or
Collections API.

I also want a SolrCloud setup with an external ZooKeeper ensemble; what's
the best way to create a debian package for updating the ZooKeeper config
files as well?
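On the ZooKeeper part, one sketch: have the package's postinst push the config files with the zkcli tool that ships in Solr 4.3's example/cloud-scripts directory (hosts, paths, and confname below are placeholders):

```
cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -cmd upconfig -confdir /path/to/core/conf -confname mycore_conf
```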

Please suggest. Any pointers will be helpful.

Thanks, 
-Manasi





--
View this message in context: 
http://lucene.472066.n3.nabble.com/debian-package-for-solr-with-jetty-tp4081784.html
Sent from the Solr - User mailing list archive at Nabble.com.


Use same spell check dictionary across different collections

2013-07-22 Thread smanad
I have 2 collections, let's say coll1 and coll2.

I configured solr.DirectSolrSpellChecker in coll1's solrconfig.xml and it
works fine.

Now I want to configure coll2's solrconfig.xml to use the SAME spellcheck
dictionary index created above. (I do not want coll2 to prepare its own
dictionary index, just to spellcheck against coll1's dictionary index.)

Is it possible to do this? I tried with IndexBasedSpellChecker but could not
get it working.
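A sketch of what could be tried, with the path purely hypothetical: point coll2's IndexBasedSpellChecker at the sidecar spellcheck index that coll1 already built, and never issue a build command from coll2 so it does not rebuild it. Untested, so treat it as a starting point:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- hypothetical path to the spellcheck index coll1 built -->
    <str name="spellcheckIndexDir">/path/to/coll1/data/spellchecker</str>
  </lst>
</searchComponent>
```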

Any suggestions?
Thanks, 
-Manasi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-same-spell-check-dictionary-across-different-collections-tp4079566.html
Sent from the Solr - User mailing list archive at Nabble.com.


spellcheck and search in a same solr request

2013-07-22 Thread smanad
Hey, 

Is there a way to do spellcheck and search (using suggestions returned from
spellcheck) in a single Solr request?

I am seeing that if my query is spelled correctly, I get results, but if it
is misspelled, I just get suggestions.
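Out of the box, Solr will not re-run the search with a corrected query inside the same request, but with collation enabled the spellcheck response includes a ready-to-reissue query string, so the client only needs a second request on a miss. Illustrative parameters:

```
q=missspelled&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=5
```

spellcheck.maxCollationTries makes Solr verify that a returned collation actually yields hits before suggesting it.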

Any pointers will be very helpful.
Thanks, 
-Manasi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-and-search-in-a-same-solr-request-tp4079571.html
Sent from the Solr - User mailing list archive at Nabble.com.


Exception when using File based and Index based SpellChecker

2013-07-18 Thread smanad
I am trying to use the file-based and index-based spellcheckers together and
am getting this exception: "All checkers need to use the same
StringDistance."

They work fine individually, as expected, but not together.
Any pointers?

-Manasi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exception-when-using-File-based-and-Index-based-SpellChecker-tp4078773.html
Sent from the Solr - User mailing list archive at Nabble.com.


Spellcheck questions

2013-07-18 Thread smanad
Exploring the various spellcheckers in Solr, I have a few questions:
1. Which algorithm is used for generating suggestions when using
IndexBasedSpellChecker? I know it's Levenshtein (with edit distance=2 by
default) in DirectSolrSpellChecker.
2. If I have 2 indices, can I set up multiple IndexBasedSpellCheckers
pointing to different spellcheck dictionaries, to generate suggestions from
both?
3. Can I use IndexBasedSpellChecker and FileBasedSpellChecker together? I
tried doing it and ran into the exception "All checkers need to use the same
StringDistance."
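On question 3, that exception usually means the two checkers defaulted to different StringDistance implementations; the commonly suggested fix is to give both the same explicit distanceMeasure (the class below is Lucene's default Levenshtein implementation):

```xml
<lst name="spellchecker">
  <str name="name">index</str>
  <str name="classname">solr.IndexBasedSpellChecker</str>
  <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
</lst>
<lst name="spellchecker">
  <str name="name">file</str>
  <str name="classname">solr.FileBasedSpellChecker</str>
  <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
</lst>
```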

Any help will be much appreciated.
Thanks, 
-Manasi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-questions-tp4078985.html
Sent from the Solr - User mailing list archive at Nabble.com.


Switching to using SolrCloud with tomcat7 and embedded zookeeper

2013-07-17 Thread smanad
I am running Solr 4.3 with Tomcat 7 (non-SolrCloud) and have 4 Solr cores
running.

To switch to SolrCloud with Tomcat 7 and embedded ZooKeeper, I updated
JAVA_OPTS in tomcat7/bin/setenv.sh to the following:

JAVA_OPTS="-Djava.awt.headless=true -Xms2048m -Xmx4096m
-XX:+UseConcMarkSweepGC -Dbootstrap_confdir=/path/solr/collection1/conf/
-Dcollection.configName=collection1_conf -DnumShards=1 -DzkRun"

Now all my cores (collections) are set to the default collection1 schema.
Not sure why?

solr.xml is set to the correct instanceDir settings.

Any pointers?
Thanks, 
-Manasi




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Switching-to-using-SolrCloud-with-tomcat7-and-embedded-zookeeper-tp4078524.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Switching to using SolrCloud with tomcat7 and embedded zookeeper

2013-07-17 Thread smanad
Originally I was running a single Solr 4.3 instance with 4 cores, and now,
starting to learn about SolrCloud, I thought I could set numShards=1 (since
it's a single instance) so the same 4 cores could be converted to 4
collections on the same single shard, same single instance.

How do I define a separate -Dcollection.configName for each as part of
setenv.sh? Should I point my -Dbootstrap_confdir to the parent directory
instead?
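One sketch (zkhost and paths assumed): skip bootstrap_confdir entirely and instead upload and link one config set per collection with zkcli, so ZooKeeper keeps the configs separate:

```
cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig \
  -confdir /path/solr/coll1/conf -confname coll1_conf
cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd linkconfig \
  -collection coll1 -confname coll1_conf
```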

-Manasi


Daniel Collins wrote
 You've specified bootstrap_confdir and the same collection.configName on
 all your cores, so as each of them start, each will be uploading its own
 configuration to the collection1_conf area of ZK, so they will all be
 overwriting each other.
 
 Are your 4 cores replicas of the same collection or are they distinct
 collections?  If the latter, then why put them all in SolrCloud (there
 really isn't any benefit?) but assuming you wanted to do it for expansion
 reasons (to add replicas later on), then each one will need to have a
 distinct collection.configName parameter, so that ZK knows to keep the
 configs separate.
 
 
 
 On 17 July 2013 07:44, smanad <smanad@> wrote:
 
 I am running solr 4.3 with tomcat 7 (with non SolrCloud) and have 4 solr
 cores running.

 Switching to start using SolrCloud with tomcat7 and embedded zookeeper I
 updated JAVA_OPTS in this file tomcat7/bin/setenv.sh to following,

 JAVA_OPTS="-Djava.awt.headless=true -Xms2048m -Xmx4096m
 -XX:+UseConcMarkSweepGC -Dbootstrap_confdir=/path/solr/collection1/conf/
 -Dcollection.configName=collection1_conf -DnumShards=1 -DzkRun"

 Now, all my cores (collections) are set to the default collection1
 schema.
 Not sure why?

 solr.xml is set to correct instanceDir settings.

 Any pointers?
 Thanks,
 -Manasi




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Switching-to-using-SolrCloud-with-tomcat7-and-embedded-zookeeper-tp4078524.html
 Sent from the Solr - User mailing list archive at Nabble.com.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Switching-to-using-SolrCloud-with-tomcat7-and-embedded-zookeeper-tp4078524p4078538.html
Sent from the Solr - User mailing list archive at Nabble.com.


select in clause in solr

2013-07-16 Thread smanad
I am using Solr 4.3 and have 2 collections, coll1 and coll2.

After searching in coll1, I get field1 values as a comma-separated list of
strings: val1, val2, val3, ... valN.
How can I use that list to match field2 in coll2 against those values,
separated by an OR clause?
That is, I want to return all documents in coll2 with field2=val1 OR
field2=val2 OR field2=val3 ... OR field2=valN.

In short, I am looking for a "select ... in" type clause in Solr.
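A client-side sketch of building that clause (field names and values are the placeholders from the question): the application assembles the OR list itself and sends it as a normal query.

```python
# Turn the comma-separated field1 list from the coll1 response
# into an OR-ed query string for coll2.
from urllib.parse import urlencode

def build_in_clause(field, csv_values):
    # Split the comma-separated list and OR the terms together,
    # quoting each value in case it contains whitespace.
    values = [v.strip() for v in csv_values.split(",") if v.strip()]
    return " OR ".join('%s:"%s"' % (field, v) for v in values)

q = build_in_clause("field2", "val1, val2, val3")
print(q)  # field2:"val1" OR field2:"val2" OR field2:"val3"

# Percent-encode it for the request to /solr/coll2/select
params = urlencode({"q": q})
```

I believe Solr 4.x also has a {!join} query parser (with a fromIndex option for cores on the same instance) that may cover this without the round trip; worth checking before settling on the client-side approach.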

Any pointers will be much appreciated. 
-Manasi
 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/select-in-clause-in-solr-tp4078255.html
Sent from the Solr - User mailing list archive at Nabble.com.


Normalizing/Returning solr scores between 0 to 1

2013-06-27 Thread smanad
Hi, 
We need normalized scores ranging between 0 and 1 rather than a free range.
I read about it at http://wiki.apache.org/lucene-java/ScoresAsPercentages
and it seems that is not something that is recommended.

However, is there still a way to set some config in solrconfig to make sure
scores are always between 0 and 1?
Or will I have to implement that logic in my code after I get the results
from Solr?
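If the per-query, relative nature of such normalization is acceptable (the wiki page explains why it is not a true percentage), the client-side version is a small transform over the returned page; a minimal sketch:

```python
# Divide each document's score by the top score so the page of results
# falls in (0, 1]. This is per-query relative, not an absolute measure.
def normalize_scores(docs):
    # docs: list of {"id": ..., "score": ...} as returned with fl=*,score
    if not docs:
        return docs
    max_score = max(d["score"] for d in docs)
    return [dict(d, score=d["score"] / max_score) for d in docs]

results = normalize_scores([{"id": "a", "score": 4.0},
                            {"id": "b", "score": 1.0}])
print(results)  # [{'id': 'a', 'score': 1.0}, {'id': 'b', 'score': 0.25}]
```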

Any pointers will be much appreciated.
Thanks, 
-M



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Normalizing-Returning-solr-scores-between-0-to-1-tp4073797.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: update solr.xml dynamically to add new cores

2013-06-21 Thread smanad
Great, thanks a lot!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-solr-xml-dynamically-to-add-new-cores-tp4071800p4072190.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: update solr.xml dynamically to add new cores

2013-06-20 Thread smanad
Thanks Michael, both reasons make sense.

Currently I am not planning on using SolrCloud, so as you suggested I can
use the http://wiki.apache.org/solr/CoreAdmin API.
While doing that, did you mean running a curl command similar to this:
http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schem_file_name.xml&dataDir=data
as part of the 'postinst' script? Or running it manually on the host after
the index package is installed? (I would love to do it as part of the pkg
installation.)

Also, there will be two cases here: if I am installing a new index package,
then "create" will work; however, if I am updating a package with some
tweaks to the configs and schema, then I need to check "status" to see if
the core is available and, if yes, use "reload", else "create". Does this
make sense?
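A rough postinst sketch of that status-then-create-or-reload logic (host, core name, and paths are all placeholders, and the grep test is crude):

```
BASE="http://localhost:8983/solr/admin/cores"
if curl -s "$BASE?action=STATUS&core=coreX" | grep -q '<str name="instanceDir">'; then
  curl -s "$BASE?action=RELOAD&core=coreX"   # core exists: pick up config tweaks
else
  curl -s "$BASE?action=CREATE&name=coreX&instanceDir=/path/to/coreX"
fi
```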


Michael Della Bitta-2 wrote
 Hi,
 
 I wouldn't edit solr.xml directly for two reasons. One being that an
 already running Solr installation won't update with changes to that file,
 and might actually overwrite the changes that you make to it. And two,
 it's
 going away in a future release of Solr.
 
 Instead, I'd make the package that installed the Solr webapp and brought
 it
 up as you described, and have your independent index packages use either
 the CoreAdmin API or Collection API to create the indexes, depending on
 whether you're using Solr Cloud or not:
 
 http://wiki.apache.org/solr/CoreAdmin
 https://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
 
 
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions <https://twitter.com/Appinions> | g+:
 plus.google.com/appinions
 w: appinions.com <http://www.appinions.com/>
 
 
 On Wed, Jun 19, 2013 at 8:27 PM, smanad <smanad@> wrote:
 
 Hi,
 Is there a way to edit solr.xml as a part of debian package installation
 to
 add new cores.
 In my use case, there 4 solr indexes and they are managed/configured by
 different teams.
 The way I am thinking packages will work is as described below,
 1. There will be a solr-base debian package which comes with solr
 installation with tomcat setup (I am planning to use solr 4.3)
 2. There will be individual index debian packages like,
 solr-index1, solr-index2 which will be dependent on solr-base.
 Each package's DEBIAN postinst script will have a logic to edit solr.xml
 to
 add new index like index1, index2, etc.

 Does this sound good? or is there a better/different way to do this?
 Any pointers will be much appreciated.
 Thanks,
 -M



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/update-solr-xml-dynamically-to-add-new-cores-tp4071800.html
 Sent from the Solr - User mailing list archive at Nabble.com.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-solr-xml-dynamically-to-add-new-cores-tp4071800p4071970.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Partial update using solr 4.3 with csv input

2013-06-20 Thread smanad
Thanks for confirming.

So if my input is a CSV file, I will need a script that reads the delta
changes one by one, converts each to JSON, and then uses the 'update'
handler with that piece of JSON data.
Makes sense?
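That plan works as far as the mechanics go; a minimal sketch of the conversion step, with the CSV columns ("id", "price") purely hypothetical:

```python
# Read delta rows from a CSV and emit the JSON atomic-update form
# that the /update handler accepts.
import csv, io, json

def csv_row_to_atomic_update(row, id_field="id"):
    # Every non-id column becomes a {"set": value} partial update.
    doc = {id_field: row[id_field]}
    for k, v in row.items():
        if k != id_field:
            doc[k] = {"set": v}
    return doc

data = io.StringIO("id,price\ndoc1,9.99\n")
updates = [csv_row_to_atomic_update(r) for r in csv.DictReader(data)]
print(json.dumps(updates))
# [{"id": "doc1", "price": {"set": "9.99"}}]
```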


Jack Krupansky-2 wrote
 Correct, no atomic update for CSV format. There just isn't any place to
 put 
 the atomic update options in such a simple text format.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: smanad
 Sent: Wednesday, June 19, 2013 8:30 PM
 To: 

 solr-user@.apache

 Subject: Partial update using solr 4.3 with csv input
 
 I was going through this link
 http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ and one of
 the comments is about support for csv.
 
 Since the comment is almost a year old, just wondering if this is still
 true
 that, partial updates are possible only with xml and json input?
 
 Thanks,
 -M
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Partial-update-using-solr-4-3-with-csv-input-tp4071801.html
 Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-using-solr-4-3-with-csv-input-tp4071801p4071972.html
Sent from the Solr - User mailing list archive at Nabble.com.


update solr.xml dynamically to add new cores

2013-06-19 Thread smanad
Hi, 
Is there a way to edit solr.xml as part of a debian package installation to
add new cores?
In my use case, there are 4 Solr indexes and they are managed/configured by
different teams.
The way I am thinking the packages will work is described below:
1. There will be a solr-base debian package which comes with the Solr
installation with Tomcat setup (I am planning to use Solr 4.3).
2. There will be individual index debian packages, like
solr-index1, solr-index2, which will be dependent on solr-base.
Each package's DEBIAN postinst script will have logic to edit solr.xml to
add the new index, like index1, index2, etc.

Does this sound good? Or is there a better/different way to do this?
Any pointers will be much appreciated.
Thanks, 
-M



--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-solr-xml-dynamically-to-add-new-cores-tp4071800.html
Sent from the Solr - User mailing list archive at Nabble.com.


Partial update using solr 4.3 with csv input

2013-06-19 Thread smanad
I was going through this link,
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/, and one of
the comments is about support for CSV.

Since the comment is almost a year old, I am just wondering if it is still
true that partial updates are possible only with XML and JSON input?

Thanks, 
-M



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-using-solr-4-3-with-csv-input-tp4071801.html
Sent from the Solr - User mailing list archive at Nabble.com.


Need help with search in multiple indexes

2013-06-12 Thread smanad
Hi, 

I am thinking of using Solr to implement search on our site. Here is my use
case:
1. We will have multiple (4-5) indexes based on different data
types/structures, and data will be indexed into these by several processes:
cron, on demand, through message queue applications, etc.
2. A single web service needs to search across all these indexes and return
results.

I am thinking of using Solr 4.2.1, or maybe 4.3, with a single-instance,
multicore setup.
I read about distributed search and I believe I should be able to search
across multiple indices using the shards parameter. However, in my case,
all shards will be on the same host/port but with different core names.

Is my understanding correct? Or is there any better alternative to this
approach?
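For reference, the shards parameter does accept cores on the same host/port; the usual pattern is to query one core and list every core (names below are placeholders). Note that distributed search assumes compatible schemas and unique keys across the shards:

```
http://localhost:8983/solr/core1/select?q=foo&shards=localhost:8983/solr/core1,localhost:8983/solr/core2,localhost:8983/solr/core3
```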

Please suggest. 
Thanks, 
-Manasi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with search in multiple indexes

2013-06-12 Thread smanad
Thanks for the reply, Michael.

In some cases the schema is similar, but not in all of them. So let's go
with the assumption that the schemas are NOT similar.

I am not quite sure what you mean by "you're probably stuck coordinating the
results externally". Do you mean searching each index and then somehow
merging the results manually? Will I still be able to use the shards
parameter, or not?

Also, I was planning to use the PHP library SolrClient. Do you see any
downside?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070049.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with search in multiple indexes

2013-06-12 Thread smanad
Is this a limitation of Solr/Lucene? Should I be considering another option
like Elasticsearch (which is also based on Lucene)?
But I am sure searching in multiple indexes is a fairly common problem.

Also, I was reading this post,
http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set
and in one of the comments it says,
"So if I have Core0 with fields documentId,fieldA,fieldB and Core1 with
fields documentId,fieldC,fieldD. Then I create another core, let's say
Core3, with fields documentId,fieldA,fieldB,fieldC,fieldD. I will never be
importing data into this core? And then create a query handler that
includes the shard parameter. So when I query Core3, it will never really
contain indexed data, but because of the shard searching it will fetch the
results from the other two cores, and present them on the 3rd core? Thanks
for the help!"

Is that what I should be doing? So all the indexing still happens in
separate cores, but searching happens in one single core?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070055.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with search in multiple indexes

2013-06-12 Thread smanad
In my case, different teams will be updating the indexes at different
intervals, so having separate cores gives more control. However, I could
still update (add/edit/delete) data with conditions like checking the doc
type.

It's just that using shards sounds much cleaner and more readable.

However, I am not yet sure if there might be any performance issues.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-search-in-multiple-indexes-tp4070040p4070061.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Scheduling DataImports

2013-05-24 Thread smanad
Thanks for the reply.

Regarding the second question, actually that's what I am looking for.

My use case is: my DIH runs for 2 HTTP data sources, api1 and api2, with
different TTLs returned. I was thinking of saving this in a file, something
like,
url:api1, timestamp:100, expires: 60
url:api2, timestamp:101, expires: 30

Then, a cron job will run every minute to see which entries are expiring in
the next 30 secs. Entry #2 will be expiring, so the job will re-index that
entry by running the DIH curl command for that entity.
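The expiry check that cron job would run can be sketched like this, using the two example entries above (timestamps in seconds; the 30-second window is the one described):

```python
# Given the saved (url, timestamp, expires) entries, pick the ones whose
# expiry moment falls inside the next `window` seconds, so their DIH
# entity can be re-imported.
def expiring_soon(entries, now, window=30):
    # An entry expires at timestamp + expires; flag it if that moment
    # falls inside [now, now + window).
    return [e for e in entries
            if now <= e["timestamp"] + e["expires"] < now + window]

entries = [{"url": "api1", "timestamp": 100, "expires": 60},
           {"url": "api2", "timestamp": 101, "expires": 30}]
print([e["url"] for e in expiring_soon(entries, now=125)])  # ['api2']
```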

Is there a better way of scheduling DIH imports automatically?

I read about NRT; is that related to this problem at all?

Thanks, 
-M




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Scheduling-DataImports-tp4065435p4065873.html
Sent from the Solr - User mailing list archive at Nabble.com.


Scheduling DataImports

2013-05-22 Thread smanad
Hi, 

I am new to Solr and recently started exploring it for the search/sort needs
of our webapp.
I have a couple of questions, as below (I am using Solr 4.2.1 with the
default core named collection1):
1. We have a use case where we would like to index data every 10 mins (avg).
What's the best way to schedule a data import every 10 mins or so? A cron
job?
2. Also, we are indexing data returned from an API which returns different
cache TTLs. How can I re-index after a TTL has expired? Some process which
polls for the soon-to-expire entries and issues a data-import command?
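For question 1, a cron job hitting the DataImportHandler endpoint is the usual approach; a sketch, assuming the default core and handler path:

```
# m h dom mon dow  command  -- run a delta-import every 10 minutes
*/10 * * * * curl -s "http://localhost:8983/solr/collection1/dataimport?command=delta-import" >/dev/null
```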

Any pointers will be much appreciated.
Thanks, 
-M



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Scheduling-DataImports-tp4065435.html
Sent from the Solr - User mailing list archive at Nabble.com.