More Number of characters???

2012-01-26 Thread Jörg Agatz
Is it possible to get more Number of characters?

I have a problem with too many characters in the search: my Think Tank is
very long, but it has to be that way.
Unfortunately I cannot find the setting that is responsible.


Re: Currency field type

2012-01-26 Thread darul
Thank you Erik, I think about taking time to be more involved in solr
development.

In the meantime, I will choose to store prices and currency in a normalized
way.
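
For reference, a minimal sketch of what such a normalized layout could look
like in schema.xml - the field names and the choice of converting to one base
currency at index time are illustrative assumptions, not necessarily what was
actually used:

   <!-- amount converted to a single base currency at index time -->
   <field name="price_usd" type="tfloat" indexed="true" stored="true"/>
   <!-- original ISO 4217 code, kept for display -->
   <field name="currency_code" type="string" indexed="true" stored="true"/>

Range queries and sorting then operate on price_usd alone, e.g.
fq=price_usd:[10 TO 20].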

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Currency-field-type-tp3684682p3690076.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query for documents that have ONLY a certain value in a multivalued field

2012-01-26 Thread bilal dadanlar
I am having a similar problem and would appreciate any useful explanation
on this topic.
I couldn't find a way of querying for exact match in multivalued or
normal text fields

On Thu, Jan 26, 2012 at 3:14 AM, Garrett Conaty gcon...@gmail.com wrote:

 Does anyone know if there's a way using the SOLR query syntax to filter
 documents that have only a certain value in a multivalued field?  As an
 example if I have some field country that's multivalued and I want

 q=id:[* TO *]&fq=country:brazil   where 'brazil' is the only value
 present.

 I've run through a few possibilities to do this, but I think this need is
 common enough that a better solution may already exist:

 1) On index creation time, aggregate my source data and create a
 count_country field that contains the number of terms in the country
 field.  Then the query would be q=id:[* TO *]&fq=country:brazil&fq=count_country:1

 2) In the search client, use the terms component to retrieve all terms for
 country and then do the exclusions in the client and construct the query
 as follows q=id:[* TO *]&fq=country:brazil&fq=-country:canada&fq=-country:us   etc.

 3) Write a function query or similar that could capture the info.



 Thanks in advance,
 Garrett Conaty




-- 
Bilal Dadanlar


is it possible to get more Number of characters?

2012-01-26 Thread Jörg Agatz
Is it possible to get more Number of characters?

I have a problem with too many characters in the search: my Think Tank is
very long, but it has to be that way.
Unfortunately I cannot find the setting that is responsible.


Re: is it possible to get more Number of characters?

2012-01-26 Thread Otis Gospodnetic
Hi Jörg,

Hmmm, do you mind rephrasing the question?

Otis 

Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html 


- Original Message -
 From: Jörg Agatz joerg.ag...@googlemail.com
 To: solr-user@lucene.apache.org
 Cc: 
 Sent: Thursday, January 26, 2012 5:23 AM
 Subject: is it possible to get more Number of characters?
 
 Is it possible to get more Number of characters?
 
 I have a problem with too many characters in the search: my Think
 Tank is very long, but it has to be that way.
 Unfortunately I cannot find the setting that is responsible.



Re: Size of index to use shard

2012-01-26 Thread Dmitry Kan
@Erick:
Thanks for the detailed explanation. On this note, we have 75GB for *.fdt
and *.fdx out of a 99GB index. The search is still not that fast if the
cache size is small, but giving more cache led to OOMs. Partitioning into
shards is not an option either, as at the moment we try to run as few
machines as possible.

@Vadim:
Thanks for the info! With 6GB of heap I assume your caches are not that
big? We had a filterCache (used heavily compared to other cache types in
facet and non-facet queries, according to our measurements) on the order of
20 thousand entries with a 22GB heap and observed OOMs. So we decided to
lower the cache params substantially.
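
For reference, the filterCache is sized in solrconfig.xml; a sketch of a
deliberately small configuration (the numbers are placeholders, not the
production values discussed here):

   <filterCache class="solr.FastLRUCache"
                size="512"
                initialSize="512"
                autowarmCount="64"/>

A cached filter over the whole index costs roughly one bit per document
(maxDoc/8 bytes), so on a 200 million document index a single entry can be
~25MB, which is why tens of thousands of entries can exhaust even a 22GB heap.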

Dmitry

On Tue, Jan 24, 2012 at 10:25 PM, Vadim Kisselmann 
v.kisselm...@googlemail.com wrote:

 @Erick
 thanks:)
 I agree with your opinion.
 my load tests show the same.

 @Dmitry
 my docs are small too, I think about 3-15KB per doc.
 I update my index all the time and I have an average of 20-50 requests
 per minute (20% facet queries, 80% large boolean queries with
 wildcard/fuzzy). How many docs at a time depends on the chosen filters,
 from 10 up to all 100 million.
 I work with very small caches (strangely, if my index is under
 100GB I need larger caches, over 100GB smaller caches...)
 My JVM has 6GB, 18GB for I/O.
 With few updates a day I would configure very big caches, like Tom
 Burton-West (see HathiTrust's blog)

 Regards Vadim



 2012/1/24 Anderson vasconcelos anderson.v...@gmail.com:
  Thanks for the explanation Erick :)
 
  2012/1/24, Erick Erickson erickerick...@gmail.com:
  Talking about index size can be very misleading. Take
  a look at
 http://lucene.apache.org/java/3_5_0/fileformats.html#file-names.
  Note that the *.fdt and *.fdx files are used for stored fields, i.e.
  the verbatim copy of data put in the index when you specify
  stored=true. These files have virtually no impact on search
  speed.
 
  So, if your *.fdx and *.fdt files are 90G out of a 100G index
  it is a much different thing than if these files are 10G out of
  a 100G index.
 
  And this doesn't even mention the peculiarities of your query mix.
  Nor does it say a thing about whether your cheapest alternative
  is to add more memory.
 
  Anderson's method is about the only reliable one, you just have
  to test with your index and real queries. At some point, you'll
  find your tipping point, typically when you come under memory
  pressure. And it's a balancing act between how much memory
  you allocate to the JVM and how much you leave for the op
  system.
 
  Bottom line: No hard and fast numbers. And you should periodically
  re-test the empirical numbers you *do* arrive at...
 
  Best
  Erick
 
  On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos
  anderson.v...@gmail.com wrote:
  Apparently, it is not so easy to determine when to break the content into
  pieces. I'll investigate further the number of documents, the size of
  each document, and what kind of search is being used. It seems I will
  have to do a load test to identify the cutoff point at which to begin
  using the strategy of shards.
 
  Thanks
 
  2012/1/24, Dmitry Kan dmitry@gmail.com:
  Hi,
 
  The article you gave mentions 13GB of index size. That is a quite small
  index from our perspective. We have noticed that at least Solr 3.4 has
  some sort of choking point with respect to growing index size. It just
  becomes substantially slower than what we need (a query on average taking
  more than 3-4 seconds) once the index size crosses a magic level (about
  80GB following our practical observations). We try to keep our indices
  at around 60-70GB for fast searches and above 100GB for slow ones. We
  also route the majority of user queries to fast indices. Yes, caching
  may help, but we cannot necessarily afford adding more RAM for bigger
  indices. BTW, our documents are very small, thus in a 100GB index we can
  have around 200 mil. documents. It would be interesting to see how you
  manage to ensure q-times under 1 sec with an index of 250GB. How many
  documents / facets do you ask for at a time? FYI, we ask for a thousand
  facets in one go.
 
  Regards,
  Dmitry
 
  On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann 
  v.kisselm...@googlemail.com wrote:
 
  Hi,
  it depends on your hardware.
  Read this:
 
 
 http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/
  Think about your cache-config (few updates, big caches) and a good
  HW-infrastructure.
  In my case I can handle a 250GB index with 100 mil. docs on an i7
  machine with RAID10 and 24GB RAM, with q-times under 1 sec.
  Regards
  Vadim
 
 
 
  2012/1/24 Anderson vasconcelos anderson.v...@gmail.com:
   Hi
   Is there some index size (or number of docs) at which it is necessary
   to break the index into shards?
   I have an index 100GB in size. This index increases by 10GB per year
   (I don't have information on how many docs it has) and the docs will
   never be deleted.  Thinking in 30

Re: Query for documents that have ONLY a certain value in a multivalued field

2012-01-26 Thread Garrett Conaty
Thought of another way to do this which will at least work for one field,
and that is by mapping all of the values into a simple string field and
then querying for an exact match in the string (one term).  This is similar
to having a 'count' field, but for our index creation process we could
reuse a string field we already had made (for sorting).  Still, I'd like to
see if the community has any other options from within Solr itself.
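
A sketch of that mapping in client-side Java (the field name country_exact
and the space delimiter are illustrative assumptions):

   import java.util.Collections;
   import java.util.List;
   import org.apache.solr.common.SolrInputDocument;

   // Build an order-independent exact-match key from all country values.
   static void addExactCountryField(SolrInputDocument doc, List<String> countries) {
       Collections.sort(countries);          // so value order doesn't matter
       StringBuilder key = new StringBuilder();
       for (String c : countries) {
           if (key.length() > 0) key.append(' ');
           key.append(c);
       }
       doc.addField("country_exact", key.toString());
   }

A document whose only value is brazil then matches
fq=country_exact:"brazil", while one holding brazil and canada gets the key
"brazil canada" and does not.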


On Thu, Jan 26, 2012 at 2:05 AM, bilal dadanlar bi...@fizy.com wrote:

 I am having a similar problem and would appreciate any useful explanation
 on this topic.
 I couldn't find a way of querying for exact match in multivalued or
 normal text fields

 On Thu, Jan 26, 2012 at 3:14 AM, Garrett Conaty gcon...@gmail.com wrote:

  Does anyone know if there's a way using the SOLR query syntax to filter
  documents that have only a certain value in a multivalued field?  As an
  example if I have some field country that's multivalued and I want
 
  q=id:[* TO *]&fq=country:brazil   where 'brazil' is the only value
  present.
 
  I've run through a few possibilities to do this, but I think this need
  is common enough that a better solution may already exist:
 
  1) On index creation time, aggregate my source data and create a
  count_country field that contains the number of terms in the country
  field.  Then the query would be q=id:[* TO *]&fq=country:brazil&fq=count_country:1
 
  2) In the search client, use the terms component to retrieve all terms
 for
  country and then do the exclusions in the client and construct the
 query
  as follows q=id:[* TO *]&fq=country:brazil&fq=-country:canada&fq=-country:us   etc.
 
  3) Write a function query or similar that could capture the info.
 
 
 
  Thanks in advance,
  Garrett Conaty
 



 --
 Bilal Dadanlar



RE: Using multiple DirectSolrSpellcheckers for a query

2012-01-26 Thread Dyer, James
Nalini,

Right now the best you can do is to use copyField to combine everything into 
a catch-all for spellchecking purposes.  While this seems wasteful, this often 
has to be done anyhow because typically you'll need less/different analysis for 
spellchecking than for searching.  But rather than having separate copyFields 
to create multiple dictionaries, put everything into one field to create a 
single master dictionary.

From there, you need to set spellcheck.collate to true and also 
spellcheck.maxCollationTries greater than zero (5-10 usually works).  The 
first parameter tells it to generate re-written queries with spelling 
suggestions (collations).  The second parameter tells it to weed out any 
collations that won't generate hits if you re-query them.  This is important 
because having unrelated keywords in your master dictionary will increase the 
chances the spellchecker will pick the wrong words as corrections.
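
A rough sketch of those pieces (field, type, and dictionary names are 
placeholders): in schema.xml, funnel the searched fields into one catch-all,

   <field name="spell" type="textSpell" indexed="true" stored="false"
          multiValued="true"/>
   <copyField source="title" dest="spell"/>
   <copyField source="author" dest="spell"/>

and in the request handler defaults in solrconfig.xml:

   <str name="spellcheck.dictionary">default</str>
   <str name="spellcheck.collate">true</str>
   <str name="spellcheck.maxCollationTries">5</str>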

There is a significant caveat to this:  The spellchecker typically only 
suggests for words in the dictionary.  So by creating a huge, master dictionary 
you might find that many misspelled words won't generate suggestions.  See this 
thread for some workarounds:  
http://lucene.472066.n3.nabble.com/Improving-Solr-Spell-Checker-Results-td3658411.html
  

I think having multiple, per-field dictionaries as you suggest might be a good 
way to go.  While this is not supported, I don't think it's because of 
performance concerns.  (There would be an overhead cost to this but I think it 
would still be practical).  It just hasn't been implemented yet.  But we might 
be getting to a possible start to this type of functionality.  In 
https://issues.apache.org/jira/browse/SOLR-2585 a separate spellchecker is 
added that just corrects wordbreak (or is it word break?) problems, then a 
ConjunctionSolrSpellChecker combines the results from the main spellchecker 
and the wordbreak spellchecker.  I could see a next step beyond this being to 
support per-field dictionaries, checking them separately, then combining the 
results.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: Nalini Kartha [mailto:nalinikar...@gmail.com] 
Sent: Wednesday, January 25, 2012 11:56 AM
To: solr-user@lucene.apache.org
Subject: Using multiple DirectSolrSpellcheckers for a query

Hi,

We are trying to use the DirectSolrSpellChecker to get corrections for
mis-spelled query terms directly from fields in the Solr index.

However, we need to use multiple fields for spellchecking a query. It
looks like you can only use one spellchecker for a request, and so the
workaround for this is to create a copy field from the fields required for
spell correction?

We'd like to avoid this because we allow users to perform different kinds
of queries on different sets of fields and so to provide meaningful
corrections we'd have to create multiple copy fields - one for each query
type.

Is there any reason why Solr doesn't support using multiple spellcheckers
for a query? Is it because of performance overhead?

Thanks,
Nalini


Re: is it possible to get more Number of characters?

2012-01-26 Thread Jörg Agatz
no,

I have a lot of characters in my URL. It looks like it stops at xyz
characters, so I hope to find a way to use more characters.






On Thu, Jan 26, 2012 at 3:11 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Hi Jörg,

 Hmmm, do you mind rephrasing the question?

 Otis
 
 Performance Monitoring SaaS for Solr -
 http://sematext.com/spm/solr-performance-monitoring/index.html


 - Original Message -
  From: Jörg Agatz joerg.ag...@googlemail.com
  To: solr-user@lucene.apache.org
  Cc:
  Sent: Thursday, January 26, 2012 5:23 AM
  Subject: is it possible to get more Number of characters?
 
  Is it possible to get more Number of characters?
 
  I have a problem with too many characters in the search: my Think
  Tank is very long, but it has to be that way.
  Unfortunately I cannot find the setting that is responsible.
 



decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Vadim Kisselmann
Hello Folks,
I want to decrease the max. number of terms for my fields to 500.
I thought that the maxFieldLength parameter in solrconfig.xml is
intended for this.
In my case it doesn't work.

Half of my text fields include longer text (about 1 words).
With 100 docs in my index I had a segment size of 1140KB for indexed
data and 270KB for stored data (.fdx, .fdt).
After a change from the default <maxFieldLength>1</maxFieldLength> to
<maxFieldLength>500</maxFieldLength>, deleting the index folder,
restarting Tomcat, and reindexing, I see the same segment sizes (1140KB
for indexed and 270KB for stored data).

Please tell me if I made an error in reasoning.

Regards
Vadim


Re: decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Vadim Kisselmann
P.S.:
I use Solr 4.0 from trunk.
Is maxFieldLength deprecated in Solr 4.0?
If so, do I have an alternative to decrease the number of terms during indexing?
Regards
Vadim



2012/1/26 Vadim Kisselmann v.kisselm...@googlemail.com:
 Hello Folks,
 I want to decrease the max. number of terms for my fields to 500.
 I thought that the maxFieldLength parameter in solrconfig.xml is
 intended for this.
 In my case it doesn't work.

 Half of my text fields include longer text (about 1 words).
 With 100 docs in my index I had a segment size of 1140KB for indexed
 data and 270KB for stored data (.fdx, .fdt).
 After a change from the default <maxFieldLength>1</maxFieldLength> to
 <maxFieldLength>500</maxFieldLength>, deleting the index folder,
 restarting Tomcat, and reindexing, I see the same segment sizes (1140KB
 for indexed and 270KB for stored data).

 Please tell me if I made an error in reasoning.

 Regards
 Vadim


Re: decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Ahmet Arslan
 I want to decrease the max. number of terms for my fields to 500.
 I thought that the maxFieldLength parameter in solrconfig.xml is
 intended for this.
 In my case it doesn't work.
 
 Half of my text fields include longer text (about 1 words).
 With 100 docs in my index I had a segment size of 1140KB for indexed
 data and 270KB for stored data (.fdx, .fdt).
 After a change from the default <maxFieldLength>1</maxFieldLength> to
 <maxFieldLength>500</maxFieldLength>, deleting the index folder,
 restarting Tomcat, and reindexing, I see the same segment sizes (1140KB
 for indexed and 270KB for stored data).
 
 Please tell me if I made an error in reasoning.

What version of solr are you using?

Could it be 
http://lucene.apache.org/solr/api/org/apache/solr/analysis/LimitTokenCountFilterFactory.html?

http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/analysis/LimitTokenCountFilter.html
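
For reference, a minimal field type using that factory might look like this
in schema.xml (the tokenizer choice is a placeholder):

   <fieldType name="text_limited" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <!-- keep only the first 500 tokens of each field value -->
       <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="500"/>
     </analyzer>
   </fieldType>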


Re: decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Sean Adams-Hiett
Vadim,

Is it possible that your solrconfig.xml is using maxFieldLength in both the
indexDefaults and mainIndex?

If so, the mainIndex config overwrites the other.  See this issue:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html

Sean

On Thu, Jan 26, 2012 at 10:15 AM, Vadim Kisselmann 
v.kisselm...@googlemail.com wrote:

 P.S.:
 I use Solr 4.0 from trunk.
 Is maxFieldLength deprecated in Solr 4.0?
 If so, do I have an alternative to decrease the number of terms during
 indexing?
 Regards
 Vadim



 2012/1/26 Vadim Kisselmann v.kisselm...@googlemail.com:
  Hello Folks,
  I want to decrease the max. number of terms for my fields to 500.
  I thought that the maxFieldLength parameter in solrconfig.xml is
  intended for this.
  In my case it doesn't work.
 
  Half of my text fields include longer text (about 1 words).
  With 100 docs in my index I had a segment size of 1140KB for indexed
  data and 270KB for stored data (.fdx, .fdt).
  After a change from the default <maxFieldLength>1</maxFieldLength> to
  <maxFieldLength>500</maxFieldLength>, deleting the index folder,
  restarting Tomcat, and reindexing, I see the same segment sizes (1140KB
  for indexed and 270KB for stored data).
 
  Please tell me if I made an error in reasoning.
 
  Regards
  Vadim




-- 
Sean Adams-Hiett
Owner, Web Geeks For Hire
phone: (361) 433.5748
email: s...@webgeeksforhire.com
web: www.webgeeksforhire.com
twitter: @geekbusiness http://twitter.com/geekbusiness


Re: decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Vadim Kisselmann
Sean, Ahmet,
thanks for the response :)

I use Solr 4.0 from trunk.
In my solrconfig.xml there is only one maxFieldLength param.
I think it is deprecated in Solr versions 3.5+...

But LimitTokenCountFilterFactory works in my case :)
Thanks!

Regards
Vadim



2012/1/26 Ahmet Arslan iori...@yahoo.com:
 I want to decrease the max. number of terms for my fields to 500.
 I thought that the maxFieldLength parameter in solrconfig.xml is
 intended for this.
 In my case it doesn't work.

 Half of my text fields include longer text (about 1 words).
 With 100 docs in my index I had a segment size of 1140KB for indexed
 data and 270KB for stored data (.fdx, .fdt).
 After a change from the default <maxFieldLength>1</maxFieldLength> to
 <maxFieldLength>500</maxFieldLength>, deleting the index folder,
 restarting Tomcat, and reindexing, I see the same segment sizes (1140KB
 for indexed and 270KB for stored data).

 Please tell me if I made an error in reasoning.

 What version of solr are you using?

 Could it be 
 http://lucene.apache.org/solr/api/org/apache/solr/analysis/LimitTokenCountFilterFactory.html?

 http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/analysis/LimitTokenCountFilter.html


Solr Join query with fq not correctly filtering results?

2012-01-26 Thread Mike Hugo
Hello,

I'm trying out the Solr JOIN query functionality on trunk.  I have the
latest checkout, revision #1236272 - I did the following steps to get the
example up and running:

cd solr
ant example
java -jar start.jar
cd exampledocs
java -jar post.jar *.xml

Then I tried a few of the sample queries on the wiki page
http://wiki.apache.org/solr/Join.  In particular, this is one that I'm
interested in:

Find all manufacturer docs named belkin, then join them against (product)
 docs and filter that list to only products with a price less than 12 dollars

 http://localhost:8983/solr/select?q={!join+from=id+to=manu_id_s}compName_s:Belkin&fq=price:%5B%2A+TO+12%5D


However, when I run that query, I get two results, one with a price of
19.95 and another with a price of 11.50. Because of the filter query, I'm
only expecting to see one result - the one with a price of 11.50.

I was also able to replicate this in a unit test added to
org.apache.solr.TestJoin:

  @Test
  public void testJoin_withFilterQuery() throws Exception {
    assertU(add(doc("id", "1", "name", "john", "title", "Director", "dept_s", "Engineering")));
    assertU(add(doc("id", "2", "name", "mark", "title", "VP", "dept_s", "Marketing")));
    assertU(add(doc("id", "3", "name", "nancy", "title", "MTS", "dept_s", "Sales")));
    assertU(add(doc("id", "4", "name", "dave", "title", "MTS", "dept_s", "Support", "dept_s", "Engineering")));
    assertU(add(doc("id", "5", "name", "tina", "title", "VP", "dept_s", "Engineering")));

    assertU(add(doc("id", "10", "dept_id_s", "Engineering", "text", "These guys develop stuff")));
    assertU(add(doc("id", "11", "dept_id_s", "Marketing", "text", "These guys make you look good")));
    assertU(add(doc("id", "12", "dept_id_s", "Sales", "text", "These guys sell stuff")));
    assertU(add(doc("id", "13", "dept_id_s", "Support", "text", "These guys help customers")));

    assertU(commit());

    //***
    // This works as expected - the correct number of results are found.
    //***
    // find people that develop stuff
    assertJQ(req("q", "{!join from=dept_id_s to=dept_s}text:develop", "fl", "id")
        , "/response=={'numFound':3,'start':0,'docs':[{'id':'1'},{'id':'4'},{'id':'5'}]}"
    );

    //***
    // This fails - the response returned finds all three people; it should only find John.
    // expected = /response=={"numFound":1,"start":0,"docs":[{"id":"1"}]}
    // response = {
    //   "responseHeader":{
    //     "status":0,
    //     "QTime":4},
    //   "response":{"numFound":3,"start":0,"docs":[
    //     {"id":"1"},
    //     {"id":"4"},
    //     {"id":"5"}]
    //   }}
    //***
    // find people that develop stuff - but limit via filter query to a name of john
    assertJQ(req("q", "{!join from=dept_id_s to=dept_s}text:develop", "fl", "id", "fq", "name:john")
        , "/response=={'numFound':1,'start':0,'docs':[{'id':'1'}]}"
    );

  }


Interestingly, I know this worked at some point.  I had a snapshot build in
my ivy cache from 10/2/2011 and it was working with that
build maven_artifacts/org/apache/solr/
solr/4.0-SNAPSHOT/solr-4.0-20111002.161157-1.pom


Mike


Solr and TF-IDF

2012-01-26 Thread Nejla Karacan
Hey there,

I'm using Solr for my thesis, where I have to implement a content-based
recommender system for movies.

I have indexed about 20 thousand movies with their information:
movie-id
title
genre
plot/movie-description - !!!
cast

I've enabled the TermVectorComponent for the fields genre, description and
cast, so I can get the tf-idf values for the terms of every movie.

With these term/tf-idf-value pairs I have to compute the similarities
between movies using the cosine similarity.
I know about the Solr feature MLT (MoreLikeThis), but that's not the
solution; I have to implement the cosine similarity in Java myself.

Now I have some problems/questions:
I get the responses in XML format, which I read out with an XML reader in
Java, where it works through every child node in order to reach the right
node. Is there a better way to get these values, from node attributes or
node texts? I have tried it with wt=csv, but I get responses only with
the movie IDs, nothing more.
With the XML responseWriter my request is for example this:
http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
I get the right response with all terms and tf-idfs - in XML.

And if I add the csv notation
http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
I get only this:
id
1800180382

Maybe my request is wrong?

Another problem is that when I get the terms and their tf-idf values, I
store them in a map, but there isn't an ordering of the values. I want to
store only the 10 chief terms, i.e. the 10 terms with the highest tf-idf
values. Can I sort them in descending order? I haven't found anything for
that. If it's not possible, I must sort them later in the map.
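
One way to do that last step in plain Java - a sketch that assumes the
term/tf-idf pairs have already been parsed into a Map:

   import java.util.*;

   Map<String, Double> tfIdf = new HashMap<String, Double>(); // term -> tf-idf, filled from the response
   List<Map.Entry<String, Double>> entries =
       new ArrayList<Map.Entry<String, Double>>(tfIdf.entrySet());
   Collections.sort(entries, new Comparator<Map.Entry<String, Double>>() {
       public int compare(Map.Entry<String, Double> a, Map.Entry<String, Double> b) {
           return b.getValue().compareTo(a.getValue()); // descending by value
       }
   });
   List<Map.Entry<String, Double>> top10 =
       entries.subList(0, Math.min(10, entries.size()));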

My last question is:
every movie has a genre - often more than one.
It's like the cat field (category) in the exampledocs with ipod/monitor
etc., and it's an important point for the movies.
How can I integrate this factor?
I changed the boost attribute in the Solr XML schema like this:
<field name="genre" type="string" indexed="true" stored="true"
multiValued="true" omitNorms="false" boost="3" termVectors="true"
termPositions="true" termOffsets="true"/>
Is that enough or is there any other possibility?

Perhaps you see that I am a beginner in Solr;
at the beginning a few weeks ago it was even more difficult for me, but
now it goes better.
I would be very grateful for any help, ideas, tips or suggestions!

Many regards
Nejla



Re: Solr and TF-IDF

2012-01-26 Thread Walter Underwood
Why are you using a search engine to build a recommender? None of the leading 
teams in the Netflix Prize used search engines as a base technology.

Start with the recommender algorithms in Mahout: http://mahout.apache.org/

wunder

On Jan 26, 2012, at 9:18 AM, Nejla Karacan wrote:

 Hey there,
 
 I'm using Solr for my thesis, where I have to implement a content-based
 recommender system for movies.
 
 I have indexed about 20 thousand movies with their information:
 movie-id
 title
 genre
 plot/movie-description - !!!
 cast
 
 I've enabled the TermVectorComponent for the fields genre, description and
 cast, so I can get the tf-idf values for the terms of every movie.
 
 With these term/tf-idf-value pairs I have to compute the similarities
 between movies using the cosine similarity.
 I know about the Solr feature MLT (MoreLikeThis), but that's not the
 solution; I have to implement the cosine similarity in Java myself.
 
 Now I have some problems/questions:
 I get the responses in XML format, which I read out with an XML reader in
 Java, where it works through every child node in order to reach the right
 node. Is there a better way to get these values, from node attributes or
 node texts? I have tried it with wt=csv, but I get responses only with
 the movie IDs, nothing more.
 With the XML responseWriter my request is for example this:
 http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
 I get the right response with all terms and tf-idfs - in XML.
 
 And if I add the csv notation
 http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
 I get only this:
 id
 1800180382
 
 Maybe my request is wrong?
 
 Another problem is that when I get the terms and their tf-idf values, I
 store them in a map, but there isn't an ordering of the values. I want to
 store only the 10 chief terms, i.e. the 10 terms with the highest tf-idf
 values. Can I sort them in descending order? I haven't found anything for
 that. If it's not possible, I must sort them later in the map.
 
 My last question is:
 every movie has a genre - often more than one.
 It's like the cat field (category) in the exampledocs with ipod/monitor
 etc., and it's an important point for the movies.
 How can I integrate this factor?
 I changed the boost attribute in the Solr XML schema like this:
 <field name="genre" type="string" indexed="true" stored="true"
 multiValued="true" omitNorms="false" boost="3" termVectors="true"
 termPositions="true" termOffsets="true"/>
 Is that enough or is there any other possibility?
 
 Perhaps you see that I am a beginner in Solr;
 at the beginning a few weeks ago it was even more difficult for me, but
 now it goes better.
 I would be very grateful for any help, ideas, tips or suggestions!
 
 Many regards
 Nejla
 





Re: is it possible to get more Number of characters?

2012-01-26 Thread Erick Erickson
You still haven't given us much to go on. It would be helpful
to give some sample inputs, what you see when you query
(the output after adding debugQuery=on is helpful), and the
fieldType definition from schema.xml for the field in question.

You might also try looking at the admin/analysis page to see
how your analysis chain breaks up the incoming stream
into tokens, that's often helpful

Best
Erick

On Thu, Jan 26, 2012 at 7:24 AM, Jörg Agatz joerg.ag...@googlemail.com wrote:
 no,

 I have a lot of characters in my URL. It looks like it stops at xyz
 characters, so I hope to find a way to use more characters.






 On Thu, Jan 26, 2012 at 3:11 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:

 Hi Jörg,

 Hmmm, do you mind rephrasing the question?

 Otis
 
 Performance Monitoring SaaS for Solr -
 http://sematext.com/spm/solr-performance-monitoring/index.html


 - Original Message -
  From: Jörg Agatz joerg.ag...@googlemail.com
  To: solr-user@lucene.apache.org
  Cc:
  Sent: Thursday, January 26, 2012 5:23 AM
  Subject: is it possible to get more Number of characters?
 
  Is it possible to get more Number of characters?
 
  I have a problem with too many characters in the search: my Think
  Tank is very long, but it has to be that way.
  Unfortunately I cannot find the setting that is responsible.
 



Re: WARNING: Unable to read: dataimport.properties DHI issue

2012-01-26 Thread Erick Erickson
Nothing jumps out at me, but you might you might get some insight
from http://wiki.apache.org/solr/DataImportHandler, see the
interactive development mode section. The dataimport.jsp
page is helpful.

It *looks* like your SQL statement is having problems, but
I confess I only glanced at the output...

Best
Erick

On Wed, Jan 25, 2012 at 2:17 PM, Egonsith egons...@gmail.com wrote:
 I have tried to search for my specific problem but have not found a solution.
 I have also read the wiki on DIH and seem to have everything set up right,
 but my query still fails. Thank you for your help.

 I am running Solr 3.1 with Tomcat 6.0
 Windows server 2003 r2 and SQL 2008

 I have the sqljdbc4.jar sitting in C:\Program Files\Apache Software
 Foundation\Tomcat 6.0\lib

 /My solrconfig.xml/
 <requestHandler name="/dataimport"
     class="org.apache.solr.handler.dataimport.DataImportHandler">
   <lst name="defaults">
     <str name="config">db-data-config.xml</str>
   </lst>
 </requestHandler>

 /My db-data-config.xml/
 <dataConfig>
   <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
       url="://localhost:1433;DatabaseName=KnowledgeBase_DM" user="user"
       password="password" />
   <document>
     <entity dataSource="ds1" name="Titles" query="SELECT mrID, mrTitle from
         KnowledgeBase_DM.dbo.AskMe_Data">
       <field column="mrID" name="id" />
       <field column="mrTitle" name="title" />
       <entity name="Desc" query="select meDescription from
           KnowledgeBase_DM.dbo.AskMe_Data">
         <field column="meDescription" name="description" />
       </entity>
     </entity>
   </document>
 </dataConfig>


 /My logfile Output /
 Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImportHandler
 processConfiguration
 INFO: Processing configuration from solrconfig.xml:
 {config=db-data-config.xml}
 Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImporter
 loadDataConfig
 INFO: Data Configuration loaded successfully
 Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 INFO: Starting Full Import
 Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.SolrWriter
 readIndexerProperties
 WARNING: Unable to read: dataimport.properties
 Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
 call
 INFO: Creating a connection for entity Titles with URL:
 ://localhost:1433;DatabaseName=KnowledgeBase_DM
 Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
 call
 INFO: Time taken for getConnection(): 0
 Jan 25, 2012 2:17:37 PM org.apache.solr.common.SolrException log
 SEVERE: Exception while processing: Titles document :
 SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
 Unable to execute query: SELECT mrID, mrTitle from
 KnowledgeBase_DM.dbo.AskMe_Data Processing Document # 1
        at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
        at
 org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:188)
        at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
        at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
        at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
        at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
        at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
        at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
        at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
        at
 org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:205)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
        at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        
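
One detail worth noting in this log: the connection URL that JdbcDataSource
prints (://localhost:1433;DatabaseName=KnowledgeBase_DM) carries no JDBC
scheme. For the Microsoft driver named in the config, the url attribute
normally begins with jdbc:sqlserver - this is the standard sqljdbc4 URL
format, not a confirmed diagnosis of the failure above:

   url="jdbc:sqlserver://localhost:1433;DatabaseName=KnowledgeBase_DM"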

ord/rord with a function

2012-01-26 Thread entdeveloper
Is it possible for ord/rord to work with a function? I'm attempting to use
rord with a spatial function like the following as a bf:

bf=rord(geodist())

If there's no way for this to work, is there a way to simulate the same
behavior?

For some background, I have two sets of documents: one set applies to a
location in NY and another in LA. I want to boost documents that are closer
to where the user is searching from. But I only need these sets to be ranked
1 & 2. In other words, the actual distance should not be used to boost the
documents, just if you are closer or farther. We may add more locations in
the future, so I'd like to be able to rank the locations from closest to
furthest.

I need some way to rank the distances, and rord is the right idea, but
doesn't seem to work with functions.

I'm running Solr 3.4, btw.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/ord-rord-with-a-function-tp3691138p3691138.html
Sent from the Solr - User mailing list archive at Nabble.com.
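
As a possible workaround sketch (untested against this setup): since recip()
decreases monotonically with its argument, a boost function such as

   bf=recip(geodist(),1,1000,1000)

ranks closer documents above farther ones without needing the ordinal itself,
assuming pt and sfield are supplied as in standard geodist() usage; with two
locations that yields the desired 1-and-2 ordering.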


Re: Advice - evaluating Solr for categorization keyword search

2012-01-26 Thread Erick Erickson
See below...

On Wed, Jan 25, 2012 at 2:38 PM, Becky Neil be...@lovemachineinc.com wrote:
 Hi all,
 I've been tasked with evaluating whether Solr is the right solution for my
 company's search needs.  If this isn't the right forum for this kind of
 question, please let me know where to go instead!

 We are currently using sql queries to find mysql db results that match a
 single keyword in one short text field, so our search is pretty crude.

Be a little careful here. Often, when people come from a DB background
they think in terms of normalized data. If each of your tables is
independent of all other tables, then the simple map the rows into
documents approach works. More likely, you'll combine bits from
several tables into each Solr document and your reflexive distaste
for de-normalizing data will trip you up. Get over it <G>..

 What we hope that Solr can do initially is:
 1 enable more flexible search (booleans, more than one field
 searched/matched, etc)
This is OOB functionality. But do note that Solr/Lucene query
parsing is not a true boolean process, see:
http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

 2 live search results (eg new records get added to the index upon creation)
As you indicated below, you'd need some process that noticed that
your DB changed and then indexed the changed records. Once the
records are indexed, Solr will pick up the changes automatically
but you have to control the indexing process from outside.

 3 search rankings (eg most relevant - least relevant)
OOB functionality with lots of knobs to turn for tuning. See
edismax

 4 categorize our db (take records and at least group them, better if it
 could assign a label to each record)
Depending on what the details are here, this may be OOB. See
faceting and grouping/field collapsing. See:
http://wiki.apache.org/solr/SolrFacetingOverview
http://wiki.apache.org/solr/FieldCollapsing

 5 locate nearby results (geospatial search)
OOB, although you need to store the lat/lon. See:
http://wiki.apache.org/solr/SpatialSearch

 What I hope you can advise on is:
 A How would you go about #2 - making sure that new documents are
 added/indexed asap, based on a new rows to the db? Is that as simple as a
 setting in Solr, or does it take some coding (eg a listener object, a kron
 job, etc.).  I tried looking at the wiki  tutorial but wasn't able to find
 answers - I couldn't make sense of how to use UpdateRequestProcessor to do
 it. (http://wiki.apache.org/solr/UpdateRequestProcessor)
What you'll be doing here is either using Data Import Handler or
SolrJ (Java client) to push solr documents into Solr. This is
straight-forward once you know the magic. A trivial SolrJ program
that indexes documents from a DB is maybe 100 lines, including
imports. It *uses* the updatehandler, but you don't see that, you see
something like solrServer.add(ListOfSolrInputDocuments);
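
A bare-bones sketch of such a SolrJ indexer (Solr 3.x-era API; the JDBC URL,
query, and field names are placeholders):

   import java.sql.*;
   import java.util.ArrayList;
   import java.util.List;
   import org.apache.solr.client.solrj.SolrServer;
   import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
   import org.apache.solr.common.SolrInputDocument;

   public class DbIndexer {
     public static void main(String[] args) throws Exception {
       SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
       Connection db = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
       Statement stmt = db.createStatement();
       ResultSet rs = stmt.executeQuery("SELECT id, title, body FROM articles");
       List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
       while (rs.next()) {
         // map one row to one Solr document
         SolrInputDocument doc = new SolrInputDocument();
         doc.addField("id", rs.getString("id"));
         doc.addField("title", rs.getString("title"));
         doc.addField("body", rs.getString("body"));
         docs.add(doc);
       }
       rs.close();
       stmt.close();
       db.close();
       solr.add(docs);   // the solrServer.add(...) call mentioned above
       solr.commit();
     }
   }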

 B What's the status of document clustering? The wiki says it's not been
 fully implemented. Would we be able to achieve any of #4 yet? If not, what
 else should we consider?
I don't think you're really thinking about document clustering here. I suspect
that grouping and/or faceting will be where you start. At least I'd look at
that first although clustering may be exactly what you want. Half the battle
is learning the right vocabulary <G>

 C Would you use Solr over say Google Maps api to run location aware
 searches?
*shrugs*

 D How long should we expect it to take to configure Solr on our servers
 with our db, get the initial index set up, and enable live search results?
  Are we talking one week, or one month? Our db is not tiny, but it's not
 huge - say around 8k records in each of ~20 tables. Most tables have around
 10 fields, including at least one large text field and then a variety of
 dates, numbers, and small text.
Too many variables for you to count on this estimate, but:
*If* you can use Data Import Handler and are starting from scratch, probably a week.
Someone who already knows Solr maybe a day. But whenever I start something
new, I usually chase a number of blind alleys.

Once set up, indexing your entire corpus will probably be a matter of
less than an hour (and I'm being quite conservative here. On my laptop,
Solr can index 7K documents/second from the English wiki dump). But
at times the database connection is the limiting factor

By the way, I recommend that if DIH starts getting hard to use, especially
due to the relationships between tables, consider jumping to SolrJ earlier
rather than later.

Your index size is pretty small by Solr standards, so you probably won't have
to shard or do some of the other complex kinds of things that come up when
you have lots of data.

Note that this is *just* for setting up Solr and being able to query
through, say,
the admin page. It does not include all the work for the UI you'll need to front
the app. Count on tweaking your configuration files (e.g. schema.xml and
solrconfig.xml) and 

Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
I'm on a project where we have 1B docs sharded across 20 servers. We're not
in production yet and we're doing load tests now. We're sending enough load
to hit 100qps per server. As the load increases we're seeing query times
sporadically increase to 10 seconds, 20 seconds, etc. What we're trying to
do is set a shard timeout so that responses longer than 2 seconds are
discarded. We can live with fewer results in these cases. We're not
replicating yet as we want to see how the 20 shards perform first (plus
we're waiting on the massive amount of hardware).

I've tried setting the following config in our default req. handler:
<int name="shard-socket-timeout">2000</int>
<int name="shard-connection-timeout">2000</int>

I've just added these, and am testing now, but this doesn't look promising
either:
<int name="timeAllowed">2000</int>
<bool name="partialResults">true</bool>

Couldn't find much on the wiki about these params - I'm looking for more
details about how these work. I'll be happy to update the wiki with more
details based on the discussion here.

Any details about exactly how I can achieve my goal of timing out and
disregarding queries longer than 2 seconds would be greatly appreciated.

The index is insanely lean - no stored fields, no norms, no stop words,
etc. RAM buffer is 128, and we're using the standard search req. handler.
Essentially we're running Solr as a nosql data store, which suits this
project, but we need responses to be no longer than 2 seconds at the max.

Thanks,
-Jay


social123 Data Appending Service

2012-01-26 Thread Aaron Biddar
Hi there-

I was on your site today and was not sure who to reach out to.  My Company,
Social123, provides Social Data Appending for companies that provide
lists.  In a nutshell, we add Facebook, LinkedIn and Twitter contact
information to your current lists. It's a great way to easily offer a new
service or add on to your current offerings.  Providing social media
contact information to your customers will allow them to interact with
their customers on a whole new level.

If you are the right person to speak with, please let me know your
availability for a quick 5-minute demo or check out our tour at
www.social123.com.  If you are not the right person, would you mind passing
this e-mail along?

Thanks in advance.

-- 
Aaron Biddar
Founder, CEO
aaron.bid...@social123.com
www.social123.com
78 Alexander St. #K  Charleston SC 29403
M  678 925 3556   P 800.505.7295 ex101


Re: WARNING: Unable to read: dataimport.properties DHI issue

2012-01-26 Thread Egonsith
Erick,

Thanks for the reply.
I'm a bit embarrassed to say this is a classic example of a way too messy
development environment; these errors were due to many different drivers and
XML files that were edited way too many times. I have cleaned up my dev
environment and reinstalled Tomcat and Solr and am now getting past this
error. Thank you for the help.

Mike

--
View this message in context: 
http://lucene.472066.n3.nabble.com/WARNING-Unable-to-read-dataimport-properties-DHI-issue-tp3689183p3691278.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: social123 Data Appending Service

2012-01-26 Thread Geert-Jan Brits
No thanks, not sure which site you're talking about btw.
But anyway, no thanks


On 26 January 2012 at 19:41, Aaron Biddar
aaron.bid...@social123.com wrote:

 Hi there-

 I was on your site today and was not sure who to reach out to.  My Company,
 Social123, provides Social Data Appending for companies that provide
 lists.  In a nutshell, we add Facebook, LinkedIn and Twitter contact
  information to your current lists. It's a great way to easily offer a new
 service or add on to your current offerings.  Providing social media
 contact information to your customers will allow them to interact with
 their customers on a whole new level.

 If you are the right person to speak with, please let me know your
 availability for a quick 5-minute demo or check out our tour at
 www.social123.com.  If you are not the right person, would you mind
 passing
 this e-mail along?

 Thanks in advance.

 --
 Aaron Biddar
 Founder, CEO
 aaron.bid...@social123.com
 www.social123.com
 78 Alexander St. #K  Charleston SC 29403
 M  678 925 3556   P 800.505.7295 ex101



Re: Solr 3.5.0 can't find Carrot classes

2012-01-26 Thread Stanislaw Osinski
Hi,

Can you paste the logs from the second run?

Thanks,

Staszek

On Wed, Jan 25, 2012 at 00:12, Christopher J. Bottaro cjbott...@onespot.com
 wrote:

 On Tuesday, January 24, 2012 at 3:07 PM, Christopher J. Bottaro wrote:
  SEVERE: java.lang.NoClassDefFoundError:
 org/carrot2/core/ControllerFactory
  at
 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.init(CarrotClusteringEngine.java:102)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
 Source)
  at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
  at java.lang.reflect.Constructor.newInstance(Unknown Source)
  at java.lang.Class.newInstance0(Unknown Source)
  at java.lang.Class.newInstance(Unknown Source)
 
  …
 
  I'm starting Solr with -Dsolr.clustering.enabled=true and I can see that
 the Carrot jars in contrib are getting loaded.
 
  Full log file is here:
 http://onespot-development.s3.amazonaws.com/solr.log
 
  Any ideas?  Thanks for the help.
 
  Ok, got a little further.  It seems that Solr doesn't like it if you include
  jars more than once (I had a lib dir and also <lib> directives in the
  solrconfig, which ended up loading the same jars twice).

 But now I'm getting these errors:  java.lang.NoClassDefFoundError:
 org/apache/solr/handler/clustering/SearchClusteringEngine

 Any help?  Thanks.


Re: Solr and TF-IDF

2012-01-26 Thread Lee Carroll
A content-based recommender - so it's not CF etc.,
and it's a project, so it's whatever their supervisor wants.

Take a look at SolrJ; it should be more natural to integrate your Java code with.

(Although I'm not sure if it supports the term vector component)

good luck



On 26 January 2012 17:27, Walter Underwood wun...@wunderwood.org wrote:
 Why are you using a search engine to build a recommender? None of the leading 
 teams in the Netflix Prize used search engines as a base technology.

 Start with the recommender algorithms in Mahout: http://mahout.apache.org/

 wunder

 On Jan 26, 2012, at 9:18 AM, Nejla Karacan wrote:

 Hey there,

 I'm using Solr for my thesis, where I have to implement a content-based
 recommender system for movies.

 I have indexed about 20 thousand movies with their information:
 movie-id
 title
 genre
 plot/movie-description - !!!
 cast

 I've enabled the TermVectorComponent for the fields genre, description and
 cast, so I can get the tf-idf values for the terms of every movie.

 With these term/tf-idf-value pairs I have to compute the similarities
 between movies using the cosine similarity.
 I know about the Solr feature MLT (MoreLikeThis), but that's not the
 solution; I have to implement the cosine similarity in Java myself.

 Now I have some problems/questions:
 I get the responses in XML format, which I read out with an XML reader in
 Java, where it works through every child node in order to reach the right
 node. Is there a better way to get these values, from node attributes or
 node texts? I have tried it with wt=csv, but I get responses only with
 the movie IDs, nothing more.
 With the XML responseWriter my request is for example this:
 http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
 I get the right response with all terms and tf-idfs - in XML.

 And if I add the csv notation
 http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
 I get only this:
 id
 1800180382

 Maybe my request is wrong?

 Another problem is that when I get the terms and their tf-idf values, I
 store them in a map, but there isn't an ordering of the values. I want to
 store only the 10 chief terms, i.e. the 10 terms with the highest tf-idf
 values. Can I sort them in descending order? I haven't found anything for
 that. If it's not possible, I must sort them later in the map.

 My last question is:
 every movie has a genre - often more than one.
 It's like the cat field (category) in the exampledocs with ipod/monitor
 etc., and it's an important point for the movies.
 How can I integrate this factor?
 I changed the boost attribute in the Solr XML schema like this:
 <field name="genre" type="string" indexed="true" stored="true"
 multiValued="true" omitNorms="false" boost="3" termVectors="true"
 termPositions="true" termOffsets="true"/>
 Is that enough or is there any other possibility?

 Perhaps you see that I am a beginner in Solr;
 at the beginning a few weeks ago it was even more difficult for me, but
 now it goes better.
 I would be very grateful for any help, ideas, tips or suggestions!

 Many regards
 Nejla






solr shards

2012-01-26 Thread ramin
Hello,

I've gone through the list and have not found the answer, but if this is a
repeat question, my apologies.

I have a 3-shard Solr cluster. If I send a query to each of the shards
individually, I get the result with a list of relevant docs. However, if I
send the query to the main Solr server (dispatcher) it only returns the
value for numFound, but there is no list of docs. Since I seem to be the only
one having this issue, it is probably a misconfiguration for which I
couldn't find an answer in the documentation. Can someone please help?

Also, the sum of all the individual numFound's seems to not match the
numFound I get from the main Solr server, given that I do not have any
duplicates on the unique key.

Thanks in advance,
Ramin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-shards-tp3691370p3691370.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: WARNING: Unable to read: dataimport.properties DHI issue

2012-01-26 Thread Erick Erickson
Yeah, that happens. Glad you're past this issue; thanks for closing it out.

Erick

On Thu, Jan 26, 2012 at 10:45 AM, Egonsith egons...@gmail.com wrote:
 Erick,

 Thanks for the reply.
 I'm a bit embarrassed to say this is a classic example of a way too messy
 development environment; these errors were due to many different drivers and
 XML files that were edited way too many times. I have cleaned up my dev
 environment and reinstalled Tomcat and Solr and am now getting past this
 error. Thank you for the help.

 Mike

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/WARNING-Unable-to-read-dataimport-properties-DHI-issue-tp3689183p3691278.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Multiple Data Directories and 1 SOLR instance

2012-01-26 Thread Nitin Arora
Hi,

We are using SOLR/Lucene to index/search data about the users of an
organization. The nature of the data is brief information about the users'
work. Our indexing requirement is to have segregated stores for each
organization; currently we have 10 organizations, and we have to run 10
different instances of SOLR to serve search results per organization. As
new organizations join, it is getting difficult to manage this
many instances.

I think there is now a need to use one SOLR instance and then have multiple
data directories, one per organization.

When an index/search request is received, SOLR decides the data directory
based on the organization.

1. Is it possible to do this in SOLR, and how can we achieve it?
2. Will it be a good design to use SOLR like this?
3. Is there any impact on scalability if we manage
separate data directories inside SOLR?

Thanks in advance

Nitin


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3691644.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re:Multiple Data Directories and 1 SOLR instance

2012-01-26 Thread wakemaster 39
I wish I had the link for you, but it sounds like you are looking for
Solr cores. They are separate indexes all under one Solr instance. Check
out the Solr 3.5 example, as I believe cores are now used and suggested as
the default configuration even if you only want one core.

Cameron
On Jan 26, 2012 4:18 PM, Nitin Arora aro_ni...@yahoo.com wrote:

 Hi,

  We are using SOLR/Lucene to index/search data about the users of an
  organization. The nature of the data is brief information about the
  users' work. Our indexing requirement is to have segregated stores for
  each organization; currently we have 10 organizations, and we have to
  run 10 different instances of SOLR to serve search results per
  organization. As new organizations join, it is getting difficult to
  manage this many instances.

  I think there is now a need to use one SOLR instance and then have
  multiple data directories, one per organization.

  When an index/search request is received, SOLR decides the data
  directory based on the organization.

  1. Is it possible to do this in SOLR, and how can we achieve it?
  2. Will it be a good design to use SOLR like this?
  3. Is there any impact on scalability if we manage the
  separate data directories inside SOLR?

 Thanks in advance

 Nitin


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3691644.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple Data Directories and 1 SOLR instance

2012-01-26 Thread David Radunz

Hey,

    Sounds like what you need to set up is a Multiple Cores 
configuration. At first I confused this with Multi Core CPU, but 
that's not what it's about. Basically it's a way to run multiple 'solr' 
cores/indexes/configurations from a single Solr instance (which will 
scale better as the resources will be shared). Have a read anyway: 
http://wiki.apache.org/solr/CoreAdmin
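
A minimal solr.xml for that kind of setup might look like this (core names
and paths are placeholders; each core gets its own conf/ and data/
directory):

   <solr persistent="true">
     <cores adminPath="/admin/cores">
       <core name="org1" instanceDir="org1"/>
       <core name="org2" instanceDir="org2"/>
       <!-- one core per organization; more can be added via the CoreAdmin API -->
     </cores>
   </solr>

Requests are then routed per organization by URL, e.g.
http://localhost:8983/solr/org1/select?q=...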


Cheers,

David

On 27/01/2012 8:18 AM, Nitin Arora wrote:

Hi,

We are using SOLR/Lucene to index/search data about the users of an
organization. The nature of the data is brief information about the users'
work. Our indexing requirement is to have segregated stores for each
organization; currently we have 10 organizations, and we have to run 10
different instances of SOLR to serve search results per organization. As
new organizations join, it is getting difficult to manage this
many instances.

I think there is now a need to use one SOLR instance and then have multiple
data directories, one per organization.

When an index/search request is received, SOLR decides the data directory
based on the organization.

1. Is it possible to do this in SOLR, and how can we achieve it?
2. Will it be a good design to use SOLR like this?
3. Is there any impact on scalability if we manage
separate data directories inside SOLR?

Thanks in advance

Nitin


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3691644.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Mark Miller

On Jan 26, 2012, at 1:28 PM, Jay Hill wrote:

 
 I've tried setting the following config in our default req. handler:
 int name=shard-socket-timeout2000/int
 int name=shard-connection-timeout2000/int
 


What version are you using, Jay? At least on trunk, I took a look and it appears 
that at some point these were renamed to socketTimeout and connTimeout.
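
i.e., something like the following in the handler configuration - a sketch
that assumes the renamed parameters are read from the same place as the old
ones:

   <requestHandler name="standard" class="solr.SearchHandler" default="true">
     <int name="socketTimeout">2000</int>
     <int name="connTimeout">2000</int>
   </requestHandler>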

What about a timeout on your clients?

- Mark Miller
lucidimagination.com



Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
We're on the trunk:
4.0-2011-10-26_08-46-59 1189079 - hudson - 2011-10-26 08:51:47

Client timeouts are set to 4 seconds.

Thanks,
-Jay





Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
I'm changing the params to socketTimeout and connTimeout and will test this
afternoon. The client timeout was actually removed today, which helped a bit.

What about the other params, timeAllowed and partialResults? My
expectation was that these were specifically for distributed search,
meaning if a response wasn't received within the timeAllowed, and if
partialResults is true, then that shard would not be waited on for results.
Is that correct?
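For reference, both are plain request parameters, so they can be tried
straight from the URL before touching solrconfig.xml (the hosts and shard
list here are hypothetical, timeAllowed is in milliseconds, and whether
partialResults is honored as a request parameter on your build is exactly
the open question above):

http://host1:8983/solr/select?q=*:*&shards=host1:8983/solr,host2:8983/solr&timeAllowed=2000&partialResults=true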

thanks,
-jay







Re: Multiple Data Directories and 1 SOLR instance

2012-01-26 Thread Anderson vasconcelos
Nitin,

Use a multi-core configuration. For each organization, you create a new core
with its specific configuration. You will have one SOLR instance and one SOLR
Admin tool to manage all the cores. The configuration is simple.
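A minimal solr.xml sketch for this (the core names and instanceDir paths
are hypothetical; each instanceDir holds that organization's own conf/
and data/ directories):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="org1" instanceDir="org1" />
    <core name="org2" instanceDir="org2" />
    <!-- ... one core per organization ... -->
  </cores>
</solr>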

Good Luck

Regards

Anderson






Re: WARNING: Unable to read: dataimport.properties DHI issue

2012-01-26 Thread Gora Mohanty
On Thu, Jan 26, 2012 at 3:47 AM, Egonsith egons...@gmail.com wrote:
 I have tried to search for my specific problem but have not found a solution. I
 have also read the wiki on DIH and seem to have everything set up right,
 but my query still fails. Thank you for your help.
[...]

This has nothing to do with the warning in the title of your message.
That is very likely because the user running DIH (typically the Jetty/
Tomcat user) does not have permission to read/write the
dataimport.properties file in your Solr conf/ directory.
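A quick way to check and fix that, assuming a Linux box where the servlet
container runs as user tomcat and Solr home is /opt/solr/example/solr
(adjust both to your setup):

ls -l /opt/solr/example/solr/conf/dataimport.properties
chown tomcat /opt/solr/example/solr/conf
chown tomcat /opt/solr/example/solr/conf/dataimport.properties

Note that DIH needs write access to the conf/ directory itself so it can
create the file when it does not exist yet.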


The relevant error in your log is the following one:

 SEVERE: Exception while processing: Titles document :
 SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
 Unable to execute query: SELECT mrID, mrTitle from
 KnowledgeBase_DM.dbo.AskMe_Data Processing Document # 1
        at
[...]

 Caused by: java.lang.NullPointerException
        at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:241)
[...]

Your SQL select is failing for some reason. Please check the
setup there. E.g., one item that is incorrect is the url attribute
in:

<dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="://localhost:1433;DatabaseName=KnowledgeBase_DM" user="user"
    password="password" />

It should be something like
url="jdbc:sqlserver://localhost:1433;DatabaseName=KnowledgeBase_DM"
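Putting the pieces together, a working data-config.xml would look roughly
like this (the field mappings are guesses based on your SELECT - map the
columns to whatever fields your schema actually defines):

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost:1433;DatabaseName=KnowledgeBase_DM"
              user="user" password="password"/>
  <document>
    <entity name="Titles"
            query="SELECT mrID, mrTitle FROM KnowledgeBase_DM.dbo.AskMe_Data">
      <field column="mrID" name="id"/>
      <field column="mrTitle" name="title"/>
    </entity>
  </document>
</dataConfig>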

Regards,
Gora


Re: Solr Join query with fq not correctly filtering results?

2012-01-26 Thread Mike Hugo
I created issue https://issues.apache.org/jira/browse/SOLR-3062 for this
problem. I was able to track it down to something in this commit -
http://svn.apache.org/viewvc?view=revision&revision=1188624 (LUCENE-1536:
"Filters can now be applied down-low, if their DocIdSet implements a new
bits() method, returning all documents in a random access way")
- before that commit the join/fq functionality worked as expected and as
documented on the wiki page. After that commit it's broken.

Any assistance is greatly appreciated!

Thanks,

Mike

On Thu, Jan 26, 2012 at 11:04 AM, Mike Hugo m...@piragua.com wrote:

 Hello,

 I'm trying out the Solr JOIN query functionality on trunk.  I have the
 latest checkout, revision #1236272 - I did the following steps to get the
 example up and running:

 cd solr
 ant example
 java -jar start.jar
 cd exampledocs
 java -jar post.jar *.xml

 Then I tried a few of the sample queries on the wiki page
 http://wiki.apache.org/solr/Join. In particular, this is the one I'm
 interested in:

 Find all manufacturer docs named belkin, then join them against
 (product) docs and filter that list to only products with a price less than
 12 dollars

 http://localhost:8983/solr/select?q={!join+from=id+to=manu_id_s}compName_s:Belkin&fq=price:[*+TO+12]


 However, when I run that query, I get two results, one with a price of
 19.95 and another with a price of 11.5. Because of the filter query, I'm
 only expecting to see one result - the one with a price of 11.5.

 I was also able to replicate this in a unit test added to
 org.apache.solr.TestJoin:

  @Test
  public void testJoin_withFilterQuery() throws Exception {
    assertU(add(doc("id", "1", "name", "john", "title", "Director", "dept_s", "Engineering")));
    assertU(add(doc("id", "2", "name", "mark", "title", "VP", "dept_s", "Marketing")));
    assertU(add(doc("id", "3", "name", "nancy", "title", "MTS", "dept_s", "Sales")));
    assertU(add(doc("id", "4", "name", "dave", "title", "MTS", "dept_s", "Support", "dept_s", "Engineering")));
    assertU(add(doc("id", "5", "name", "tina", "title", "VP", "dept_s", "Engineering")));

    assertU(add(doc("id", "10", "dept_id_s", "Engineering", "text", "These guys develop stuff")));
    assertU(add(doc("id", "11", "dept_id_s", "Marketing", "text", "These guys make you look good")));
    assertU(add(doc("id", "12", "dept_id_s", "Sales", "text", "These guys sell stuff")));
    assertU(add(doc("id", "13", "dept_id_s", "Support", "text", "These guys help customers")));

    assertU(commit());

    // ***
    // This works as expected - the correct number of results are found
    // ***
    // find people that develop stuff
    assertJQ(req("q", "{!join from=dept_id_s to=dept_s}text:develop", "fl", "id")
        , "/response=={'numFound':3,'start':0,'docs':[{'id':'1'},{'id':'4'},{'id':'5'}]}"
    );

    // ***
    // this fails - the response returned finds all three people - it should only find John
    // expected = /response=={"numFound":1,"start":0,"docs":[{"id":"1"}]}
    // response = {
    //   "responseHeader":{
    //     "status":0,
    //     "QTime":4},
    //   "response":{"numFound":3,"start":0,"docs":[
    //     {"id":"1"},
    //     {"id":"4"},
    //     {"id":"5"}]
    //   }}
    // ***
    // find people that develop stuff - but limit via filter query to a name of john
    assertJQ(req("q", "{!join from=dept_id_s to=dept_s}text:develop", "fl", "id", "fq", "name:john")
        , "/response=={'numFound':1,'start':0,'docs':[{'id':'1'}]}"
    );
  }


 Interestingly, I know this worked at some point. I had a snapshot build
 in my ivy cache from 10/2/2011, and it was working with that build:
 maven_artifacts/org/apache/solr/solr/4.0-SNAPSHOT/solr-4.0-20111002.161157-1.pom


 Mike



addBean method inserting multivalued values

2012-01-26 Thread Siddharth Gargate
Hi,
I have annotated the setter methods with @Field annotations, and I am using
the addBean method to add a SOLR document. But all fields are being indexed as
multivalued:
<doc>
  <float name="score">1.0</float>
  <arr name="id">
    <str>1</str>
  </arr>
  <arr name="name">
    <str>siddharth 0</str>
  </arr>
  <arr name="updated_dt">
    <date>2012-01-28T06:22:19.946Z</date>
  </arr>
</doc>

How can I avoid this?
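For what it's worth, addBean itself does not decide whether a field is
multivalued - the schema does. The usual fix is to check that id, name, and
updated_dt are not declared (or matched by a dynamicField pattern) with
multiValued="true" in schema.xml. A minimal bean sketch that pairs with
addBean (the class and the main() wiring here are hypothetical; on 3.x the
client class is CommonsHttpSolrServer, on newer trunk builds it is
HttpSolrServer):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.beans.Field;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class UserDoc {
    @Field private String id;                               // maps to the "id" field
    @Field private String name;                             // maps to the "name" field
    @Field("updated_dt") private java.util.Date updated;    // explicit field name

    public UserDoc(String id, String name, java.util.Date updated) {
        this.id = id;
        this.name = name;
        this.updated = updated;
    }

    public static void main(String[] args) throws Exception {
        // assumes the stock example instance at the default URL
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.addBean(new UserDoc("1", "siddharth 0", new java.util.Date()));
        server.commit();
    }
}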


Re: SpellCheck Help

2012-01-26 Thread vishal_asc
I downloaded Apache Solr from http://apache.dattatec.com//lucene/solr/
and extracted it on my Windows machine.

Then I started Solr: in [solr-path]/example, I typed the following in a
terminal: java -jar start.jar.
It started, and I can see the Solr page at http://localhost:8983/solr/admin/

Then I copied Magento's [magento-instance-root]/lib/Apache/Solr/conf to
[Solr-instance-root]/example/solr/conf.

Then I restarted Solr again; a lot of activity was going on there. Then I ran
System > Index Management, and in the front-end search box I tried to search
for a product with an incorrect spelling. In the Solr console I can see some
activity, but on the Magento front end I couldn't get any result. Why?

I followed the steps given at this URL:
http://www.summasolutions.net/blogposts/magento-apache-solr-set#comment-615

Please look into it and let me know any other information you require.

I also want to know how I can implement faceting and highlighting with the
search results.




Re: SpellCheck Help

2012-01-26 Thread David Radunz

Hey,

I really recommend you contact Magento pre-sales to find out why 
THEIR stuff doesn't work. The information you have provided is specific 
to Magento... You can't expect people on a Solr mailing list to help you 
with a Magento problem. I guarantee you the issue is probably something 
Magento is doing, so try seeking support there first (try their mailing 
lists if they have any, or on IRC: irc.freenode.org #magento).

I am not trying to be rude, rather trying to save you time and others' effort.

Cheers,

David
