Re: SpellCheck Help

2012-01-26 Thread David Radunz

Hey,

I really recommend you contact Magento pre-sales to find out why 
THEIR stuff doesn't work. The information you have provided is specific 
to Magento... You can't expect people on a Solr mailing list to help you 
with a Magento problem. The issue is almost certainly something Magento 
is doing, so try seeking support there first (try their mailing lists 
if they have any, or on IRC: irc.freenode.org #magento).


I am not trying to be rude, rather to save you time and others' effort.

Cheers,

David

On 27/01/2012 5:37 PM, vishal_asc wrote:

Downloaded Apache Solr from the URL: http://apache.dattatec.com//lucene/solr/
and extracted it on my Windows machine.

Then I went to [solr-path]/example and typed the following in a
terminal: java -jar start.jar.
It started and I can see the Solr page at http://localhost:8983/solr/admin/

Then I copied Magento's [magento-instance-root]/lib/Apache/Solr/conf to
[Solr-instance-root]/example/solr/conf.

Then I restarted Solr again; lots of activity was going on there. Then I ran
System->Index Management, and in the front-end search box I tried to search
for a product with an incorrect spelling. In the Solr console I can see some
activity, but at the Magento front end I couldn't get any result. Why?

I followed the steps given at this URL:
http://www.summasolutions.net/blogposts/magento-apache-solr-set#comment-615

Please look into it and let me know any other information you require.

I also want to know how I can implement faceted and highlighted search with
the resulting output.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SpellCheck-Help-tp3648589p3692518.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: SpellCheck Help

2012-01-26 Thread vishal_asc
Downloaded Apache Solr from the URL: http://apache.dattatec.com//lucene/solr/
and extracted it on my Windows machine.

Then I went to [solr-path]/example and typed the following in a
terminal: java -jar start.jar.
It started and I can see the Solr page at http://localhost:8983/solr/admin/

Then I copied Magento's [magento-instance-root]/lib/Apache/Solr/conf to
[Solr-instance-root]/example/solr/conf.

Then I restarted Solr again; lots of activity was going on there. Then I ran
System->Index Management, and in the front-end search box I tried to search
for a product with an incorrect spelling. In the Solr console I can see some
activity, but at the Magento front end I couldn't get any result. Why?

I followed the steps given at this URL:
http://www.summasolutions.net/blogposts/magento-apache-solr-set#comment-615

Please look into it and let me know any other information you require.

I also want to know how I can implement faceted and highlighted search with
the resulting output.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SpellCheck-Help-tp3648589p3692518.html
Sent from the Solr - User mailing list archive at Nabble.com.


addBean method inserting multivalued values

2012-01-26 Thread Siddharth Gargate
Hi,
I have annotated the setter methods with Field annotations, and I am using
the addBean method to add a Solr document. But all fields are being indexed
as multivalued:

[XML response snippet garbled in archiving: the element tags were stripped,
leaving only the values 1.0, 1, "siddharth 0", and 2012-01-28T06:22:19.946Z,
each of which appeared wrapped in a multivalued array element.]
How to avoid this?
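
For reference, a minimal SolrJ sketch of the setter-annotation pattern
described above, with hypothetical bean and field names. Note that whether a
field comes back as a multivalued array is governed by the schema's
multiValued attribute (or a multivalued dynamic-field pattern), not by the
bean itself:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.beans.Field;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class AddBeanExample {
    public static class Item {
        private String id;
        private String author;

        @Field("id")                  // annotation on the setter, as described
        public void setId(String id) { this.id = id; }

        @Field("author")
        public void setAuthor(String author) { this.author = author; }
    }

    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        Item item = new Item();
        item.setId("1");
        item.setAuthor("siddharth");
        server.addBean(item);         // sends single values; multivalued-ness
        server.commit();              // comes from the schema field definition
    }
}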


Re: Solr Join query with fq not correctly filtering results?

2012-01-26 Thread Mike Hugo
I created issue https://issues.apache.org/jira/browse/SOLR-3062 for this
problem.  I was able to track it down to something in this commit -
http://svn.apache.org/viewvc?view=revision&revision=1188624 (LUCENE-1536:
Filters can now be applied down-low, if their DocIdSet implements a new
bits() method, returning all documents in a random access way
) - before that commit the join / fq functionality works as expected /
documented on the wiki page.  After that commit it's broken.

Any assistance is greatly appreciated!

Thanks,

Mike

On Thu, Jan 26, 2012 at 11:04 AM, Mike Hugo  wrote:

> Hello,
>
> I'm trying out the Solr JOIN query functionality on trunk.  I have the
> latest checkout, revision #1236272 - I did the following steps to get the
> example up and running:
>
> cd solr
> ant example
> java -jar start.jar
> cd exampledocs
> java -jar post.jar *.xml
>
> Then I tried a few of the sample queries on the wiki page
> http://wiki.apache.org/solr/Join.  In particular, this is one that I'm
> interested in
>
> Find all manufacturer docs named "belkin", then join them against
>> (product) docs and filter that list to only products with a price less than
>> 12 dollars
>>
>> http://localhost:8983/solr/select?q={!join+from=id+to=manu_id_s}compName_s:Belkin&fq=price:%5B%2A+TO+12%5D
>
>
> However, when I run that query, I get two results, one with a price of
> 19.95 and another with a price of 11.50. Because of the filter query, I'm
> only expecting to see one result - the one with a price of 11.50.
>
> I was also able to replicate this in a unit test added to
> org.apache.solr.TestJoin:
>
>   @Test
>   public void testJoin_withFilterQuery() throws Exception {
> assertU(add(doc("id", "1","name", "john", "title", "Director",
> "dept_s","Engineering")));
> assertU(add(doc("id", "2","name", "mark", "title", "VP",
> "dept_s","Marketing")));
> assertU(add(doc("id", "3","name", "nancy", "title", "MTS",
> "dept_s","Sales")));
> assertU(add(doc("id", "4","name", "dave", "title", "MTS",
> "dept_s","Support", "dept_s","Engineering")));
> assertU(add(doc("id", "5","name", "tina", "title", "VP",
> "dept_s","Engineering")));
>
> assertU(add(doc("id","10", "dept_id_s", "Engineering", "text","These
> guys develop stuff")));
> assertU(add(doc("id","11", "dept_id_s", "Marketing", "text","These
> guys make you look good")));
> assertU(add(doc("id","12", "dept_id_s", "Sales", "text","These guys
> sell stuff")));
> assertU(add(doc("id","13", "dept_id_s", "Support", "text","These guys
> help customers")));
>
> assertU(commit());
>
> //***
> //This works as expected - the correct number of results are found
> //***
> // find people that develop stuff
> assertJQ(req("q","{!join from=dept_id_s to=dept_s}text:develop",
> "fl","id")
>
> ,"/response=={'numFound':3,'start':0,'docs':[{'id':'1'},{'id':'4'},{'id':'5'}]}"
> );
>
> //
> // this fails - the response returned finds all three people - it
> // should only find John
> // expected = /response=={"numFound":1,"start":0,"docs":[{"id":"1"}]}
> // response = {
> //   "responseHeader":{
> //     "status":0,
> //     "QTime":4},
> //   "response":{"numFound":3,"start":0,"docs":[
> //     {"id":"1"},
> //     {"id":"4"},
> //     {"id":"5"}]
> //   }}
> //
> // find people that develop stuff - but limit via filter query to a
> // name of "john"
> assertJQ(req("q","{!join from=dept_id_s to=dept_s}text:develop",
> "fl","id", "fq", "name:john")
> ,"/response=={'numFound':1,'start':0,'docs':[{'id':'1'}]}"
> );
>
>   }
>
>
> Interestingly, I know this worked at some point.  I had a snapshot build
> in my ivy cache from 10/2/2011 and it was working with that
> build maven_artifacts/org/apache/solr/
> solr/4.0-SNAPSHOT/solr-4.0-20111002.161157-1.pom"
>
>
> Mike
>


Re: WARNING: Unable to read: dataimport.properties DIH issue

2012-01-26 Thread Gora Mohanty
On Thu, Jan 26, 2012 at 3:47 AM, Egonsith  wrote:
> I have tried to search for my specific problem but have not found a solution. I
> have also read the wiki on the DIH and seem to have everything set up right,
> but my query still fails. Thank you for your help.
[...]

This has nothing to do with the warning in the title of your message.
That warning is very likely because the user running DIH (typically the
Jetty/Tomcat user) does not have permission to read/write the
dataimport.properties file in your Solr conf/ directory.


The relevant error in your log is the following one:

> SEVERE: Exception while processing: Titles document :
> SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query: SELECT mrID, mrTitle from
> KnowledgeBase_DM.dbo.AskMe_Data Processing Document # 1
>        at
[...]

> Caused by: java.lang.NullPointerException
>        at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:241)
[...]

Your SQL select is failing for some reason. Please check the
setup there. E.g., one item that is incorrect is the "url" attribute
in your dataSource definition:
url="://localhost:1433;DatabaseName=KnowledgeBase_DM"
It should be something like
url="jdbc:sqlserver://localhost:1433;DatabaseName=KnowledgeBase_DM"

Regards,
Gora


Re: Multiple Data Directories and 1 SOLR instance

2012-01-26 Thread Anderson vasconcelos
Nitin,

Use a multicore configuration. For each organization, you create a new core
with specific configuration. You will have one SOLR instance and one SOLR
Admin tool to manage all cores. The configuration is simple.
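
Illustratively, once the cores are defined, each organization is addressed
through its own core URL under the same instance - a sketch with
hypothetical core names:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class PerOrgSearch {
    public static void main(String[] args) throws Exception {
        // one Solr instance, one core per organization (names hypothetical)
        SolrServer org1 = new CommonsHttpSolrServer("http://localhost:8983/solr/org1");
        SolrServer org2 = new CommonsHttpSolrServer("http://localhost:8983/solr/org2");

        // route each request to the core for the right organization
        SolrQuery q = new SolrQuery("*:*");
        System.out.println("org1: " + org1.query(q).getResults().getNumFound());
        System.out.println("org2: " + org2.query(q).getResults().getNumFound());
    }
}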

Good Luck

Regards

Anderson

2012/1/26 David Radunz 

> Hey,
>
>    Sounds like what you need to set up is a "Multiple Cores" configuration.
> At first I confused this with "Multi Core CPU", but that's not what it's
> about. Basically it's a way to run multiple Solr
> cores/indexes/configurations from a single Solr instance (which will scale
> better as the resources will be shared). Have a read anyway:
> http://wiki.apache.org/solr/CoreAdmin
>
> Cheers,
>
> David
>
>
> On 27/01/2012 8:18 AM, Nitin Arora wrote:
>
>> Hi,
>>
>> We are using SOLR/Lucene to index/search the data about the users of an
>> organization. The nature of the data is brief information about the users'
>> work.
>> Our data index requirement is to have segregated stores for each
>> organization and currently we have 10 organizations and we have to run 10
>> different instances of SOLR to serve search results for an organization.
>> As new organizations join, it is getting difficult to manage this
>> many instances.
>> I think now there is a need to use 1 SOLR instance and then have
>> 10/multiple
>> different data directories for each organization.
>>
>> When index/search request is received in SOLR we decide the data directory
>> based on the organization.
>>
>>1. Is it possible to do the same in SOLR and how can we achieve
>> the same?
>>2. Will it be a good design to use SOLR like this?
>>3. Is there any impact on the scalability if we are able to manage
>> the
>> separate data directories inside SOLR?
>>
>> Thanks in advance
>>
>> Nitin
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3691644.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>


Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
I'm changing the params to socketTimeout and connTimeout and will test this
afternoon. The client timeout was actually removed today, which helped a bit.

What about the other params, "timeAllowed" and "partialResults"? My
expectation was that these were specifically for distributed search,
meaning if a response wasn't received within timeAllowed, and if
partialResults is true, then that shard would not be waited on for results.
Is that correct?

thanks,
-jay


On Thu, Jan 26, 2012 at 2:23 PM, Jay Hill  wrote:

> We're on the trunk:
> 4.0-2011-10-26_08-46-59 1189079 - hudson - 2011-10-26 08:51:47
>
> Client timeouts are set to 4 seconds.
>
> Thanks,
> -Jay
>
>
> On Thu, Jan 26, 2012 at 1:40 PM, Mark Miller wrote:
>
>>
>> On Jan 26, 2012, at 1:28 PM, Jay Hill wrote:
>>
>> >
>> > I've tried setting the following config in our default req. handler:
>> > 2000
>> > 2000
>> >
>>
>>
>> What version are you using Jay? At least on trunk, I took a look and it
>> appears at some point these were renamed to socketTimeout and connTimeout.
>>
>> What about a timeout on your clients?
>>
>> - Mark Miller
>> lucidimagination.com
>>
>>
>


Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
We're on the trunk:
4.0-2011-10-26_08-46-59 1189079 - hudson - 2011-10-26 08:51:47

Client timeouts are set to 4 seconds.

Thanks,
-Jay

On Thu, Jan 26, 2012 at 1:40 PM, Mark Miller  wrote:

>
> On Jan 26, 2012, at 1:28 PM, Jay Hill wrote:
>
> >
> > I've tried setting the following config in our default req. handler:
> > 2000
> > 2000
> >
>
>
> What version are you using Jay? At least on trunk, I took a look and it
> appears at some point these were renamed to socketTimeout and connTimeout.
>
> What about a timeout on your clients?
>
> - Mark Miller
> lucidimagination.com
>
>


Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Mark Miller

On Jan 26, 2012, at 1:28 PM, Jay Hill wrote:

> 
> I've tried setting the following config in our default req. handler:
> 2000
> 2000
> 


What version are you using Jay? At least on trunk, I took a look and it appears 
at some point these were renamed to socketTimeout and connTimeout.

What about a timeout on your clients?

- Mark Miller
lucidimagination.com



Re: Multiple Data Directories and 1 SOLR instance

2012-01-26 Thread David Radunz

Hey,

Sounds like what you need to set up is a "Multiple Cores" 
configuration. At first I confused this with "Multi Core CPU", but 
that's not what it's about. Basically it's a way to run multiple Solr 
cores/indexes/configurations from a single Solr instance (which will 
scale better as the resources will be shared). Have a read anyway: 
http://wiki.apache.org/solr/CoreAdmin


Cheers,

David

On 27/01/2012 8:18 AM, Nitin Arora wrote:

Hi,

We are using SOLR/Lucene to index/search the data about the users of an
organization. The nature of the data is brief information about the users' work.
Our data index requirement is to have segregated stores for each
organization and currently we have 10 organizations and we have to run 10
different instances of SOLR to serve search results for an organization. As
new organizations join, it is getting difficult to manage this
many instances.

I think now there is a need to use 1 SOLR instance and then have 10/multiple
different data directories for each organization.

When index/search request is received in SOLR we decide the data directory
based on the organization.

1. Is it possible to do the same in SOLR and how can we achieve the 
same?
2. Will it be a good design to use SOLR like this?
3. Is there any impact on the scalability if we are able to manage the
separate data directories inside SOLR?

Thanks in advance

Nitin


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3691644.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re:Multiple Data Directories and 1 SOLR instance

2012-01-26 Thread wakemaster 39
I wish I had the link for you, but it sounds like you are looking to use
Solr cores. They are separate indexes, all under one Solr instance. Check
out the Solr 3.5 example, as I believe cores are now used and suggested as
the default configuration even if you only want to use one core.

Cameron
On Jan 26, 2012 4:18 PM, "Nitin Arora"  wrote:

> Hi,
>
> We are using SOLR/Lucene to index/search the data about the users of an
> organization. The nature of the data is brief information about the users'
> work.
> Our data index requirement is to have segregated stores for each
> organization and currently we have 10 organizations and we have to run 10
> different instances of SOLR to serve search results for an organization. As
> new organizations join, it is getting difficult to manage this
> many instances.
>
> I think now there is a need to use 1 SOLR instance and then have
> 10/multiple
> different data directories for each organization.
>
> When index/search request is received in SOLR we decide the data directory
> based on the organization.
>
>1. Is it possible to do the same in SOLR and how can we achieve the
> same?
>2. Will it be a good design to use SOLR like this?
>3. Is there any impact on the scalability if we are able to manage
> the
> separate data directories inside SOLR?
>
> Thanks in advance
>
> Nitin
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3691644.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Multiple Data Directories and 1 SOLR instance

2012-01-26 Thread Nitin Arora
Hi,

We are using SOLR/Lucene to index/search the data about the users of an
organization. The nature of the data is brief information about the users' work.
Our data index requirement is to have segregated stores for each
organization and currently we have 10 organizations and we have to run 10
different instances of SOLR to serve search results for an organization. As
new organizations join, it is getting difficult to manage this
many instances.

I think now there is a need to use 1 SOLR instance and then have 10/multiple
different data directories for each organization. 

When index/search request is received in SOLR we decide the data directory
based on the organization.

1. Is it possible to do the same in SOLR and how can we achieve the 
same?
2. Will it be a good design to use SOLR like this?
3. Is there any impact on the scalability if we are able to manage the
separate data directories inside SOLR?

Thanks in advance

Nitin


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3691644.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: WARNING: Unable to read: dataimport.properties DIH issue

2012-01-26 Thread Erick Erickson
Yeah, that happens. Glad you're past this issue; thanks for closing it out.

Erick

On Thu, Jan 26, 2012 at 10:45 AM, Egonsith  wrote:
> Erik,
>
> Thanks for the reply,
> I'm a bit embarrassed to say this is a classic example of a way too messy
> development environment, and these errors were due to many different drivers
> and XML files that were edited way too many times. I have cleaned my dev
> environment and reinstalled Tomcat and Solr and am now getting past this
> error. Thank you for the help.
>
> Mike
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/WARNING-Unable-to-read-dataimport-properties-DHI-issue-tp3689183p3691278.html
> Sent from the Solr - User mailing list archive at Nabble.com.


solr shards

2012-01-26 Thread ramin
Hello,

I've gone through the list and have not found the answer but if it is a
repetitive question, my apologies.

I have a three-shard Solr cluster. If I send a query to each of the shards
individually, I get the result with a list of relevant docs. However, if I
send the query to the main Solr server (dispatcher), it only returns the
value for numFound but there is no list of docs. Since I seem to be the only
one having this issue, it is probably a misconfiguration for which I
couldn't find an answer in the documentation. Can someone please help?

Also, the sum of all the individual numFound values seems to not match the
numFound I get from the main Solr server, given that I do not have any
duplicates on the unique key.
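
For comparison, a distributed request is normally issued by passing the
shards parameter to the dispatcher - a sketch with placeholder host names;
it prints numFound and the number of docs actually returned separately,
since that is exactly what differs here:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedQuery {
    public static void main(String[] args) throws Exception {
        SolrServer dispatcher = new CommonsHttpSolrServer("http://dispatcher:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        // host:port/core for every shard (placeholders)
        q.set("shards", "shard1:8983/solr,shard2:8983/solr,shard3:8983/solr");
        QueryResponse rsp = dispatcher.query(q);
        System.out.println(rsp.getResults().getNumFound() + " found, "
            + rsp.getResults().size() + " docs returned");
    }
}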

Thanks in advance,
Ramin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-shards-tp3691370p3691370.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr and TF-IDF

2012-01-26 Thread Lee Carroll
"content-based recommender"  so its not CF etc
and its a project so its whatever his supervisor wants.

take a look at solrj should be more natural to integrate your java code with.

(Although not sure if it supports termv ector comp)

good luck



On 26 January 2012 17:27, Walter Underwood  wrote:
> Why are you using a search engine to build a recommender? None of the leading 
> teams in the Netflix Prize used search engines as a base technology.
>
> Start with the recommender algorithms in Mahout: http://mahout.apache.org/
>
> wunder
>
> On Jan 26, 2012, at 9:18 AM, Nejla Karacan wrote:
>
>> Hey there,
>>
>> I'm using Solr for my thesis, where I have to implement a content-based
>> recommender system for movies.
>>
>> I have indexed about 20 thousand movies with their information:
>> movie-id
>> title
>> genre
>> plot/movie-description <- !!!
>> cast
>>
>> I've enabled the TermVectorComponent for the fields genre, description and
>> cast.
>> So I can get the tf-idf-values for the terms of every movie.
>>
>> With these term-TfIdfValue-couples I have to compute the similarities
>> between movies by using the cosine similarity.
>> I know about the Solr feature MLT (MoreLikeThis), but that's not the
>> solution; I have to
>> implement the cosine similarity in Java myself.
>>
>> Now I have some problems/questions:
>> I get the responses in XML-format, which I read out with an XML-reader in
>> Java,
>> where it wriggles through every child node in order to reach the right node.
>> Is there a better way to get these values, in node attributes or node texts?
>> I have tried it with wt=csv but for the requests I get
>> responses only with the Movie-ID's, nothing more.
>> By XML-responseWriter my request is for example this:
>> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
>> I get the right response with all terms and tf-idf's - in XML.
>>
>> And if I add csv-notation
>> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
>> I get only this:
>> id
>> 1800180382
>>
>> Maybe my request is wrong?
>>
>> Another problem is, if I get the terms and their tfidf-values, I store
>> them in a map.
>> But the values aren't in any particular order. I want to store e.g. only
>> the 10 top terms,
>> so the 10 terms with the highest tfidf-values. Can I sort them in
>> descending order?
>> I haven't found anything for that. If it's not possible, I must sort them
>> later in the map.
>>
>> My last question is:
>> any movie has a genre - often more than one.
>> It's like the "cat" field (category) in the exampledocs with ipod/monitor
>> etc., and it's an important point for the movies.
>> How can I integrate this factor?
>> I changed the boost attribute in the Solr XML schema like this:
>> <field name="genre" ... multiValued="true" omitNorms="false" boost="3"
>> termVectors="true" termPositions="true" termOffsets="true"/>
>> Is that enough or is there any other possibility?
>>
>> Perhaps you can see that I am a beginner in Solr;
>> at the beginning a few weeks ago it was even more difficult for me, but now
>> it is going better.
>> I would be very grateful for any help, ideas, tips or suggestions!
>>
>> Many regards
>> Nejla
>>
>
>
>


Re: Solr 3.5.0 can't find Carrot classes

2012-01-26 Thread Stanislaw Osinski
Hi,

Can you paste the logs from the second run?

Thanks,

Staszek

On Wed, Jan 25, 2012 at 00:12, Christopher J. Bottaro  wrote:

> On Tuesday, January 24, 2012 at 3:07 PM, Christopher J. Bottaro wrote:
> > SEVERE: java.lang.NoClassDefFoundError:
> org/carrot2/core/ControllerFactory
> > at
> org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.(CarrotClusteringEngine.java:102)
> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> > at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
> Source)
> > at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
> > at java.lang.reflect.Constructor.newInstance(Unknown Source)
> > at java.lang.Class.newInstance0(Unknown Source)
> > at java.lang.Class.newInstance(Unknown Source)
> >
> > …
> >
> > I'm starting Solr with -Dsolr.clustering.enabled=true and I can see that
> the Carrot jars in contrib are getting loaded.
> >
> > Full log file is here:
> http://onespot-development.s3.amazonaws.com/solr.log
> >
> > Any ideas?  Thanks for the help.
> >
> Ok, got a little further.  Seems that Solr doesn't like it if you include
> jars more than once (I had a lib dir and also <lib> directives in the
> solrconfig which ended up loading the same jars twice).
>
> But now I'm getting these errors:  java.lang.NoClassDefFoundError:
> org/apache/solr/handler/clustering/SearchClusteringEngine
>
> Any help?  Thanks.


Re: social123 Data Appending Service

2012-01-26 Thread Geert-Jan Brits
No thanks - not sure which site you're talking about, btw.
But anyway, no thanks.


On 26 January 2012 at 19:41, Aaron Biddar wrote:

> Hi there-
>
> I was on your site today and was not sure who to reach out to.  My Company,
> Social123, provides Social Data Appending for companies that provide
> lists.  In a nutshell, we add Facebook, LinkedIn and Twitter contact
> information to your current lists. It's a great way to easily offer a new
> service or add on to your current offerings.  Providing social media
> contact information to your customers will allow them to interact with
> their customers on a whole new level.
>
> If you are the right person to speak with, please let me know your
> availability for a quick 5-minute demo or check out our tour at
> www.social123.com.  If you are not the right person, would you mind
> passing
> this e-mail along?
>
> Thanks in advance.
>
> --
> Aaron Biddar
> Founder, CEO
> aaron.bid...@social123.com
> www.social123.com
> 78 Alexander St. #K  Charleston SC 29403
> M  678 925 3556   P 800.505.7295 ex101
>


Re: WARNING: Unable to read: dataimport.properties DIH issue

2012-01-26 Thread Egonsith
Erik, 

Thanks for the reply, 
I'm a bit embarrassed to say this is a classic example of a way too messy
development environment, and these errors were due to many different drivers
and XML files that were edited way too many times. I have cleaned my dev
environment and reinstalled Tomcat and Solr and am now getting past this
error. Thank you for the help.

Mike

--
View this message in context: 
http://lucene.472066.n3.nabble.com/WARNING-Unable-to-read-dataimport-properties-DHI-issue-tp3689183p3691278.html
Sent from the Solr - User mailing list archive at Nabble.com.


social123 Data Appending Service

2012-01-26 Thread Aaron Biddar
Hi there-

I was on your site today and was not sure who to reach out to.  My Company,
Social123, provides Social Data Appending for companies that provide
lists.  In a nutshell, we add Facebook, LinkedIn and Twitter contact
information to your current lists. It's a great way to easily offer a new
service or add on to your current offerings.  Providing social media
contact information to your customers will allow them to interact with
their customers on a whole new level.

If you are the right person to speak with, please let me know your
availability for a quick 5-minute demo or check out our tour at
www.social123.com.  If you are not the right person, would you mind passing
this e-mail along?

Thanks in advance.

-- 
Aaron Biddar
Founder, CEO
aaron.bid...@social123.com
www.social123.com
78 Alexander St. #K  Charleston SC 29403
M  678 925 3556   P 800.505.7295 ex101


Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
I'm on a project where we have 1B docs sharded across 20 servers. We're not
in production yet and we're doing load tests now. We're sending load to hit
100qps per server. As the load increases we're seeing query times
sporadically increasing to 10 seconds, 20 seconds, etc. at times. What
we're trying to do is set a shard timeout so that responses longer than 2
seconds are discarded. We can live with less results in these cases. We're
not replicating yet as we want to see how the 20 shards perform first (plus
we're waiting on the massive amount of hardware).

I've tried setting the following config in our default req. handler:
2000
2000

I've just added these, and am testing now, but this doesn't look promising
either:
<int name="timeAllowed">2000</int>
<bool name="partialResults">true</bool>

Couldn't find much on the wiki about these params - I'm looking for more
details about how these work. I'll be happy to update the wiki with more
details based on the discussion here.

Any details about exactly how I can achieve my goal of timing out and
disregarding queries longer than 2 seconds would be greatly appreciated.
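
For what it's worth, these can also be passed per request from a client - a
SolrJ sketch (URL and query are placeholders, and whether partialResults
behaves as described is exactly the open question):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimeLimitedQuery {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://host1:8983/solr");
        SolrQuery q = new SolrQuery("some query");
        q.setTimeAllowed(2000);           // abandon search steps after 2s
        q.set("partialResults", true);    // accept whatever came back in time
        QueryResponse rsp = solr.query(q);
        // the responseHeader reports partialResults=true when the limit is hit
        System.out.println(rsp.getResults().getNumFound());
    }
}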

The index is insanely lean - no stored fields, no norms, no stop words,
etc. RAM buffer is 128, and we're using the standard "search" req. handler.
Essentially we're running Solr as a nosql data store, which suits this
project, but we need responses to be no longer than 2 seconds at the max.

Thanks,
-Jay


Re: Advice - evaluating Solr for categorization & keyword search

2012-01-26 Thread Erick Erickson
See below...

On Wed, Jan 25, 2012 at 2:38 PM, Becky Neil  wrote:
> Hi all,
> I've been tasked with evaluating whether Solr is the right solution for my
> company's search needs.  If this isn't the right forum for this kind of
> question, please let me know where to go instead!
>
> We are currently using sql queries to find mysql db results that match a
> single keyword in one short text field, so our search is pretty crude.
>
Be a little careful here. Often, when people come from a DB background
they think in terms of normalized data. If each of your tables is
independent of all other tables, then the simple "map the rows into
documents" approach works. More likely, you'll combine bits from
several tables into each Solr document and your reflexive distaste
for de-normalizing data will trip you up. Get over it...

> What we hope that Solr can do initially is:
> 1 enable more flexible search (booleans, more than one field
> searched/matched, etc)
This is OOB functionality. But do note that Solr/Lucene query
parsing is not a true boolean process, see:
http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

> 2 live search results (eg new records get added to the index upon creation)
As you indicated below, you'd need some process that noticed that
your DB changed and then indexed the changed records. Once the
records are indexed, Solr will pick up the changes automatically
but you have to control the indexing process from outside.

> 3 search rankings (eg most relevant -> least relevant)
OOB functionality with lots of knobs to turn for tuning. See
edismax

> 4 categorize our db (take records and at least group them, better if it
> could assign a label to each record)
Depending on what the details are here, this may be OOB. See
faceting and grouping/field collapsing. See:
http://wiki.apache.org/solr/SolrFacetingOverview
http://wiki.apache.org/solr/FieldCollapsing

> 5 locate nearby results (geospatial search)
OOB, although you need to store the lat/lon. See:
http://wiki.apache.org/solr/SpatialSearch
>
> What I hope you can advise on is:
> A How would you go about #2 - making sure that new documents are
> added/indexed asap, based on a new rows to the db? Is that as simple as a
> setting in Solr, or does it take some coding (eg a listener object, a kron
> job, etc.).  I tried looking at the wiki & tutorial but wasn't able to find
> answers - I couldn't make sense of how to use UpdateRequestProcessor to do
> it. (http://wiki.apache.org/solr/UpdateRequestProcessor)
What you'll be doing here is either using Data Import Handler or
SolrJ (Java client) to push solr documents into Solr. This is
straight-forward once you know the magic. A trivial SolrJ program
that indexes documents from a DB is maybe 100 lines, including
imports. It *uses* the updatehandler, but you don't see that, you see
something like solrServer.add(ListOfSolrInputDocuments);
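
Something like this, in fact - a minimal sketch of the whole loop, with a
hypothetical table and hypothetical field names:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DbIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        Connection conn = DriverManager.getConnection(
            "jdbc:mysql://localhost/mydb", "user", "pass");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT id, title, body FROM articles");

        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        while (rs.next()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rs.getString("id"));
            doc.addField("title", rs.getString("title"));
            doc.addField("body", rs.getString("body"));
            docs.add(doc);
            if (docs.size() == 1000) {    // push in batches
                solr.add(docs);
                docs.clear();
            }
        }
        if (!docs.isEmpty()) solr.add(docs);
        solr.commit();

        rs.close(); stmt.close(); conn.close();
    }
}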

> B What's the status of document clustering? The wiki says it's not been
> fully implemented. Would we be able to achieve any of #4 yet? If not, what
> else should we consider?
I don't think you're really thinking about document clustering here. I suspect
that grouping and/or faceting will be where you start. At least I'd look at
that first although clustering may be exactly what you want. Half the battle
is learning the right vocabulary 

> C Would you use Solr over say Google Maps api to run location aware
> searches?
*shrugs*

> D How long should we expect it to take to configure Solr on our servers
> with our db, get the initial index set up, and enable live search results?
>  Are we talking one week, or one month? Our db is not tiny, but it's not
> huge - say around 8k records in each of ~20 tables. Most tables have around
> 10 fields, including at least one large text field and then a variety of
> dates, numbers, and small text.
Too many variables for you to count on this estimate, but:
*If* you can use Data Import Handler and are starting from scratch, probably a week.
Someone who already knows Solr maybe a day. But whenever I start something
new, I usually chase a number of blind alleys.

Once set up, indexing your entire corpus will probably be a matter of
less than an hour (and I'm being quite conservative here; on my laptop,
Solr can index 7K documents/second from the English wiki dump). But
at times the database connection is the limiting factor.

By the way, I recommend that if DIH starts getting hard to use, especially
due to the relationships between tables, consider jumping to SolrJ earlier
rather than later.

Your index size is pretty small by Solr standards, so you probably won't have
to shard or do some of the other complex kinds of things that come up when
you have lots of data.

Note that this is *just* for setting up Solr and being able to query
through, say,
the admin page. It does not include all the work for the UI you'll need to front
the app. Count on tweaking your configuration files (e.g. schema.xml and
solrconfig.xml) an

ord/rord with a function

2012-01-26 Thread entdeveloper
Is it possible for ord/rord to work with a function? I'm attempting to use
rord with a spatial function like the following as a bf:

bf=rord(geodist())

If there's no way for this to work, is there a way to simulate the same
behavior?

For some background, I have two sets of documents: one set applies to a
location in NY and another in LA. I want to boost documents that are closer
to where the user is searching from. But I only need these sets to be ranked
1 & 2. In other words, the actual distance should not be used to boost the
documents, just if you are closer or farther. We may add more locations in
the future, so I'd like to be able to rank the locations from closest to
furthest.

I need some way to rank the distances, and rord is the right idea, but
doesn't seem to work with functions.

I'm running Solr 3.4, btw.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/ord-rord-with-a-function-tp3691138p3691138.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: WARNING: Unable to read: dataimport.properties DIH issue

2012-01-26 Thread Erick Erickson
Nothing jumps out at me, but you might get some insight
from http://wiki.apache.org/solr/DataImportHandler, see the
"interactive development mode" section. The dataimport.jsp
page is helpful.

It *looks* like your SQL statement is having problems, but
I confess I only glanced at the output...

Best
Erick

On Wed, Jan 25, 2012 at 2:17 PM, Egonsith  wrote:
> I have tried to search for my specific problem but have not found a solution. I
> have also read the wiki on the DIH and seem to have everything set up right,
> but my query still fails. Thank you for your help.
>
> I am running Solr 3.1 with Tomcat 6.0
> Windows server 2003 r2 and SQL 2008
>
> I have the sqljdbc4.jar sitting in C:\Program Files\Apache Software
> Foundation\Tomcat 6.0\lib
>
> /My solrconfig.xml/
> <requestHandler name="/dataimport"
>   class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">db-data-config.xml</str>
>   </lst>
> </requestHandler>
>
> /My db-data-config.xml/
> <dataConfig>
>   <dataSource type="JdbcDataSource" driver="..."
>     url="://localhost:1433;DatabaseName=KnowledgeBase_DM" user="user"
>     password="password" />
>   <document>
>     <entity name="Titles"
>       query="SELECT mrID, mrTitle from KnowledgeBase_DM.dbo.AskMe_Data">
>       <field column="mrID" ... />
>       <field column="mrTitle" ... />
>     </entity>
>   </document>
> </dataConfig>
>
>
> /My logfile Output /
> Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImportHandler
> processConfiguration
> INFO: Processing configuration from solrconfig.xml:
> {config=db-data-config.xml}
> Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImporter
> loadDataConfig
> INFO: Data Configuration loaded successfully
> Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.DataImporter
> doFullImport
> INFO: Starting Full Import
> Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.SolrWriter
> readIndexerProperties
> WARNING: Unable to read: dataimport.properties
> Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Creating a connection for entity Titles with URL:
> ://localhost:1433;DatabaseName=KnowledgeBase_DM
> Jan 25, 2012 2:17:37 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
> call
> INFO: Time taken for getConnection(): 0
> Jan 25, 2012 2:17:37 PM org.apache.solr.common.SolrException log
> SEVERE: Exception while processing: Titles document :
> SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query: SELECT mrID, mrTitle from
> KnowledgeBase_DM.dbo.AskMe_Data Processing Document # 1
>        at
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>        at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
>        at
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
>        at
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
>        at
> org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:188)
>        at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>        at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>        at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:591)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
>        at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
>        at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
>        at
> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:205)
>        at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
>        at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>        at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>        at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>        at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>        at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>        at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>        at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>        at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>        at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>        at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>        at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>        at
> org.apache.coyote.h

Re: is it possible to get more Number of characters?

2012-01-26 Thread Erick Erickson
You still haven't given us much to go on. It would be helpful
to give some sample inputs, what you see when you query
(the output after adding &debugQuery=on is helpful), and the
fieldType definition from schema.xml for the field in question.

You might also try looking at the admin/analysis page to see
how your analysis chain breaks up the incoming stream
into tokens; that's often helpful.

Best
Erick

On Thu, Jan 26, 2012 at 7:24 AM, Jörg Agatz  wrote:
> no,
>
> I have a lot of characters in my URL. It looks like it stops at XYZ
> characters, so I hope to find a way to use more characters.
>
>
>
>
>
>
> On Thu, Jan 26, 2012 at 3:11 PM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
>
>> Hi Jörg,
>>
>> Hmmm, do you mind rephrasing the question?
>>
>> Otis
>> 
>> Performance Monitoring SaaS for Solr -
>> http://sematext.com/spm/solr-performance-monitoring/index.html
>>
>>
>> - Original Message -
>> > From: Jörg Agatz 
>> > To: solr-user@lucene.apache.org
>> > Cc:
>> > Sent: Thursday, January 26, 2012 5:23 AM
>> > Subject: is it possible to get more Number of characters?
>> >
>> > Is it possible to get more Number of characters?
>> >
>> > I have a problem with too many characters in the search; my "Think
>> > Tank" is
>> > very long, but it has to be that way.
>> > Unfortunately I cannot find a setting that is responsible.
>> >
>>


Re: Solr and TF-IDF

2012-01-26 Thread Walter Underwood
Why are you using a search engine to build a recommender? None of the leading 
teams in the Netflix Prize used search engines as a base technology.

Start with the recommender algorithms in Mahout: http://mahout.apache.org/

wunder

On Jan 26, 2012, at 9:18 AM, Nejla Karacan wrote:

> Hey there,
> 
> I'm using Solr for my thesis, where I have to implement a content-based
> recommender system for movies.
> 
> I have indexed about 20 thousand movies with their information:
> movie-id
> title
> genre
> plot/movie-description <- !!!
> cast
> 
> I've enabled the TermVectorComponent for the fields genre, description and
> cast.
> So I can get the tf-idf-values for the terms of every movie.
> 
> With these term-TfIdfValue-couples I have to compute the similarities
> between movies by using the cosine similarity.
> I know about the Solr feature MLT (MoreLikeThis), but that's not the
> solution; I have to
> implement the cosine similarity in Java myself.
> 
> Now I have some problems/questions:
> I get the responses in XML-format, which I read out with an XML-reader in
> Java,
> where it wriggles through every child node in order to reach the right node.
> Is there a better way to get these values, in node attributes or node texts?
> I have tried it with wt=csv but for the requests I get
> responses only with the Movie-ID's, nothing more.
> By XML-responseWriter my request is for example this:
> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
> I get the right response with all terms and tf-idf's - in XML.
> 
> And if I add csv-notation
> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
> I get only this:
> id
> 1800180382
> 
> Maybe my request is wrong?
> 
> Another problem is, if I get the terms and their tfidf-values, I store
> them in a map.
> But the values aren't in any particular order. I want to store e.g. only
> the 10 top terms,
> so the 10 terms with the highest tfidf-values. Can I sort them in
> descending order?
> I haven't found anything for that. If it's not possible, I must sort them
> later in the map.
> 
> My last question is:
> any movie has a genre - often more than one.
> It's like the "cat" field (category) in the exampledocs with ipod/monitor
> etc., and it's an important point for the movies.
> How can I integrate this factor?
> I changed the boost attribute in the Solr XML schema like this:
> <field name="genre" ... multiValued="true" omitNorms="false" boost="3"
> termVectors="true" termPositions="true" termOffsets="true"/>
> Is that enough or is there any other possibility?
> 
> Perhaps you can see that I am a beginner in Solr;
> at the beginning a few weeks ago it was even more difficult for me, but now
> it is going better.
> I would be very grateful for any help, ideas, tips or suggestions!
> 
> Many regards
> Nejla
> 





Solr and TF-IDF

2012-01-26 Thread Nejla Karacan
Hey there,

I'm using Solr for my thesis, where I have to implement a content-based
recommender system for movies.

I have indexed about 20 thousand movies with their information:
movie-id
title
genre
plot/movie-description <- !!!
cast

I've enabled the TermVectorComponent for the fields genre, description and
cast.
So I can get the tf-idf-values for the terms of every movie.

With these term-TfIdfValue-couples I have to compute the similarities
between movies by using the cosine similarity.
I know about the Solr feature MLT (MoreLikeThis), but that's not the
solution; I have to
implement the cosine similarity in Java myself.
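
(The cosine computation itself is plain Java, independent of how the values
are fetched - a minimal sketch over two term-to-weight maps:)

import java.util.Map;

public class Cosine {
    // cosine similarity of two sparse term -> tf-idf vectors
    public static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;   // shared terms only
            normA += e.getValue() * e.getValue();
        }
        for (double w : b.values()) normB += w * w;
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}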

Now I have some problems/questions:
I get the responses in XML-format, which I read out with an XML-reader in
Java,
where it wriggles through every child node in order to reach the right node.
Is there a better way to get these values, in node attributes or node texts?
I have tried it with wt=csv but for the requests I get
responses only with the Movie-ID's, nothing more.
By XML-responseWriter my request is for example this:
http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
I get the right response with all terms and tf-idf's - in XML.

And if I add csv-notation
http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
I get only this:
id
1800180382

Maybe my request is wrong?

Another problem is, if I get the terms and their tfidf-values, I store
them in a map.
But the values aren't in any particular order. I want to store e.g. only
the 10 top terms,
so the 10 terms with the highest tfidf-values. Can I sort them in
descending order?
I haven't found anything for that. If it's not possible, I must sort them
later in the map.
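
(Sorting the map afterwards is straightforward - a sketch that returns the
n highest-weighted terms:)

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class TopTerms {
    public static List<Map.Entry<String, Double>> top(Map<String, Double> tfidf, int n) {
        List<Map.Entry<String, Double>> entries =
            new ArrayList<Map.Entry<String, Double>>(tfidf.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Double>>() {
            public int compare(Map.Entry<String, Double> a, Map.Entry<String, Double> b) {
                return b.getValue().compareTo(a.getValue());  // descending by weight
            }
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }
}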

My last question is:
any movie has a genre - often more than one.
It's like the "cat" field (category) in the exampledocs with ipod/monitor
etc., and it's an important point for the movies.
How can I integrate this factor?
I changed the boost attribute in the Solr XML schema like this:
<field name="genre" ... multiValued="true" omitNorms="false" boost="3"
termVectors="true" termPositions="true" termOffsets="true"/>
Is that enough or is there any other possibility?

Perhaps you can see that I am a beginner in Solr;
at the beginning a few weeks ago it was even more difficult for me, but now
it is going better.
I would be very grateful for any help, ideas, tips or suggestions!

Many regards
Nejla



Solr Join query with fq not correctly filtering results?

2012-01-26 Thread Mike Hugo
Hello,

I'm trying out the Solr JOIN query functionality on trunk.  I have the
latest checkout, revision #1236272 - I did the following steps to get the
example up and running:

cd solr
ant example
java -jar start.jar
cd exampledocs
java -jar post.jar *.xml

Then I tried a few of the sample queries on the wiki page
http://wiki.apache.org/solr/Join.  In particular, this is one that I'm
interested in

Find all manufacturer docs named "belkin", then join them against (product)
> docs and filter that list to only products with a price less than 12 dollars
>
> http://localhost:8983/solr/select?q={!join+from=id+to=manu_id_s}compName_s:Belkin&fq=price:%5B%2A+TO+12%5D


However, when I run that query, I get two results, one with a price of
19.95 and another with a price of 11.50. Because of the filter query, I'm
only expecting to see one result - the one with a price of 11.50.

I was also able to replicate this in a unit test added to
org.apache.solr.TestJoin:

  @Test
  public void testJoin_withFilterQuery() throws Exception {
assertU(add(doc("id", "1","name", "john", "title", "Director",
"dept_s","Engineering")));
assertU(add(doc("id", "2","name", "mark", "title", "VP",
"dept_s","Marketing")));
assertU(add(doc("id", "3","name", "nancy", "title", "MTS",
"dept_s","Sales")));
assertU(add(doc("id", "4","name", "dave", "title", "MTS",
"dept_s","Support", "dept_s","Engineering")));
assertU(add(doc("id", "5","name", "tina", "title", "VP",
"dept_s","Engineering")));

assertU(add(doc("id","10", "dept_id_s", "Engineering", "text","These
guys develop stuff")));
assertU(add(doc("id","11", "dept_id_s", "Marketing", "text","These guys
make you look good")));
assertU(add(doc("id","12", "dept_id_s", "Sales", "text","These guys
sell stuff")));
assertU(add(doc("id","13", "dept_id_s", "Support", "text","These guys
help customers")));

assertU(commit());

//***
//This works as expected - the correct number of results are found
//***
// find people that develop stuff
assertJQ(req("q","{!join from=dept_id_s to=dept_s}text:develop",
"fl","id")

,"/response=={'numFound':3,'start':0,'docs':[{'id':'1'},{'id':'4'},{'id':'5'}]}"
);

//
// this fails - the response returned finds all three people - it
// should only find John
// expected = /response=={"numFound":1,"start":0,"docs":[{"id":"1"}]}
// response = {
//   "responseHeader":{
//     "status":0,
//     "QTime":4},
//   "response":{"numFound":3,"start":0,"docs":[
//     {"id":"1"},
//     {"id":"4"},
//     {"id":"5"}]
//   }}
//
// find people that develop stuff - but limit via filter query to a
// name of "john"
assertJQ(req("q","{!join from=dept_id_s to=dept_s}text:develop",
"fl","id", "fq", "name:john")
,"/response=={'numFound':1,'start':0,'docs':[{'id':'1'}]}"
);

  }


Interestingly, I know this worked at some point.  I had a snapshot build in
my ivy cache from 10/2/2011 and it was working with that
build maven_artifacts/org/apache/solr/
solr/4.0-SNAPSHOT/solr-4.0-20111002.161157-1.pom"


Mike


Re: decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Vadim Kisselmann
Sean, Ahmet,
thanks for the response :)

I use Solr 4.0 from trunk.
In my solrconfig.xml there is only one maxFieldLength param.
I think it is deprecated in Solr Versions 3.5+...

But LimitTokenCountFilterFactory works in my case :)
Thanks!

Regards
Vadim



2012/1/26 Ahmet Arslan :
>> I want to decrease the max. number of terms for my fields to
>> 500.
>> I thought that the maxFieldLength parameter in
>> solrconfig.xml is
>> intended for this.
>> In my case it doesn't work.
>>
>> Half of my text fields include longer text (about 10000
>> words).
>> With 100 docs in my index I had a segment size of 1140KB
>> for indexed
>> data and 270KB for stored data (.fdx, .fdt).
>> After a change from the default
>> 10000 to
>> 500,
>> deleting the index folder, restarting Tomcat, and reindexing, I see
>> the same
>> segment sizes (1140KB for indexed and 270KB for stored
>> data).
>>
>> Please tell me if I made an error in reasoning.
>
> What version of solr are you using?
>
> Could it be 
> http://lucene.apache.org/solr/api/org/apache/solr/analysis/LimitTokenCountFilterFactory.html?
>
> http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/analysis/LimitTokenCountFilter.html


Re: decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Sean Adams-Hiett
Vadim,

Is it possible that your solrconfig.xml is using maxFieldLength in both the
<indexDefaults> and <mainIndex> sections?

If so, the mainIndex config overwrites the other.  See this issue:
http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html

Sean

On Thu, Jan 26, 2012 at 10:15 AM, Vadim Kisselmann <
v.kisselm...@googlemail.com> wrote:

> P.S.:
> I use Solr 4.0 from trunk.
> Is maxFieldLength deprecated in Solr 4.0?
> If so, do I have an alternative to decrease the number of terms during
> indexing?
> Regards
> Vadim
>
>
>
> 2012/1/26 Vadim Kisselmann :
> > Hello Folks,
> > I want to decrease the max. number of terms for my fields to 500.
> > I thought that the maxFieldLength parameter in solrconfig.xml is
> > intended for this.
> > In my case it doesn't work.
> >
> > Half of my text fields include longer text (about 10000 words).
> > With 100 docs in my index I had a segment size of 1140KB for indexed
> > data and 270KB for stored data (.fdx, .fdt).
> > After a change from the default 10000 to
> > 500,
> > deleting the index folder, restarting Tomcat, and reindexing, I see the same
> > segment sizes (1140KB for indexed and 270KB for stored data).
> >
> > Please tell me if I made an error in reasoning.
> >
> > Regards
> > Vadim
>



-- 
Sean Adams-Hiett
Owner, Web Geeks For Hire
phone: (361) 433.5748
email: s...@webgeeksforhire.com
web: www.webgeeksforhire.com
twitter: @geekbusiness 


Re: decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Ahmet Arslan
> I want to decrease the max. number of terms for my fields to
> 500.
> I thought that the maxFieldLength parameter in
> solrconfig.xml is
> intended for this.
> In my case it doesn't work.
> 
> Half of my text fields include longer text (about 10000
> words).
> With 100 docs in my index I had a segment size of 1140KB
> for indexed
> data and 270KB for stored data (.fdx, .fdt).
> After a change from the default
> 10000 to
> 500,
> deleting the index folder, restarting Tomcat, and reindexing, I see
> the same
> segment sizes (1140KB for indexed and 270KB for stored
> data).
> 
> Please tell me if I made an error in reasoning.

What version of solr are you using?

Could it be 
http://lucene.apache.org/solr/api/org/apache/solr/analysis/LimitTokenCountFilterFactory.html?

http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/analysis/LimitTokenCountFilter.html


Re: decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Vadim Kisselmann
P.S.:
I use Solr 4.0 from trunk.
Is maxFieldLength deprecated in Solr 4.0?
If so, do I have an alternative to decrease the number of terms during indexing?
Regards
Vadim



2012/1/26 Vadim Kisselmann :
> Hello Folks,
> I want to decrease the max. number of terms for my fields to 500.
> I thought that the maxFieldLength parameter in solrconfig.xml is
> intended for this.
> In my case it doesn't work.
>
> Half of my text fields include longer text (about 10000 words).
> With 100 docs in my index I had a segment size of 1140KB for indexed
> data and 270KB for stored data (.fdx, .fdt).
> After a change from the default 10000 to
> 500,
> deleting the index folder, restarting Tomcat, and reindexing, I see the same
> segment sizes (1140KB for indexed and 270KB for stored data).
>
> Please tell me if I made an error in reasoning.
>
> Regards
> Vadim


decreasing of maxFieldLength in solrconfig.xml doesn't work

2012-01-26 Thread Vadim Kisselmann
Hello Folks,
I want to decrease the max. number of terms for my fields to 500.
I thought that the maxFieldLength parameter in solrconfig.xml is
intended for this.
In my case it doesn't work.

Half of my text fields include longer text (about 10000 words).
With 100 docs in my index I had a segment size of 1140KB for indexed
data and 270KB for stored data (.fdx, .fdt).
After a change from the default 10000 to
500,
deleting the index folder, restarting Tomcat, and reindexing, I see the same
segment sizes (1140KB for indexed and 270KB for stored data).

Please tell me if I made an error in reasoning.

Regards
Vadim


Re: is it possible to get more Number of characters?

2012-01-26 Thread Jörg Agatz
no,

I have a lot of characters in my URL. It looks like it stops at XYZ
characters, so I hope to find a way to use more characters.






On Thu, Jan 26, 2012 at 3:11 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Hi Jörg,
>
> Hmmm, do you mind rephrasing the question?
>
> Otis
> 
> Performance Monitoring SaaS for Solr -
> http://sematext.com/spm/solr-performance-monitoring/index.html
>
>
> - Original Message -
> > From: Jörg Agatz 
> > To: solr-user@lucene.apache.org
> > Cc:
> > Sent: Thursday, January 26, 2012 5:23 AM
> > Subject: is it possible to get more Number of characters?
> >
> > Is it possible to get more Number of characters?
> >
> > I have a problem with too many characters in the search; my "Think
> > Tank" is
> > very long, but it has to be that way.
> > Unfortunately I cannot find a setting that is responsible.
> >
>


RE: Using multiple DirectSolrSpellcheckers for a query

2012-01-26 Thread Dyer, James
Nalini,

Right now the best you can do is to use <copyField> to combine everything into 
a catch-all field for spellchecking purposes.  While this seems wasteful, this often 
has to be done anyhow because typically you'll need less/different analysis for 
spellchecking than for searching.  But rather than having separate <copyField>s 
to create multiple dictionaries, put everything into one field to create a 
single "master" dictionary.

From there, you need to set "spellcheck.collate" to true and also 
"spellcheck.maxCollationTries" greater than zero (5-10 usually works).  The 
first parameter tells it to generate re-written queries with spelling 
suggestions (collations).  The second parameter tells it to weed out any 
collations that won't generate hits if you re-query them.  This is important 
because having unrelated keywords in your master dictionary will increase the 
chances the spellchecker will pick the wrong words as corrections.
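
Expressed as a SolrJ request, for illustration - a sketch only: the handler
name and query are placeholders, and it assumes a request handler with the
spellcheck component wired in:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SpellcheckQuery {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("holliday inn");   // misspelled on purpose
        q.set("qt", "/spell");                         // placeholder handler name
        q.set("spellcheck", true);
        q.set("spellcheck.collate", true);             // build re-written queries
        q.set("spellcheck.maxCollationTries", 5);      // verify they return hits
        System.out.println(
            solr.query(q).getSpellCheckResponse().getCollatedResult());
    }
}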

There is a significant caveat to this:  The spellchecker typically only 
suggests for words in the dictionary.  So by creating a huge, master dictionary 
you might find that many misspelled words won't generate suggestions.  See this 
thread for some workarounds:  
http://lucene.472066.n3.nabble.com/Improving-Solr-Spell-Checker-Results-td3658411.html
  

I think having multiple, per-field dictionaries as you suggest might be a good 
way to go.  While this is not supported, I don't think it's because of 
performance concerns.  (There would be an overhead cost to this but I think it 
would still be practical.)  It just hasn't been implemented yet.  But we might 
be getting a possible start to this type of functionality.  In 
https://issues.apache.org/jira/browse/SOLR-2585 a separate spellchecker is 
added that just corrects wordbreak (or is it "word break"?) problems, then a 
"ConjunctionSolrSpellChecker" combines the results from the main spellchecker 
and the wordbreak spellchecker.  I could see a next step beyond this being to 
support per-field dictionaries, checking them separately, then combining the 
results.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-Original Message-
From: Nalini Kartha [mailto:nalinikar...@gmail.com] 
Sent: Wednesday, January 25, 2012 11:56 AM
To: solr-user@lucene.apache.org
Subject: Using multiple DirectSolrSpellcheckers for a query

Hi,

We are trying to use the DirectSolrSpellChecker to get corrections for
mis-spelled query terms directly from fields in the Solr index.

However, we need to use multiple fields for spellchecking a query. It looks
like you can only use one spellchecker for a request, and so the
workaround for this is to create a copy field from the fields required for
spell correction?

We'd like to avoid this because we allow users to perform different kinds
of queries on different sets of fields and so to provide meaningful
corrections we'd have to create multiple copy fields - one for each query
type.

Is there any reason why Solr doesn't support using multiple spellcheckers
for a query? Is it because of performance overhead?

Thanks,
Nalini


Re: Query for documents that have ONLY a certain value in a multivalued field

2012-01-26 Thread Garrett Conaty
Thought of another way to do this which will at least work for one field,
and that is by mapping all of the values into a simple string field and
then querying for an exact match in the string (one term).  This is similar
to having a 'count' field, but for our index creation process we could
reuse a string field we already had made (for sorting).  Still, I'd like to
see if the community has any other options from within Solr itself.
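
For illustration, both variants side by side (countries_str and count_country
are made-up field names):

  exact match via the collapsed string field:
    q=*:*&fq=countries_str:"brazil"

  or via a term-count field built at index time:
    q=*:*&fq=country:brazil&fq=count_country:1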


On Thu, Jan 26, 2012 at 2:05 AM, bilal dadanlar  wrote:

> I am having a similar problem and would appreciate any useful explanation
> on this topic.
> I couldn't find a way of querying for "exact match" in multivalued or
> normal text fields
>
> On Thu, Jan 26, 2012 at 3:14 AM, Garrett Conaty  wrote:
>
> > Does anyone know if there's a way using the SOLR query syntax to filter
> > documents that have only a certain value in a multivalued field?  As an
> > example if I have some field "country" that's multivalued and I want
> >
> > q=id:[* TO *]&fq=country:brazil   where 'brazil' is the only value
> > present.
> >
> > I've run through a few possibilities to do this, but I think it would be
> > more common and a better solution would exist:
> >
> > 1) On index creation time, aggregate my source data and create a
> > "count_country" field that contains the number of terms in the country
> > field.  Then the query would be q=id:[* TO
> > *]&fq=country:brazil&fq=count_country=1
> >
> > 2) In the search client, use the terms component to retrieve all terms
> for
> > "country" and then do the exclusions in the client and construct the
> query
> > as follows q=id:[* TO
> > *]&fq=country:brazil&fq=-country:canada&fq=-country:us   etc.
> >
> > 3) Write a function query or similar that could capture the info.
> >
> >
> >
> > Thanks in advance,
> > Garrett Conaty
> >
>
>
>
> --
> Bilal Dadanlar
>


Re: Size of index to use shard

2012-01-26 Thread Dmitry Kan
@Erick:
Thanks for the detailed explanation. On this note, we have 75GB for *.fdt
and *.fdx out of a 99GB index. The search is still not that fast if the
cache size is small, but giving more cache led to OOMs. Partitioning into
shards is not an option either, as at the moment we try to run as few
machines as possible.

@Vadim:
Thanks for the info! For the 6GB of heap size I assume your caches are not
that big? We had a filterCache (used heavily compared to other cache types in
facet and non-facet queries according to our measurements) on the order of
20 thousand entries with a 22GB heap, and observed OOMs. So we decided to
lower the cache params substantially.
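
For reference, the filterCache is sized in solrconfig.xml; a deliberately
small setup might look like this (the numbers are illustrative only):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>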

Dmitry

On Tue, Jan 24, 2012 at 10:25 PM, Vadim Kisselmann <
v.kisselm...@googlemail.com> wrote:

> @Erick
> thanks:)
> I agree with your opinion.
> my load tests show the same.
>
> @Dmitry
> my docs are small too, I think about 3-15KB per doc.
> I update my index all the time and I have an average of 20-50 requests
> per minute (20% facet queries, 80% large boolean queries with
> wildcard/fuzzy). How many docs at a time => depends on the chosen
> filters, from 10 up to all 100 million.
> I work with very small caches (strangely, if my index is under
> 100GB I need larger caches, over 100GB smaller caches..)
> My JVM has 6GB, 18GB for I/O.
> With few updates a day I would configure very big caches, like Tom
> Burton-West (see HathiTrust's blog)
>
> Regards Vadim
>
>
>
> 2012/1/24 Anderson vasconcelos :
> > Thanks for the explanation Erick :)
> >
> > 2012/1/24, Erick Erickson :
> >> Talking about "index size" can be very misleading. Take
> >> a look at
> http://lucene.apache.org/java/3_5_0/fileformats.html#file-names.
> >> Note that the *.fdt and *.fdx files are used for stored fields, i.e.
> >> the verbatim copy of data put in the index when you specify
> >> stored="true". These files have virtually no impact on search
> >> speed.
> >>
> >> So, if your *.fdx and *.fdt files are 90G out of a 100G index
> >> it is a much different thing than if these files are 10G out of
> >> a 100G index.
> >>
> >> And this doesn't even mention the peculiarities of your query mix.
> >> Nor does it say a thing about whether your cheapest alternative
> >> is to add more memory.
> >>
> >> Anderson's method is about the only reliable one, you just have
> >> to test with your index and real queries. At some point, you'll
> >> find your tipping point, typically when you come under memory
> >> pressure. And it's a balancing act between how much memory
> >> you allocate to the JVM and how much you leave for the op
> >> system.
> >>
> >> Bottom line: No hard and fast numbers. And you should periodically
> >> re-test the empirical numbers you *do* arrive at...
> >>
> >> Best
> >> Erick
> >>
> >> On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos
> >>  wrote:
> >>> Apparently, it's not so easy to determine when to break the content into
> >>> pieces. I'll investigate further the number of documents, the
> >>> size of each document and what kind of search is being used. It seems
> >>> I will have to do a load test to identify the cutoff point to begin
> >>> using the shard strategy.
> >>>
> >>> Thanks
> >>>
> >>> 2012/1/24, Dmitry Kan :
>  Hi,
> 
>  The article you gave mentions 13GB of index size. It is quite a small
> index
>  from our perspective. We have noticed that at least Solr 3.4 has some
>  sort
>  of "choking" point with respect to growing index size. It just becomes
>  substantially slower than what we need (a query on avg taking more
> than
>  3-4
>  seconds) once index size crosses a magic level (about 80GB following
> our
>  practical observations). We try to keep our indices at around 60-70GB
> for
>  fast searches and above 100GB for slow ones. We also route majority of
>  user
>  queries to fast indices. Yes, caching may help, but we can't necessarily
>  afford adding more RAM for bigger indices. BTW, our documents are very
>  small, thus in 100GB index we can have around 200 mil. documents. It
>  would
>  be interesting to see, how you manage to ensure q-times under 1 sec
> with
>  an
>  index of 250GB? How many documents / facets do you ask max. at a time?
>  FYI,
>  we ask for a thousand of facets in one go.
> 
>  Regards,
>  Dmitry
> 
>  On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann <
>  v.kisselm...@googlemail.com> wrote:
> 
> > Hi,
> > it depends on your hardware.
> > Read this:
> >
> >
> http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/
> > Think about your cache-config (few updates, big caches) and a good
> > HW-infrastructure.
> > In my case I can handle a 250GB index with 100 mil. docs on an i7
> > machine with RAID10 and 24GB RAM => q-times under 1 sec.
> > Regards
> > Vadim
> >
> >
> >
> > 2012/1/24 Anderson vasconcel

Re: is it possible to get more Number of characters?

2012-01-26 Thread Otis Gospodnetic
Hi Jörg,

Hmmm, do you mind rephrasing the question?

Otis 

Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html 


- Original Message -
> From: Jörg Agatz 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Thursday, January 26, 2012 5:23 AM
> Subject: is it possible to get more Number of characters?
> 
> is it possible to get more Number of characters?
> 
> I have a problem with too many characters in the search; my "Think 
> Tank" is
> very long, but it has to be this way.
> Unfortunately I cannot find a setting that is responsible.
>


is it possible to get more Number of characters?

2012-01-26 Thread Jörg Agatz
is it possible to get more Number of characters?

I have a problem with too many characters in the search; my "Think Tank" is
very long, but it has to be this way.
Unfortunately I cannot find a setting that is responsible.


Re: Query for documents that have ONLY a certain value in a multivalued field

2012-01-26 Thread bilal dadanlar
I am having a similar problem and would appreciate any useful explanation
on this topic.
I couldn't find a way of querying for "exact match" in multivalued or
normal text fields

On Thu, Jan 26, 2012 at 3:14 AM, Garrett Conaty  wrote:

> Does anyone know if there's a way using the SOLR query syntax to filter
> documents that have only a certain value in a multivalued field?  As an
> example if I have some field "country" that's multivalued and I want
>
> q=id:[* TO *]&fq=country:brazil   where 'brazil' is the only value
> present.
>
> I've run through a few possibilities to do this, but I think it would be
> more common and a better solution would exist:
>
> 1) On index creation time, aggregate my source data and create a
> "count_country" field that contains the number of terms in the country
> field.  Then the query would be q=id:[* TO
> *]&fq=country:brazil&fq=count_country=1
>
> 2) In the search client, use the terms component to retrieve all terms for
> "country" and then do the exclusions in the client and construct the query
> as follows q=id:[* TO
> *]&fq=country:brazil&fq=-country:canada&fq=-country:us   etc.
>
> 3) Write a function query or similar that could capture the info.
>
>
>
> Thanks in advance,
> Garrett Conaty
>



-- 
Bilal Dadanlar


Re: Currency field type

2012-01-26 Thread darul
Thank you Erik, I am thinking about taking the time to get more involved in Solr
development.

In the meantime, I will choose to store prices and currency in a normalized
way.
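
A sketch of such a normalized layout in schema.xml (the field names and type
choices are assumptions, not from this thread):

  <!-- the amount, converted to a common reference currency at index time -->
  <field name="price_value" type="tdouble" indexed="true" stored="true"/>
  <!-- the original currency as an ISO 4217 code, e.g. "EUR" -->
  <field name="price_currency" type="string" indexed="true" stored="true"/>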

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Currency-field-type-tp3684682p3690076.html
Sent from the Solr - User mailing list archive at Nabble.com.


More Number of characters???

2012-01-26 Thread Jörg Agatz
is it possible to get more Number of characters?

I have a problem with too many characters in the search; my "Think Tank" is
very long, but it has to be this way.
Unfortunately I cannot find a setting that is responsible.