Re: solr user

2010-09-01 Thread Pavan Gupta
Hi Ankita,
One reason could be that you are using area_t instead of city_t for mapping.
So the association may not be taking place in Solr. Have you tried searching
on skill? That should have worked for you.
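
For example, if schema.xml has the usual *_t dynamic text fields, the
data-config.xml mapping would need to be something like this (column and
field names here are only illustrative):

<field column="city" name="city_t"/>
<field column="skill" name="skill_t"/>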
Pavan

On Thu, Sep 2, 2010 at 12:10 PM, ankita shinde wrote:

> hi,
> I am able to index all my entries in my table named info. This table has
> four columns named id, name, city and skill.
> I have written a data-config file as follows:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>       driver="com.mysql.jdbc.Driver"
>       url="jdbc:mysql://3307/dbname"
>       user="user-name"
>       password="password"/>
>   <document>
>     <entity name="info"
>         query="select id,name,city,branch from info">
>       <field column="..." name="..."/>
>       <field column="..." name="..."/>
>       <field column="..." name="..."/>
>       <field column="..." name="..."/>
>     </entity>
>   </document>
> </dataConfig>
>
> *And the entries made in schema.xml are:*
>
> <field name="..." .../>
> <field name="..." .../>
>
>
> entries for id and name are already present.
>
>
> And I have also added the DataImportHandler request handler in
> solrconfig.xml as follows:
>
> <requestHandler name="/dataimport"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">data-config.xml</str>
>   </lst>
> </requestHandler>
>
>
>
> Data is successfully indexed, but I am able to find results only for the
> column 'name'. For the other columns it gives no results.
>


Re: java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-01 Thread Lance Norskog
Loading a servlet creates a bunch of classes via reflection. These are
in PermGen and never go away. If you load & unload over and over again,
any PermGen setting will fill up.
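
A common stopgap on the Sun JDK (generic JVM tuning, not Solr-specific;
the flag values are illustrative) is to enlarge PermGen and let CMS unload
classes:

java -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC \
     -XX:+CMSClassUnloadingEnabled -jar start.jar

That only delays the fill-up; with enough redeploy cycles you still have
to restart the JVM.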

On Wed, Sep 1, 2010 at 2:23 PM, Luke Tebbs  wrote:
>
> Have you tried to up the MaxHeapSize?
>
> I tend to run solr and the development instance in a separate jetty (on a
> separate port) and actually restart the web server for the dev application
> every now and again.
> It doesn't take too long if you only have one webapp on jetty - I tend to
> use mvn jetty:run on the CLI rather than launch jetty in eclipse. I also use
> JRebel to reduce the number of restarts needed during dev.
>
> As for a production instance, should you need to redeploy that often?
>
> Luke
>
> Antonio Calo' wrote:
>>
>>  Hi guys
>>
>> I'm facing an error in our production environment with our search
>> application based on maven with spring + solrj.
>>
>> When I try to change a class, or try to redeploy/restart an application, I
>> catch a java.lang.OutOfMemoryError: PermGen
>>
>> I've tried to understand the cause of this and also I've succeeded in
>> reproducing this issue on my local development environment by just restarting
>> the jetty several times (I'm using eclipse + maven plugin).
>>
>> The logs obtained are those:
>>
>>   [...]
>>   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
>>   /admin/: org.apache.solr.handler.admin.AdminHandlers
>>   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
>>   /admin/ping: PingRequestHandler
>>   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
>>   /debug/dump: solr.DumpRequestHandler
>>   32656 [Finalizer] INFO org.apache.solr.core.SolrCore - []  CLOSING
>>   SolrCore org.apache.solr.core.solrc...@1409c28
>>   17:43:19 ERROR InvertedIndexEngine:124 open -
>>   java.lang.OutOfMemoryError: PermGen space
>>   java.lang.RuntimeException: java.lang.OutOfMemoryError: PermGen space
>>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
>>        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
>>        at
>>
>> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
>>        at
>>
>> com.intellisemantic.intellifacet.resource.invertedIndex.InvertedIndexEngine.open(InvertedIndexEngine.java:113)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at
>>
>> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1536)
>>        at
>>
>> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1477)
>>        at
>>
>> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1409)
>>   [...]
>>
>> The exception is always thrown while solr init is performed after a
>> restart (this is the reason why I'm asking your support ;) )
>>
>> It seems that while solr is trying to be set up (by [Timer-1]), another
>> thread ([Finalizer]) is trying to close it. I can see from the Solr code
>> that this exception is thrown always in the same place: SolrCore.java:1068.
>> Here there is a comment that says:
>>
>>   // need to close the searcher here??? we shouldn't have to.
>>          throw new RuntimeException(th);
>>        } finally {
>>          if (newestSearcher != null) {
>>            newestSearcher.decref();
>>          }
>>        }
>>
>> I'm using the solrj lib in a Spring container, so I'm supposing that Spring
>> will manage the release of all the singleton classes. Should I do something
>> else, like force closing solr?
>>
>> Thanks in advance for your support.
>>
>> Best regards
>>
>> Antonio
>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Download document from solr

2010-09-01 Thread Lance Norskog
Solr can return the list of results in JSON or php format, so that your
UI can allow a download.

You can write a UI in the Velocity toolkit - it's pretty easy.

On Wed, Sep 1, 2010 at 8:24 AM, Matteo Moci  wrote:
>  Hello to All,
> I am a newbie with Solr, and I am trying to understand if I can use it for
> my purpose,
> and I was wondering how Solr lists the result documents: do they appear as
> "downloadable files",
> just like http://solr.machine.com/path/file.doc, or do I need to develop
> another layer to take care of downloading?
> Even a link to the docs might work...
>
> Thank you,
> Matteo
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: questions about synonyms

2010-09-01 Thread Lance Norskog
2. Is there a way to do synonyms' highlight in search result?

From the highlighter's point of view, there are one or more terms at a
position. The SynonymFilter adds or changes those terms. Other filters
also add or change those terms. The highlighter highlights whatever it
finds.
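
For reference, wiring a synonyms file into an analyzer chain in schema.xml
usually looks like this sketch (the field type name is illustrative):

<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>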

On Tue, Aug 31, 2010 at 2:06 PM, Geert-Jan Brits  wrote:
> concerning:
>> 1. I got a very big text file of synonyms. How can I use it? Do I need to
>> index this text file first?
>
> have you seen
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter ?
>
> Cheers,
> Geert-Jan
> 
>
> 2010/8/31 Ma, Xiaohui (NIH/NLM/LHC) [C] 
>
>> Hello,
>>
>>
>>
>> I have a couple of questions about synonyms.
>>
>>
>>
>> 1. I got a very big text file of synonyms. How can I use it? Do I need to
>> index this text file first?
>>
>>
>>
>> 2. Is there a way to do synonyms' highlight in search result?
>>
>>
>>
>> 3. Does anyone use WordNet with Solr?
>>
>>
>>
>>
>>
>> Thanks so much in advance,
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


solr user

2010-09-01 Thread ankita shinde
hi,
I am able to index all my entries in my table named info. This table has
four columns named id, name, city and skill.
I have written a data-config file as follows:

<dataConfig>
  <dataSource type="JdbcDataSource"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://3307/dbname"
      user="user-name"
      password="password"/>
  <document>
    <entity name="info"
        query="select id,name,city,branch from info">
      <field column="..." name="..."/>
      <field column="..." name="..."/>
      <field column="..." name="..."/>
      <field column="..." name="..."/>
    </entity>
  </document>
</dataConfig>

*And the entries made in schema.xml are:*

<field name="..." .../>
<field name="..." .../>

entries for id and name are already present.


And I have also added the DataImportHandler request handler in
solrconfig.xml as follows:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>



Data is successfully indexed, but I am able to find results only for the
column 'name'. For the other columns it gives no results.
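
For comparison, a minimal working setup for a table like this is sketched
below (connection details and field names are illustrative; the *_t names
assume the example schema's dynamic field definitions):

<dataConfig>
  <dataSource type="JdbcDataSource"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost:3307/dbname"
      user="user-name" password="password"/>
  <document>
    <entity name="info" query="select id,name,city,skill from info">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="city" name="city_t"/>
      <field column="skill" name="skill_t"/>
    </entity>
  </document>
</dataConfig>

Searches on the other columns then need to name the field they were mapped
to, e.g. q=city_t:pune (field name illustrative), unless those fields are
copied into the default search field with a copyField rule.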


Re: Resume Solr indexing CSV after exception

2010-09-01 Thread Lance Norskog
No. They are talking about a new feature in the DataImportHandler for
reading CSV files.

On Tue, Aug 31, 2010 at 1:55 PM, romiawasthy  wrote:
>
> How do I use this feature, is there some parameter that I need to specify in
> the update request?
>
> curl
> http://localhost:8983/solr/update/csv?stream.file=exampledocs/books.csv&stream.contentType=text/plain;charset=utf-8
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Resume-Solr-indexing-CSV-after-exception-tp878801p1396875.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com


Re: **SPAM** solr

2010-09-01 Thread ankita shinde
I am able to index all my entries in my table named info. This table has
four columns named id, name, city and branch.
I have written a data-config file as follows:


  
  




  



Re: Distance sorting with spatial filtering

2010-09-01 Thread Lance Norskog
Post your schema.
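
For reference, the LatLonType declarations on that wiki page look roughly
like this (names illustrative):

<fieldType name="location" class="solr.LatLonType"
    subFieldSuffix="_coordinate"/>
<field name="store" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true"
    stored="false"/>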

On Mon, Aug 30, 2010 at 2:04 PM, Scott K  wrote:
> The new spatial filtering (SOLR-1586) works great and is much faster
> than fq={!frange. However, I am having problems sorting by distance.
> If I try
> GET 
> 'http://localhost:8983/solr/select/?q=*:*&sort=dist(2,latitude,longitude,0,0)+asc'
> I get an error:
> Error 400 can not sort on unindexed field: dist(2,latitude,longitude,0,0)
>
> I was able to work around this with
> GET 'http://localhost:8983/solr/select/?q=*:* AND _val_:"recip(dist(2,
> latitude, longitude, 0,0),1,1,1)"&fl=*,score'
>
> But why isn't sorting by functions working? I get this error with any
> function I try to sort on. This is a nightly trunk build from Aug 25th.
> I see SOLR-1297 was reopened, but that seems to be for edge cases.
>
> Second question: I am using the LatLonType from the Spatial Filtering
> wiki, http://wiki.apache.org/solr/SpatialSearch
> Are there any distance sorting functions that use this field, or do I
> need to have three indexed fields, store_lat_lon, latitude, and
> longitude, if I want both filtering and sorting by distance.
>
> Thanks, Scott
>



-- 
Lance Norskog
goks...@gmail.com


Re: Custom scoring

2010-09-01 Thread Lance Norskog
Check out the function query feature, and the bf= parameter. It may be
that the existing functions meet your needs, or that you can add a few
new functions.

It can take a while to understand what you really want to do, so
writing a large piece of code now can be wasteful.
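
As a sketch (using the example schema's popularity field; names are
illustrative), a dismax query with a boost function looks like:

http://localhost:8983/solr/select?defType=dismax&q=ipod&qf=name^2+description&bf=log(popularity)

bf= adds the function's value to the dismax score, which is often enough
for folding per-document signals into ranking before writing custom code.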

On Mon, Aug 30, 2010 at 2:04 PM, Brad Kellett  wrote:
> Hi all,
>
> I'm looking for examples or pointers to some info on implementing custom 
> scoring in solr/lucene. Basically, what we're looking at doing is to augment 
> the score from a dismax query with some custom signals based on data in 
> fields from the row initially matched. There will be several of these 
> features dynamically scored at query-time (due to the nature of the data, 
> pre-computed stuff isn't really what we're looking for).
>
> I do apologize for the vagueness of this, but a lot of this data is stuff we 
> want to keep under wraps. Essentially, I'm just looking for a place to use 
> some custom java code to be able to manipulate the score for a row matched in 
> a dismax query.
>
> I've been Googling like a mad man, but haven't really hit on something that 
> seems ideal yet. Custom similarity appears to just allow changing the 
> components of the TF-IDF score, for example. Can someone point me to an 
> example of doing something like this?
>
> ~Brad



-- 
Lance Norskog
goks...@gmail.com


Re: Hardware Specs Question

2010-09-01 Thread Lance Norskog
I was just reading about configuring mass computation grids: hardware
writes on 2 striped disks take 10% longer than writes on a single disk,
because you have to wait for the slower disk to finish. So, single
disks without RAID are faster.

I don't know how much SSD disks cost, but they will certainly cure the
disk i/o problem.

On Tue, Aug 31, 2010 at 1:35 AM, scott chu (朱炎詹)  wrote:
> In our current lab project, we already built a Chinese newspaper index with
> 18 million documents. The index size is around 51GB. So I am very concerned
> about the memory issue you guys mentioned.
>
> I also look up the Hathitrust report on SolrPerformanceData page:
> http://wiki.apache.org/solr/SolrPerformanceData. They said their main
> bottleneck is Disk-I/O even they have 10 shards spread over 4 servers.
>
> Can you guys give me some helpful suggestions about hardware spec & memory
> configuration on our project?
>
> Thanks in advance.
>
> Scott
>
> - Original Message - From: "Lance Norskog" 
> To: 
> Sent: Tuesday, August 31, 2010 1:01 PM
> Subject: Re: Hardware Specs Question
>
>
> There are synchronization points, which become chokepoints at some
> number of cores. I don't know where they cause Lucene to top out.
> Lucene apps are generally disk-bound, not CPU-bound, but yours will
> be. There are so many variables that it's really not possible to give
> any numbers.
>
> Lance
>
> On Mon, Aug 30, 2010 at 8:34 PM, Amit Nithian  wrote:
>>
>> Lance,
>>
>> makes sense and I have heard about the long GC times on large heaps but I
>> personally haven't experienced a slowdown but that doesn't mean anything
>> either :-). Agreed that tuning the SOLR caching is the way to go.
>>
>> I haven't followed all the solr/lucene changes but from what I remember
>> there are synchronization points that could be a bottleneck where adding
>> more cores won't help this problem? Or am I completely missing something.
>>
>> Thanks again
>> Amit
>>
>> On Mon, Aug 30, 2010 at 8:28 PM, scott chu (朱炎詹)
>> wrote:
>>
>>> I am also curious, as Amit is. Can you give an example of the garbage
>>> collection problem you mentioned?
>>>
>>> - Original Message - From: "Lance Norskog" 
>>> To: 
>>> Sent: Tuesday, August 31, 2010 9:14 AM
>>> Subject: Re: Hardware Specs Question
>>>
>>>
>>>
>>> It generally works best to tune the Solr caches and allocate enough
>>> RAM to run comfortably. Linux & Windows et al. have their own cache
>>> of disk blocks. They use very good algorithms for managing this cache.
>>> Also, they do not make long garbage collection passes.
>>>
>>> On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian wrote:

> Lance,
>
> Thanks for your help. What do you mean by that the OS can keep the
> index
> in
> memory better than Solr? Do you mean that you should use another means
> to
> keep the index in memory (i.e. ramdisk)? Is there a generally accepted
> heap
> size/index size that you follow?
>
> Thanks
> Amit
>
> On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog 
> wrote:
>
>> The price-performance knee for small servers is 32G ram, 2-6 SATA
>> disks on a raid, 8/16 cores. You can buy these servers and half-fill
>> them, leaving room for expansion.
>>
>> I have not done benchmarks about the max # of processors that can be
>> kept busy during indexing or querying, and the total numbers: QPS,
>> response time averages & variability, etc.
>>
>> If your index file size is 8G, and your Java heap is 8G, you will do
>> long garbage collection cycles. The operating system is very good at
>> keeping your index in memory- better than Solr can.
>>
>> Lance
>>
>> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian 
>> wrote:
>> > Hi all,
>> >
>> > I am curious to get some opinions on at what point having more CPU
>> > cores shows diminishing returns in terms of QPS. Our index size is
>> > about 8GB and we have 16GB of RAM on a quad core 4 x 2.4 GHz AMD
>> > Opteron 2216. Currently I have the heap set to 8GB.
>> >
>> > We are looking to get more servers to increase capacity, and because
>> > the warranty is set to expire on our old servers I was curious, before
>> > asking for a certain spec, what others run and at what point does
>> > having more cores cease to matter? Mainly looking at somewhere
>> > between 4-12 cores per server.
>> >
>> > Thanks!
>> > Amit
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>>
>


 --
 Lance Norskog
 goks...@gmail.com



In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-01 Thread Scott Gonyea
Hi,

I'm looking to get some direction on where I should focus my attention, with
regards to the Solr codebase and documentation.  Rather than write a ton of
stuff no one wants to read, I'll just start with a use-case.  For context,
the data originates from Nutch crawls and is indexed into Solr.

Imagine a web page has the following content (4 occurrences of "Johnson" are
bolded):

--content_--
Lorem ipsum dolor *Johnson* sit amet, consectetur adipiscing elit. Aenean id
urna et justo fringilla dictum *johnson* in at tortor. Nulla eu nulla magna,
nec sodales est. Sed *johnSon* sed elit non lorem sagittis fermentum. Mauris
a arcu et sem sagittis rhoncus vel malesuada *Johnsons* mi. Morbi eget
ligula nisi. Ut fringilla ullamcorper sem.
--_content--

*First*; I would like to have the entire "content" block be indexed within
Solr.  This is done and definitely not an issue.

*Second* (+); during the injection of crawl data into Solr, I would like to
grab every occurrence of a specific word, or phrase, with "Johnson" being my
example for the above.  I want to take every such phrase (without
collision), as well as its unique-context, and inject that into its own,
separate Solr index.  For example, the above "content" example, having been
indexed in its entirety, would also be the source of 4 additional indexes.
In each index, "Johnson" would only appear once.  All of the text before and
after "Johnson" would be BOUND BY any other occurrence of "Johnson."  eg:

--index1_--
Lorem ipsum dolor *Johnson* sit amet, consectetur adipiscing elit. Aenean id
urna et justo fringilla dictum
--_index1-- --index2_--
sit amet, consectetur adipiscing elit. Aenean id urna et justo fringilla
dictum *johnson* in at tortor. Nulla eu nulla magna, nec sodales est. Sed
--_index2-- --index3_--
in at tortor. Nulla eu nulla magna, nec sodales est. Sed *johnSon* sed elit
non lorem sagittis fermentum. Mauris a arcu et sem sagittis rhoncus vel
malesuada
--_index3-- --index4_--
sed elit non lorem sagittis fermentum. Mauris a arcu et sem sagittis rhoncus
vel malesuada *Johnsons* mi. Morbi eget ligula nisi. Ut fringilla
ullamcorper sem.
--_index4--

Q:
How much of this is feasible in "present-day Solr" and how much of it do I
need to produce in a patch of my own?  Can anyone give me some direction on
where I should look, in approaching this problem (ie, libs / classes /
confs)?  I sincerely appreciate it.

*Third*; I would later like to go through the above, child indexes and
dismiss any that appear within a given context.  For example, I may deem
"ipsum dolor *Johnson* sit amet" as not being useful and I'd want to delete
any indexes matching that particular phrase-context.  The deletion is
trivial and, with the 2nd item resolved, this becomes a fairly non-issue.

Q:
The question, more or less, comes from the fact that my source data is from
a web crawler.  When recrawled, I need to repeat the process of dismissing
phrase-contexts that are not relevant to me.  Where is the best place to
perform this work?  I could easily perform queries, after indexing my crawl,
but that seems needlessly intensive.  I think the answer to that will be
"wherever I implement #2", but assumptions can be painfully expensive.


Thank you for reading my bloated e-mail.  Again, I'm mostly just looking to
be pointed to various pieces of the Lucene / Solr code-base, and am trolling
for any insight that people might share.

Scott Gonyea


Re: Alphanumeric wildcard search problem

2010-09-01 Thread Erick Erickson
Oh dear. Wildcard queries aren't analyzed, so I suspect it's a casing issue.

Try two things:
1> search for r-1*
2> look in your index and be sure the actual terms are there as you expect.

HTH
Erick
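
One way to sidestep the casing problem is a field type that keeps the
whole value as a single lowercased token (a sketch; the type name is
illustrative), and then lowercase the query term yourself before
appending the *:

<fieldType name="text_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With that, mat_nr:r-1* will match an indexed R-1110.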

On Wed, Sep 1, 2010 at 4:35 PM, Hasnain  wrote:

>
> Thankyou for your suggestions
>
> Before removing the wordDelimiterFilterFactory, the results for q=R-*
> were perfect but q=R-1* returned nothing; after removing
> wordDelimiterFilterFactory, q=R-* didn't bring me results either.
>
> the results before removing wordDelimiterFilterFactory using debugQuery=on
> were
>
> status: 0, QTime: 78
> params: debugQuery=on, qf=mat_nr, q=R-1*, qt=standard2
>
> rawquerystring: R-1*
> querystring: R-1*
> parsedquery: +DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
> description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
> manufact_mat:r-1*^0.4)~0.6) ()
> parsedquery_toString: +(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
> description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
> manufact_mat:r-1*^0.4)~0.6 ()
> QParser: DisMaxQParser
> timing: total 31.0 (prepare 15.0, process 16.0; all per-component times 0.0)
>
> and after removing wordDelimiterFilterFactory
>
> status: 0, QTime: 78
> params: debugQuery=on, qf=mat_nr, q=R-1*, qt=standard2
>
> rawquerystring: R-1*
> querystring: R-1*
> parsedquery: +DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
> description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
> manufact_mat:r-1*^0.4)~0.6) ()
> parsedquery_toString: +(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
> description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
> manufact_mat:r-1*^0.4)~0.6 ()
> QParser: DisMaxQParser
> timing: total 31.0 (prepare 15.0, process 16.0; all per-component times 0.0)
>
> also at first the wordDelimiterFilterFactory used was this
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>     generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>     catenateAll="0" splitOnCaseChange="1"/>
>
>
> before removing wordDelimiterFilterFactory, solr admin showed
>
> Index Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> term position     1
> term text         R-1110
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
> ignoreCase=true, enablePositionIncrements=true}
> term position     1
> term text         R-1110
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
> generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
> catenateNumbers=1}
> term position     1     2
> term text         R     1110
> term type         word  word
> source start,end  0,1   2,6
> payload
> org.apache.solr.analysis.LowerCaseFilterFactory {}
> term position     1     2
> term text         r     1110
> term type         word  word
> source start,end  0,1   2,6
> payload
> org.apache.solr.analysis.EnglishPorterFilterFactory
> {protected=protwords.txt}
> term position     1     2
> term text         r     1110
> term type         word  word
> source start,end  0,1   2,6
> payload
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
> term position     1     2
> term text         r     1110
> term type         word  word
> source start,end  0,1   2,6
> payload
>
>
>
> also after removing wordDelimiterFilterFactory, solr admin looks like this
>
> Index Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> term position     1
> term text         R-1110
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
> ignoreCase=true, enablePositionIncrements=true}
> term position     1
> term text         R-1110
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.EnglishPorterFilterFactory
> {protected=protwords.txt}
> term position     1
> term text         R-1110
> term type         word
> source start,end  0,6
> payload
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
> term position     1
> term text         R-1110
> term type         word
> source start,end  0,6
> payload
>
>
> any suggestions?
>
> thankyou
>
>
> Erick Erickson wrote:
> >
> > Really look at the analysis page in solr admin for how your
> > analyzer chain handles things, or you'll spend time until you're
> > really old having trouble :).
> >
> > Here's what I see on a quick scan:
> >
> >> StandardTokenizer tries to, among other things, preserve
> > email addresses. The kinds of strings you're working with may
> > trip something up here.
> >
> >> Remove WordDelimiterFactory 

Re: Download document from solr

2010-09-01 Thread Erick Erickson
SOLR returns an XML packet (well, you can also specify other response
formats, e.g. JSON). Within that XML, there'll be some overall response
characteristics (e.g. number of matches) and a list of documents.

If you do the example setup (http://lucene.apache.org/solr/tutorial.html)
and submit a query you'll see the XML returned (default) right in your
browser. If you're using FireFox or Chrome, you might have to install
an XML plugin to see it nicely formatted.

HTH
Erick
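
For example, the tutorial setup returns something like this for
http://localhost:8983/solr/select?q=solr (trimmed here; the exact values
depend on which example docs you indexed):

<response>
  <lst name="responseHeader">...</lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">SOLR1000</str>
      <str name="name">Solr, the Enterprise Search Server</str>
    </doc>
  </result>
</response>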

On Wed, Sep 1, 2010 at 11:24 AM, Matteo Moci  wrote:

>  Hello to All,
> I am a newbie with Solr, and I am trying to understand if I can use it for
> my purpose,
> and I was wondering how Solr lists the result documents: do they appear as
> "downloadable files",
> just like http://solr.machine.com/path/file.doc, or do I need to develop
> another layer to take care of downloading?
> Even a link to the docs might work...
>
> Thank you,
> Matteo
>
>


Re: Need help with field collapsing and out of memory error

2010-09-01 Thread Moazzam Khan
Hi,


If this is how you configure the field collapsing cache, then I don't
have it set up:

<fieldCollapseCache class="solr.FastLRUCache"
    size="..." initialSize="..." autowarmCount="..."/>

I didn't add that part to solrconfig.xml.

The way I set up field collapsing is I added this tag:

<searchComponent name="collapse" class="solr.CollapseComponent"/>

Then I modified the default request handler (for standard queries) with this:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="components">
    <str>collapse</str>
    <str>facet</str>
    <str>highlight</str>
    <str>debug</str>
  </arr>
</requestHandler>


On Wed, Sep 1, 2010 at 4:11 PM, Jean-Sebastien Vachon
 wrote:
> can you tell us what are your current settings regarding the 
> fieldCollapseCache?
>
> I had similar issues with field collapsing and I found out that this cache 
> was responsible for
> most of the OOM exceptions.
>
> Reduce or even remove this cache from your configuration and it should help.
>
>
> On 2010-09-01, at 1:10 PM, Moazzam Khan wrote:
>
>> Hi guys,
>>
>> I have about 20k documents in the Solr index (and there's a lot of
>> text in each of them). I have field collapsing enabled on a specific
>> field (AdvisorID).
>>
>> The thing is if I have field collapsing enabled in the search request
>> I don't get correct count for the total number of records that
>> matched. It always says that the number of "rows" I asked to get back
>> is the number of total records it found.
>>
>> And, when I run a query with search criteria *:* (to get the number of
>> total advisors in the index) solr runs out of memory and gives me an
>> error saying
>>
>> SEVERE: java.lang.OutOfMemoryError: Java heap space
>>        at java.nio.CharBuffer.wrap(CharBuffer.java:350)
>>        at java.nio.CharBuffer.wrap(CharBuffer.java:373)
>>        at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
>>        at java.lang.StringCoding.decode(StringCoding.java:173)
>>
>>
>> This is going to be a huge problem when we index 50k
>> documents later on.
>>
>> These are the options I am running Solr with :
>>
>> java  -Xms2048M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:PermSize=1024m
>> MaxPermSize=1024m    -jar  start.jar
>>
>>
>> Is there any way I can get the counts and not run out of memory?
>>
>> Thanks in advance,
>> Moazzam
>
>


Re: High - Low field value?

2010-09-01 Thread kenf_nc

That's exactly what I want.  I was just searching the wiki using the wrong
terms.
Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/High-Low-field-value-tp1402568p1403164.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: High - Low field value?

2010-09-01 Thread Geert-Jan Brits
StatsComponent is exactly what you're looking for.

http://wiki.apache.org/solr/StatsComponent

Cheers,
Geert-Jan
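
For example (field names illustrative), this returns min/max and other
stats for price, broken out per product type:

http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=price&stats.facet=type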

2010/9/1 kenf_nc 

>
> I want to do range facets on a couple fields, a Price field in particular.
> But Price is relative to the product type. Books, Automobiles and Houses
> are
> vastly different price ranges, and within Houses there may be a regional
> difference (price range in San Francisco is different than Columbus, OH for
> example).
>
> If I do Filter Query on type, so I'm not mixing books with houses, is there
> a quick way in a query to get the High and Low value for a given field? I
> would need those to build my range boundaries more efficiently.
>
> Ideally it would be a function of the query, so regionality could be taken
> into account. It's not a search score, or a facet, it's more a function. I
> know query functions exist, but haven't had to use them yet and the 'max'
> function doesn't look like what I need.  Any suggestions?
> Thanks.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/High-Low-field-value-tp1402568p1402568.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: how do I create custom function that uses multiple ValuesSources?

2010-09-01 Thread Gerald

Figured this out about ten minutes after I posted the message, and it was
much simpler than I thought it would be.

I used the SumFloatFunction (which extends MultiFloatFunction) as a starting
point and was able to achieve what I was going for in my test: a simple
string length function that returns the length of the specified fields after
concatenation.

very nice being able to create custom functions.

now on to creating custom functions that handle multiValued data types
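
In case it helps anyone else, the shape of it is roughly this (a sketch
against the Solr 1.4 function-query API, modeled on SumFloatFunction; the
class and the "concatlen" name are mine):

import org.apache.solr.search.function.DocValues;
import org.apache.solr.search.function.MultiFloatFunction;
import org.apache.solr.search.function.ValueSource;

public class ConcatLenFunction extends MultiFloatFunction {
  public ConcatLenFunction(ValueSource[] sources) {
    super(sources);
  }

  protected String name() {
    return "concatlen";
  }

  // Sum the string lengths of each source's value for this doc,
  // i.e. the length of the concatenated fields.
  protected float func(int doc, DocValues[] valsArr) {
    int len = 0;
    for (DocValues vals : valsArr) {
      len += vals.strVal(doc).length();
    }
    return len;
  }
}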
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-do-I-create-custom-function-that-uses-multiple-ValuesSources-tp1402645p1403070.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-01 Thread Luke Tebbs


Have you tried to up the MaxHeapSize?

I tend to run solr and the development instance in a separate jetty (on 
a separate port) and actually restart the web server for the dev 
application every now and again.
It doesn't take too long if you only have one webapp on jetty - I tend 
to use mvn jetty:run on the CLI rather than launch jetty in eclipse. I 
also use JRebel to reduce the number of restarts needed during dev.


As for a production instance, should you need to redeploy that often?

Luke

Antonio Calo' wrote:

 Hi guys

I'm facing an error in our production environment with our search 
application based on maven with spring + solrj.


When I try to change a class, or try to redeploy/restart an 
application, I catch a java.lang.OutOfMemoryError: PermGen


I've tried to understand the cause of this and also I've succeeded in
reproducing this issue on my local development environment by just
restarting the jetty several times (I'm using eclipse + maven plugin).


The logs obtained are those:

   [...]
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/: org.apache.solr.handler.admin.AdminHandlers
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/ping: PingRequestHandler
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /debug/dump: solr.DumpRequestHandler
   32656 [Finalizer] INFO org.apache.solr.core.SolrCore - []  CLOSING
   SolrCore org.apache.solr.core.solrc...@1409c28
   17:43:19 ERROR InvertedIndexEngine:124 open -
   java.lang.OutOfMemoryError: PermGen space
   java.lang.RuntimeException: java.lang.OutOfMemoryError: PermGen space
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
at
   
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) 


at
   
com.intellisemantic.intellifacet.resource.invertedIndex.InvertedIndexEngine.open(InvertedIndexEngine.java:113) 


at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
   
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 


at
   
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 


at java.lang.reflect.Method.invoke(Method.java:597)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1536) 


at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1477) 


at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1409) 


   [...]

The exception is always thrown while solr init is performed after a 
restart (this is the reason why I'm asking your support ;) )


It seems that while solr is trying to be set up (by [Timer-1]), 
another thread ([Finalizer]) is trying to close it. I can see from the 
Solr code that this exception is thrown always in the same place: 
SolrCore.java:1068.

Here there is a comment that says:

    // need to close the searcher here??? we shouldn't have to.
    throw new RuntimeException(th);
  } finally {
    if (newestSearcher != null) {
      newestSearcher.decref();
    }
  }

I'm using the solrj lib in a Spring container, so I'm supposing that
Spring will manage the release of all the singleton classes. Should I
do something else, like force closing solr?


Thanks in advance for your support.

Best regards

Antonio





Re: Need help with field collapsing and out of memory error

2010-09-01 Thread Jean-Sebastien Vachon
can you tell us what are your current settings regarding the fieldCollapseCache?

I had similar issues with field collapsing and I found out that this cache was 
responsible for 
most of the OOM exceptions.

Reduce or even remove this cache from your configuration and it should help.


On 2010-09-01, at 1:10 PM, Moazzam Khan wrote:

> Hi guys,
> 
> I have about 20k documents in the Solr index (and there's a lot of
> text in each of them). I have field collapsing enabled on a specific
> field (AdvisorID).
> 
> The thing is if I have field collapsing enabled in the search request
> I don't get correct count for the total number of records that
> matched. It always says that the number of "rows" I asked to get back
> is the number of total records it found.
> 
> And, when I run a query with search criteria *:* (to get the number of
> total advisors in the index) solr runs out of memory and gives me an
> error saying
> 
> SEVERE: java.lang.OutOfMemoryError: Java heap space
>at java.nio.CharBuffer.wrap(CharBuffer.java:350)
>at java.nio.CharBuffer.wrap(CharBuffer.java:373)
>at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
>at java.lang.StringCoding.decode(StringCoding.java:173)
> 
> 
> This is going to be a huge problem when we index 50k
> documents later on.
> 
> These are the options I am running Solr with :
> 
> java  -Xms2048M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:PermSize=1024m
> MaxPermSize=1024m -jar start.jar
> 
> 
> Is there any way I can get the counts and not run out of memory?
> 
> Thanks in advance,
> Moazzam



Localsolr with Dismax

2010-09-01 Thread Luke Tebbs
Does anyone have any experience with getting dismax to work with a 
geospatial (localsolr) search?


I have the following configuration -

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title description^0.5</str>
    <str name="pf">title description^0.5</str>
    <str name="mm">0%</str>
    <str name="tie">0.1</str>
  </lst>
</requestHandler>

<requestHandler name="geo" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title description^0.5</str>
    <str name="pf">title description^0.5</str>
    <str name="mm">0%</str>
    <str name="tie">0.1</str>
  </lst>
  <arr name="components">
    <str>localsolr</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>debug</str>
  </arr>
</requestHandler>


All of the location searching works fine, as does the normal search, but 
when using the "geo" handler the textual search seems to be using the 
standard search handler and only the title field is searched.


I'm a bit stumped on this one, any help would be greatly appreciated.

Luke


Re: Auto Suggest

2010-09-01 Thread Eric Grobler
Thanks for your feedback Robert,

I will try that and see how Solr performs on my data - I think I will create
a field that contains only important key/product terms from the text.

Regards
Johan

On Wed, Sep 1, 2010 at 9:12 PM, Robert Petersen  wrote:

> We don't have that many, just a hundred thousand, and solr response
> times (since the index's docs are small and not complex) are logged as
> typically 1 ms if not 0 ms.  It's funny but sometimes it is so fast no
> milliseconds have elapsed.  Incredible if you ask me...  :)
>
> Once you get SOLR to consider the whole phrase as just one big term, the
> wildcard is very fast.
>
> -Original Message-
> From: Eric Grobler [mailto:impalah...@googlemail.com]
> Sent: Wednesday, September 01, 2010 12:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Auto Suggest
>
> Hi Robert,
>
> Interesting approach, how many documents do you have in Solr?
> I have about 2 million and I just wonder if it might be a bit slow.
>
> Regards
> Johan
>
> On Wed, Sep 1, 2010 at 7:38 PM, Robert Petersen 
> wrote:
>
> > I do this by replacing the spaces with a '%' in a separate search
> field
> > which is not parsed nor tokenized and then you can wildcard across the
> > whole phrase like you want and the spaces don't mess you up.  Just
> store
> > the original phrase with spaces in a separate field for returning to
> the
> > front end for display.
> >
> > -Original Message-
> > From: Jazz Globe [mailto:jazzgl...@hotmail.com]
> > Sent: Wednesday, September 01, 2010 7:33 AM
> > To: solr-user@lucene.apache.org
> > Subject: Auto Suggest
> >
> >
> > Hallo
> >
> > How would one implement a multiple term auto-suggest feature in Solr
> > that is filter sensitive?
> > For example, a user enters :
> > "mp3"
> >  and solr might suggest:
> >  ->   "mp3 player"
> >  ->   "mp3 nano"
> >  ->   "mp3 sony"
> > and then the user starts the second word :
> > "mp3 n"
> > and that narrows it down to:
> >  -> "mp3 nano"
> >
> > I had a quick look at the Terms Component.
> > I suppose it just returns term totals for the entire index and cannot
> be
> > used with a filter or query?
> >
> > Thanks
> > Johan
> >
> >
> >
>


Re: Alphanumeric wildcard search problem

2010-09-01 Thread Hasnain

Thankyou for your suggestions

Before removing the wordDelimiterFilterFactory, the results for q=R-*
were perfect but q=R-1* returned nothing; after removing
wordDelimiterFilterFactory, q=R-* didn't bring me results either.

the results before removing wordDelimiterFilterFactory using debugQuery=on
were


status: 0, QTime: 78
params: debugQuery=on, qf=mat_nr, q=R-1*, qt=standard2

rawquerystring: R-1*
querystring: R-1*
parsedquery: +DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
  description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
  manufact_mat:r-1*^0.4)~0.6) ()
parsedquery_toString: +(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
  description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
  manufact_mat:r-1*^0.4)~0.6 ()
QParser: DisMaxQParser
timing: total 31.0 (prepare 15.0, process 16.0; all per-component times 0.0)

and after removing wordDelimiterFilterFactory


status: 0, QTime: 78
params: debugQuery=on, qf=mat_nr, q=R-1*, qt=standard2

rawquerystring: R-1*
querystring: R-1*
parsedquery: +DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
  description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
  manufact_mat:r-1*^0.4)~0.6) ()
parsedquery_toString: +(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
  description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
  manufact_mat:r-1*^0.4)~0.6 ()
QParser: DisMaxQParser
timing: total 31.0 (prepare 15.0, process 16.0; all per-component times 0.0)

also at first the wordDelimiterFilterFactory used was this

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
    generateNumberParts="1" catenateWords="1" catenateNumbers="1"
    catenateAll="0" splitOnCaseChange="1"/>

before removing wordDelimiterFilterFactory, solr admin showed

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position     1
term text         R-1110
term type         word
source start,end  0,6
payload
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=true, enablePositionIncrements=true}
term position     1
term text         R-1110
term type         word
source start,end  0,6
payload
org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
catenateNumbers=1}
term position     1     2
term text         R     1110
term type         word  word
source start,end  0,1   2,6
payload
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position     1     2
term text         r     1110
term type         word  word
source start,end  0,1   2,6
payload
org.apache.solr.analysis.EnglishPorterFilterFactory
{protected=protwords.txt}
term position     1     2
term text         r     1110
term type         word  word
source start,end  0,1   2,6
payload
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position     1     2
term text         r     1110
term type         word  word
source start,end  0,1   2,6
payload



also after removing wordDelimiterFilterFactory, solr admin looks like this

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position     1
term text         R-1110
term type         word
source start,end  0,6
payload
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=true, enablePositionIncrements=true}
term position     1
term text         R-1110
term type         word
source start,end  0,6
payload
org.apache.solr.analysis.EnglishPorterFilterFactory
{protected=protwords.txt}
term position     1
term text         R-1110
term type         word
source start,end  0,6
payload
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position     1
term text         R-1110
term type         word
source start,end  0,6
payload


any suggestions?

thankyou


Erick Erickson wrote:
> 
> Really look at the analysis page in solr admin for how your
> analyzer chain handles things, or you'll spend time until you're
> really old having trouble :).
> 
> Here's what I see on a quick scan:
> 
>> StandardTokenizer tries to, among other things, preserve
> email addresses. The kinds of strings you're working with may
> trip something up here.
> 
>> Remove WordDelimiterFactory altogether. The point of WDF
> is to break words apart at transitions.
> 
>> Remove EnglishPorterFilterFactory too. What the effect
> of applying an algorithmic stemming process to words like
> you're interested in is...er...not obvious.
> 
> All that said, I took a quick at the analysis page with your definition
> and nothing jumped out at me. Are you sure that:
>> you're getting to the request handler you think? What does adding
> &debugQuery=on show?
>> you've indexed the data after you've made the changes you outlined above?
> The SOLR
> admin page can help here, especially the [full interface] link, with debug
> info on.
> 
> If nothing shows up, can you post the results of &debugQuery=on?
> 
> Best
> Erick
> 
> On Tue, Aug 31, 2010 at 6:11 AM, Hasnain  wrote:
> 
>>
>> I have gone through all of the related

RE: Auto Suggest

2010-09-01 Thread Robert Petersen
We don't have that many, just a hundred thousand, and solr response
times (since the index's docs are small and not complex) are logged as
typically 1 ms if not 0 ms.  It's funny but sometimes it is so fast no
milliseconds have elapsed.  Incredible if you ask me...  :)

Once you get SOLR to consider the whole phrase as just one big term, the
wildcard is very fast.

-Original Message-
From: Eric Grobler [mailto:impalah...@googlemail.com] 
Sent: Wednesday, September 01, 2010 12:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Auto Suggest

Hi Robert,

Interesting approach, how many documents do you have in Solr?
I have about 2 million and I just wonder if it might be a bit slow.

Regards
Johan

On Wed, Sep 1, 2010 at 7:38 PM, Robert Petersen 
wrote:

> I do this by replacing the spaces with a '%' in a separate search
field
> which is not parsed nor tokenized and then you can wildcard across the
> whole phrase like you want and the spaces don't mess you up.  Just
store
> the original phrase with spaces in a separate field for returning to
the
> front end for display.
>
> -Original Message-
> From: Jazz Globe [mailto:jazzgl...@hotmail.com]
> Sent: Wednesday, September 01, 2010 7:33 AM
> To: solr-user@lucene.apache.org
> Subject: Auto Suggest
>
>
> Hallo
>
> How would one implement a multiple term auto-suggest feature in Solr
> that is filter sensitive?
> For example, a user enters :
> "mp3"
>  and solr might suggest:
>  ->   "mp3 player"
>  ->   "mp3 nano"
>  ->   "mp3 sony"
> and then the user starts the second word :
> "mp3 n"
> and that narrows it down to:
>  -> "mp3 nano"
>
> I had a quick look at the Terms Component.
> I suppose it just returns term totals for the entire index and cannot
be
> used with a filter or query?
>
> Thanks
> Johan
>
>
>


how do I create custom function that uses multiple ValuesSources?

2010-09-01 Thread Gerald

using the NvlValueSourceParser example, I was able to create a custom
function that has two parameters: a ValueSource (a solr field) and a string
literal, i.e.: myfunc(mysolrfield, "test")

it works well but is a pretty simple function.

what is the best way to implement a (more complex) custom function that
contains two (or more) ValueSources as parameters?  i.e.:

myfunc2(myValuesSource1, myValuesSources2, "test") or
myfunc2(div(myValuesSource1,3), sum(myValuesSources2, 2), "test") or
myfunc2(myValuesSource1, myValuesSource2, myValuesSource3)

I dont have a concrete example right now but will likely get some
application ideas once I figure this out

any thoughts/examples on something like this?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-do-I-create-custom-function-that-uses-multiple-ValuesSources-tp1402645p1402645.html
Sent from the Solr - User mailing list archive at Nabble.com.


High - Low field value?

2010-09-01 Thread kenf_nc

I want to do range facets on a couple fields, a Price field in particular.
But Price is relative to the product type. Books, Automobiles and Houses are
vastly different price ranges, and within Houses there may be a regional
difference (price range in San Francisco is different than Columbus, OH for
example). 

If I do Filter Query on type, so I'm not mixing books with houses, is there
a quick way in a query to get the High and Low value for a given field? I
would need those to build my range boundaries more efficiently. 

Ideally it would be a function of the query, so regionality could be taken
into account. It's not a search score, or a facet, it's more a function. I
know query functions exist, but haven't had to use them yet and the 'max'
function doesn't look like what I need.  Any suggestions?
Thanks.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/High-Low-field-value-tp1402568p1402568.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto Suggest

2010-09-01 Thread Eric Grobler
Hi Robert,

Interesting approach, how many documents do you have in Solr?
I have about 2 million and I just wonder if it might be a bit slow.

Regards
Johan

On Wed, Sep 1, 2010 at 7:38 PM, Robert Petersen  wrote:

> I do this by replacing the spaces with a '%' in a separate search field
> which is not parsed nor tokenized and then you can wildcard across the
> whole phrase like you want and the spaces don't mess you up.  Just store
> the original phrase with spaces in a separate field for returning to the
> front end for display.
>
> -Original Message-
> From: Jazz Globe [mailto:jazzgl...@hotmail.com]
> Sent: Wednesday, September 01, 2010 7:33 AM
> To: solr-user@lucene.apache.org
> Subject: Auto Suggest
>
>
> Hallo
>
> How would one implement a multiple term auto-suggest feature in Solr
> that is filter sensitive?
> For example, a user enters :
> "mp3"
>  and solr might suggest:
>  ->   "mp3 player"
>  ->   "mp3 nano"
>  ->   "mp3 sony"
> and then the user starts the second word :
> "mp3 n"
> and that narrows it down to:
>  -> "mp3 nano"
>
> I had a quick look at the Terms Component.
> I suppose it just returns term totals for the entire index and cannot be
> used with a filter or query?
>
> Thanks
> Johan
>
>
>


RE: Auto Suggest

2010-09-01 Thread Robert Petersen
I do this by replacing the spaces with a '%' in a separate search field
which is not parsed nor tokenized and then you can wildcard across the
whole phrase like you want and the spaces don't mess you up.  Just store
the original phrase with spaces in a separate field for returning to the
front end for display.
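
Schema-wise it is just a pair of fields (names illustrative):

<field name="phrase_search" type="string" indexed="true" stored="false"/>
<field name="phrase_display" type="string" indexed="false" stored="true"/>

Index "mp3 nano" as phrase_search=mp3%nano and phrase_display=mp3 nano,
then query phrase_search:mp3%n* and display phrase_display.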

-Original Message-
From: Jazz Globe [mailto:jazzgl...@hotmail.com] 
Sent: Wednesday, September 01, 2010 7:33 AM
To: solr-user@lucene.apache.org
Subject: Auto Suggest


Hallo

How would one implement a multiple term auto-suggest feature in Solr
that is filter sensitive?
For example, a user enters :
"mp3"
  and solr might suggest:
  ->   "mp3 player"
  ->   "mp3 nano"
  ->   "mp3 sony"
and then the user starts the second word :
"mp3 n"
and that narrows it down to:
  -> "mp3 nano"

I had a quick look at the Terms Component.
I suppose it just returns term totals for the entire index and cannot be
used with a filter or query?

Thanks
Johan

  


Do commits block updates in SOLR 1.4?

2010-09-01 Thread Robert Petersen
I can't seem to find a definitive answer.  I have ten threads doing my
indexing and I block all the threads when one is ready to do a commit so
no adds are done until the commit finishes.  Is this still required in
SOLR 1.4 or could I take it out?  I tried testing this on a separate
small index where I set autocommit in solrconfig and seem to have no
issues just continuously adding documents from multiple threads to it
despite its commit activity.  I'd like to do the same in my big main
index, is it safe?
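
For reference, the autocommit I mean is the standard solrconfig.xml block
(a sketch; the thresholds are illustrative):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>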

 

Also, is there any difference in behavior between autocommits and
explicit commits in this regard?

 

 



RE: how to deal with virtual collection in solr?

2010-09-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thank you, Jan. Unfortunately I got the following exception when I use
http://localhost:8983/solr/aapublic/select?&shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
 . 

*
Aug 31, 2010 4:54:42 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:33)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
at org.apache.solr.search.QParser.getQuery(QParser.java:131)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
*

-Original Message-
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
Sent: Tuesday, August 31, 2010 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

Hi,

If you have multiple cores defined in your solr.xml you need to issue your 
queries to one of the cores. Below it seems as if you are lacking core name. 
Try instead:


http://localhost:8983/solr/aapublic/select?&shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

And as Lance pointed out, make sure your XML files conform to the Solr XML 
format (http://wiki.apache.org/solr/UpdateXmlMessages).

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 27. aug. 2010, at 15.04, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

> Thank you, Jan Høydahl. 
> 
> I used 
> http://localhost:8983/solr/select?&shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/.
>  I got an error "Missing solr core name in path". I have aapublic and
> aaprivate cores. I also got an error if I used
> http://localhost:8983/solr/aapublic/select?&shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/.
>  I got a null exception "java.lang.NullPointerException". 
> 
> My collections are xml files. Please let me know if I can use the following way
> you suggested.
> curl 
> "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true";
>  -F "fi...@myfile.xml"
> 
> Thanks so much as always!
> Xiaohui 
> 
> 
> -Original Message-
> From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
> Sent: Friday, August 27, 2010 7:42 AM
> To: solr-user@lucene.apache.org
> Subject: Re: how to deal with virtual collection in solr?
> 
> Hi,
> 
> Version 1.4.1 does not support the SolrCloud style sharding. In 1.4.1, please 
> use this style:
> &shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
> 
> 
> However, since schema is the same, I'd opt for one index with a "collections" 
> field as the filter.
> 
> You can add that field to your schema, and then inject it as metadata on the 
> ExtractingRequestHandler call:
> 
> curl 
> "http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true";
>  -F "fi...@myfile.pdf"
> 
> --
> Jan Høydahl, 

Need help with field collapsing and out of memory error

2010-09-01 Thread Moazzam Khan
Hi guys,

I have about 20k documents in the Solr index (and there's a lot of
text in each of them). I have field collapsing enabled on a specific
field (AdvisorID).

The thing is if I have field collapsing enabled in the search request
I don't get correct count for the total number of records that
matched. It always says that the number of "rows" I asked to get back
is the number of total records it found.

And, when I run a query with search criteria *:* (to get the number of
total advisors in the index) solr runs out of memory and gives me an
error saying

SEVERE: java.lang.OutOfMemoryError: Java heap space
at java.nio.CharBuffer.wrap(CharBuffer.java:350)
at java.nio.CharBuffer.wrap(CharBuffer.java:373)
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
at java.lang.StringCoding.decode(StringCoding.java:173)


This is going to be a huge problem when we index 50k
documents later on.

These are the options I am running Solr with :

java  -Xms2048M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:PermSize=1024m
MaxPermSize=1024m-jar  start.jar


Is there any way I can get the counts and not run out of memory?

Thanks in advance,
Moazzam


Re: missing part folder - how to debug?

2010-09-01 Thread Alex Baranau
Hi,

Adding Solr user list.

We used a similar approach to the one in this patch but with Hadoop Streaming.
Did you determine that indices are really missing? I mean did you find
missing documents in the output indices?

Alex Baranau

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - HBase

2010/8/31 Koji Sekiguchi 

>  Hello,
>
> We are using Hadoop to make Solr index. We are using SOLR-1301
> that was first contributed by Andrzej:
>
> https://issues.apache.org/jira/browse/SOLR-1301
>
> It works great on testing environment, 4 servers.
> Today, we run it on production environment, 320 servers.
We run 5120 reducers (16 per server). This results in 5120 indexes,
> i.e. part-X folders should be created. But about 20 part
> folders were missing, and Hadoop didn't produce any error logs.
> How can we investigate/debug this problem?
>
> Any pointers, experiences would be highly appreciated!
>
> Thanks,
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>


java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-01 Thread Antonio Calo'

 Hi guys

I'm facing an error in our production environment with our search 
application based on maven with spring + solrj.


When I try to change a class, or try to redeploy/restart an application, 
I catch a java.lang.OutOfMemoryError: PermGen


I've tried to understand the cause of this, and I've also succeeded in 
reproducing this issue in my local development environment by just 
restarting Jetty several times (I'm using Eclipse + the Maven plugin).


The logs obtained are as follows:

   [...]
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/: org.apache.solr.handler.admin.AdminHandlers
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/ping: PingRequestHandler
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /debug/dump: solr.DumpRequestHandler
   32656 [Finalizer] INFO org.apache.solr.core.SolrCore - []  CLOSING
   SolrCore org.apache.solr.core.solrc...@1409c28
   17:43:19 ERROR InvertedIndexEngine:124 open -
   java.lang.OutOfMemoryError: PermGen space
   java.lang.RuntimeException: java.lang.OutOfMemoryError: PermGen space
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at com.intellisemantic.intellifacet.resource.invertedIndex.InvertedIndexEngine.open(InvertedIndexEngine.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1536)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1477)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1409)
   [...]

The exception is always thrown while Solr is initializing after a 
restart (which is the reason I'm asking for your support ;) )


It seems that while Solr is being set up (by [Timer-1]), another 
thread ([Finalizer]) is trying to close it. I can see from the Solr code 
that this exception is always thrown in the same place: SolrCore.java:1068.

There is a comment there that says:

      // need to close the searcher here??? we shouldn't have to.
      throw new RuntimeException(th);
    } finally {
      if (newestSearcher != null) {
        newestSearcher.decref();
      }
    }

I'm using the solrj lib in a Spring container, so I'm assuming that Spring 
will manage the release of all the singleton beans. Should I do 
something else, like forcing Solr to close?

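For example, I could expose the container and close it explicitly on
context shutdown, something like this (a rough sketch; the wrapper class
is made up, assuming an embedded CoreContainer):

import org.apache.solr.core.CoreContainer;

public class SolrLifecycle {

    private final CoreContainer coreContainer;

    public SolrLifecycle(CoreContainer coreContainer) {
        this.coreContainer = coreContainer;
    }

    // registered as the Spring destroy-method, so every SolrCore (and its
    // searchers) is released when the application context shuts down
    public void shutdown() {
        coreContainer.shutdown();
    }
}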

Thanks in advance for your support.

Best regards

Antonio


Re: place of log4j.properties file

2010-09-01 Thread joyce chan
Hi

Sorry to reopen this thread.  Do you guys know how to use log4jdbc in solr?

Thanks
JC

2010/3/19 Király Péter 

> Thanks David!
>
> It works. Even with relative path, like
> -Dlog4j.configuration=file:etc/log4j.properties.
>
> Péter
>
> - Original Message - From: "Smiley, David W." 
> To: 
> Cc: "Eric Pugh" 
> Sent: Friday, March 19, 2010 5:43 PM
> Subject: Re: place of log4j.properties file
>
>
>
> I believe that should have been
> -Dlog4j.configuration=file:/c:/foo/log4j.properties
> I've done this sort of thing many times before.
>
> I've also found it helpful to add -Dlog4j.debug  (no value needed) to debug
> logging.
>
> http://logging.apache.org/log4j/1.2/manual.html
>
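> Putting both together on the Jetty command line would look something like
> this (a sketch; adjust the path to your properties file):
> 
> java -Dlog4j.configuration=file:etc/log4j.properties -Dlog4j.debug -jar start.jar
> 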
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
>
> On Mar 19, 2010, at 12:27 PM, Király Péter wrote:
>
>  Hi,
>>
>> on page 205 of the Solr 1.4 Enterprise Search Server book there is an
>> example of how to reference the log4j.properties file from Jetty. I tried
>> that and several other methods (like -Dlog4j.properties=...), but the only
>> way that worked was to create a WEB-INF/classes directory inside the
>> solr.war and put the file there (a tip I found in the list's archive).
>>
>> Is it possible, that there is no other way?
>>
>> Thanks,
>> Péter
>>
>>
>
>
>
>
>


Download document from solr

2010-09-01 Thread Matteo Moci

 Hello to All,
I am a newbie with Solr, and I am trying to understand if I can use it 
for my purpose.
I was wondering how Solr lists the result documents: do they appear 
as "downloadable files",
just like http://solr.machine.com/path/file.doc, or do I need to develop 
another layer to take care of the downloading?

Even a link to the docs might work...

Thank you,
Matteo



Re: NullpointerException when combining spellcheck component and synonyms

2010-09-01 Thread Stefan Moises
 doh, looks like I only forgot to add the spellcheck component to my 
edismax request handler... now it works with:

<requestHandler ...>
  ...
  <arr name="last-components">
    <str>spellcheck</str>
    <str>elevator</str>
  </arr>
</requestHandler>

What's strange is that spellchecking seemed to work *without* that 
entry, too.


Cheers,
Stefan

Am 01.09.2010 13:33, schrieb Stefan Moises:

 Hi there,

I am using Solr from SVN, 
https://svn.apache.org/repos/asf/lucene/dev/trunk (my last 
update/build on my dev server was in July I think)...


I've encountered a strange problem when using the Spellcheck component 
in combination with the SynonymFilter...

My "text" field is pretty standard, using the default synonyms.txt file:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

I have only added some terms at the end of synonyms.txt:
...
# Synonym mappings can be used for spelling correction too
pixima => pixma

tekanne => teekanne
teekane => teekanne
flashen => flaschen
flasen => flaschen

Here is my query and the exception... if I turn off spellcheck, 
everything works as expected and the synonyms are found...


INFO: [db] webapp=/solr path=/select 
params={mlt.minwl=3&spellcheck=true&facet=true&mlt.fl=oxmanu_oxid,oxvendor_oxid,oxtags,oxsearchkeys&spellcheck.q=flasen&mlt.mintf=1&facet.limit=-1&mlt=true&json.nl=map&hl.fl=oxtitle&hl.fl=oxshortdesc&hl.fl=oxlongdesc&hl.fl=oxtags&hl.fl=seodesc&hl.fl=seokeywords&wt=json&hl=true&rows=10&version=1.2&mlt.mindf=1&debugQuery=true&facet.sort=lex&start=0&q=flasen&facet.field=oxcat_oxid&facet.field=oxcat_oxidtitle&facet.field=oxprice&facet.field=oxmanu_oxid&facet.field=oxmanu_oxidtitle&facet.field=oxvendor_oxid&facet.field=oxvendor_oxidtitle&facet.field=attrgroup_oxid&facet.field=attrgroup_oxidtitle&facet.field=attrgroup_oxidvalue&facet.field=attrvalue_oxid&facet.field=attrvalue_oxidtitle&facet.field=attr2attrgroup_oxidtitle&qt=dismax&spellcheck.build=false} 
hits=2 status=500 QTime=14

01.09.2010 12:54:47 org.apache.solr.common.SolrException log
SCHWERWIEGEND: java.lang.NullPointerException
at org.apache.lucene.util.AttributeSource.cloneAttributes(AttributeSource.java:470)
at org.apache.lucene.analysis.synonym.SynonymFilter.incrementToken(SynonymFilter.java:128)
at org.apache.lucene.analysis.core.StopFilter.incrementToken(StopFilter.java:260)
at org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:336)
at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:62)
at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:380)
at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:127)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:465)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)

Thanks for any hint what I may be doing wrong! :)
Stefan




--
***
Stefan Moises
Senior Softwareentwickler

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax

MoreLikeThis and fq not giving exact results?

2010-09-01 Thread Sumit Arora
Hi All,

 I have added identifiers while submitting documents to Solr, e.g. jp_
for job postings and cp_ for career profiles, so ids are stored in the
form jp_1, jp_2, ... or cp_1, cp_2, ...

 So when I perform a standard query with fq=cp_, it gives me results
belonging to cp_ only, or jp_ only.

 But when I enable mlt inside the query, it returns results for jp_ as
well, because job_title also exists in job postings (even though jp_ and
cp_ already differentiate the two?)

e.g.

http://192.168.1.4:8983/solr/select/?mlt=true&mlt.fl=job_title%2Ccareer_summary%2Cindustry%2Ccompany%2Cexactly_looking&version=1.2&q=id%3Acp_4&start=0&rows=100&fq=cp_

How can I effectively use FilterQuery and MoreLikeThis?

/Sumit


Re: Solr Admin Schema Browser and field named "keywords"

2010-09-01 Thread Shawn Heisey

 On 8/26/2010 5:04 PM, Chris Hostetter wrote:

doubtful.

I suspect it has more to do with the amount of data in your keywords
field and the underlying request to the LukeRequestHandler timing out.

   have you tried using it with a test index where the "keywords"
field has only a few words in it?


It just occurred to me that there probably isn't enough data in the 
keywords field to cause this.  It is one of four fields copied into the 
catchall field, and is nowhere near as large as the ft_text field that 
is also copied to catchall.  The schema browser has always worked on the 
catchall field.


Actually, on a test index that I just built (with my leading/trailing 
punctuation filter included), I CAN access the keywords field.  
Bizarre.  Ideas?


Thanks,
Shawn



Auto Suggest

2010-09-01 Thread Jazz Globe

Hello

How would one implement a multiple term auto-suggest feature in Solr that is 
filter sensitive?
For example, a user enters:
"mp3"
and Solr might suggest:
  -> "mp3 player"
  -> "mp3 nano"
  -> "mp3 sony"
and then the user starts the second word:
"mp3 n"
and that narrows it down to:
  -> "mp3 nano"

I had a quick look at the Terms Component.
I suppose it just returns term totals for the entire index and cannot be used 
with a filter or query?

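The closest I've come up with is prefix faceting, which does respect q
and fq (a rough sketch; the field and filter names are made up):

http://localhost:8983/solr/select?q=*:*&fq=category:electronics&facet=true&facet.field=suggest_terms&facet.prefix=mp3&facet.limit=10&rows=0

But that suggests single terms from one field rather than whole phrases.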
Thanks
Johan


Re: shingles work in analyzer but not real data

2010-09-01 Thread Markus Jelsma
If your use-case is limited to this, why don't you encapsulate all queries in 
double quotes? 

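E.g. sending q="apple pie" (quoted) makes the parser treat the whole input
as one phrase, so a document containing only "apple" won't match.
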
On Wednesday 01 September 2010 14:21:47 Jeff Rose wrote:
> Hi,
>   We are using SOLR to match query strings with a keyword database, where
> some of the keywords are actually more than one word.  For example a
>  keyword might be "apple pie" and we only want it to match for a query
>  containing that word pair, but not one only containing "apple".  Here is
>  the relevant piece of the schema.xml, defining the index and query
>  pipelines:
> 
>   [schema.xml snippet stripped by the list archive]
> 
> In the analysis tool this schema looks like it works correctly.  Our
> multi-word keywords are indexed as a single entry, and then when a search
> phrase contains one of these multi-word keywords it is shingled and
>  matched. Unfortunately, when we do the same queries on top of the actual
>  index it responds with zero matches.  I can see in the index histogram
>  that the terms are correctly indexed from our mysql datasource containing
>  the keywords, but somehow the shingling doesn't appear to work on this
>  live data.  Does anyone have experience with shingling that might have
>  some tips for us, or otherwise advice for debugging the issue?
> 
> Thanks,
> Jeff
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: shingles work in analyzer but not real data

2010-09-01 Thread Robert Muir
On Wed, Sep 1, 2010 at 8:21 AM, Jeff Rose  wrote:

> Hi,
>  We are using SOLR to match query strings with a keyword database, where
> some of the keywords are actually more than one word.  For example a
> keyword
> might be "apple pie" and we only want it to match for a query containing
> that word pair, but not one only containing "apple".  Here is the relevant
> piece of the schema.xml, defining the index and query pipelines:
>
>   [schema.xml snippet stripped by the list archive]
>
> In the analysis tool this schema looks like it works correctly.  Our
> multi-word keywords are indexed as a single entry, and then when a search
> phrase contains one of these multi-word keywords it is shingled and
> matched.
>  Unfortunately, when we do the same queries on top of the actual index it
> responds with zero matches.  I can see in the index histogram that the
> terms
> are correctly indexed from our mysql datasource containing the keywords,
> but
> somehow the shingling doesn't appear to work on this live data.  Does
> anyone
> have experience with shingling that might have some tips for us, or
> otherwise advice for debugging the issue?
>

query-time shingling probably isn't working with the queryparser you are
using; the default Lucene one first splits on whitespace before sending text
to the analyzer: e.g. a query of foo bar is processed as TokenStream(foo) +
TokenStream(bar).

so query-time shingling like this doesn't work as you expect for this
reason.

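You can see the split at the Lucene level (a sketch against the 3.x API;
the analyzer and field name are illustrative, not the poster's schema):

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

public class ShingleQueryDemo {
    public static void main(String[] args) throws Exception {
        // whitespace analyzer wrapped so it also emits 2-word shingles
        ShingleAnalyzerWrapper analyzer =
            new ShingleAnalyzerWrapper(new WhitespaceAnalyzer(), 2);
        QueryParser qp = new QueryParser(Version.LUCENE_30, "keywords", analyzer);

        // parsed per whitespace-separated chunk: the analyzer sees "apple"
        // and "pie" separately, so no "apple pie" shingle is ever built
        System.out.println(qp.parse("apple pie"));

        // quoted: the whole string goes through the analyzer in one pass,
        // so the shingle token can be produced
        System.out.println(qp.parse("\"apple pie\""));
    }
}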

-- 
Robert Muir
rcm...@gmail.com


Re: Problems indexing spatial field - undefined subField

2010-09-01 Thread Thomas Joiner
While you have already solved your problem, my guess as to why it didn't
work originally is that you probably didn't have a matching dynamicField
declared (see the sketch below).

What subFieldType does is register a dynamicField for you.
subFieldSuffix requires that you have already defined that dynamicField.

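Something like this (a sketch; the "location" type name is made up):

  <!-- subFieldType: the needed dynamicField is registered for you -->
  <fieldType name="location" class="solr.LatLonType" subFieldType="double"/>

  <!-- subFieldSuffix: you must declare the dynamicField yourself -->
  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_latLon"/>
  <dynamicField name="*_latLon" type="double" indexed="true" stored="false"/>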
On Tue, Aug 31, 2010 at 8:07 PM, Simon Wistow  wrote:

> On Wed, Sep 01, 2010 at 01:05:47AM +0100, me said:
> > I'm trying to index a latLon field.
> >
> > <fieldType ... class="solr.LatLonType" subFieldSuffix="_latLon"/>
> > 
>
> Turns out changing it to
>
> <fieldType ... class="solr.LatLonType" subFieldType="double"/>
>
> fixed it.
>
>
>


shingles work in analyzer but not real data

2010-09-01 Thread Jeff Rose
Hi,
  We are using SOLR to match query strings with a keyword database, where
some of the keywords are actually more than one word.  For example a keyword
might be "apple pie" and we only want it to match for a query containing
that word pair, but not one only containing "apple".  Here is the relevant
piece of the schema.xml, defining the index and query pipelines:

   [schema.xml snippet stripped by the list archive]

In the analysis tool this schema looks like it works correctly.  Our
multi-word keywords are indexed as a single entry, and then when a search
phrase contains one of these multi-word keywords it is shingled and matched.
 Unfortunately, when we do the same queries on top of the actual index it
responds with zero matches.  I can see in the index histogram that the terms
are correctly indexed from our mysql datasource containing the keywords, but
somehow the shingling doesn't appear to work on this live data.  Does anyone
have experience with shingling that might have some tips for us, or
otherwise advice for debugging the issue?

Thanks,
Jeff


Re: Proximity search + Highlighting

2010-09-01 Thread Xavier Schepler

On 01/09/2010 13:54, Xavier Schepler wrote:

On 01/09/2010 12:38, Markus Jelsma wrote:

I think you need to enable usePhraseHighlighter in order to use the
highlightMultiTerm parameter.

  On Wednesday 01 September 2010 12:12:11 Xavier Schepler wrote:

Hi,

can the highlighting component highlight terms only if the distance
between them matches the query ?
I use those parameters :

hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=<%2Fb>&hl.mergeContiguous=false




Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Yes, you're right.


But it doesn't help with the other problem.


Re: Proximity search + Highlighting

2010-09-01 Thread Xavier Schepler

On 01/09/2010 12:38, Markus Jelsma wrote:

I think you need to enable usePhraseHighlighter in order to use the
highlightMultiTerm parameter.

  On Wednesday 01 September 2010 12:12:11 Xavier Schepler wrote:
   

Hi,

can the highlighting component highlight terms only if the distance
between them matches the query ?
I use those parameters :

hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=<%2Fb>&hl.mergeContiguous=false

 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


   

Yes, you're right.


NullpointerException when combining spellcheck component and synonyms

2010-09-01 Thread Stefan Moises

 Hi there,

I am using Solr from SVN, 
https://svn.apache.org/repos/asf/lucene/dev/trunk (my last update/build 
on my dev server was in July I think)...


I've encountered a strange problem when using the Spellcheck component 
in combination with the SynonymFilter...

My "text" field is pretty standard, using the default synonyms.txt file:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>
I have only added some terms at the end of synonyms.txt:
...
# Synonym mappings can be used for spelling correction too
pixima => pixma

tekanne => teekanne
teekane => teekanne
flashen => flaschen
flasen => flaschen

Here is my query and the exception... if I turn off spellcheck, 
everything works as expected and the synonyms are found...


INFO: [db] webapp=/solr path=/select 
params={mlt.minwl=3&spellcheck=true&facet=true&mlt.fl=oxmanu_oxid,oxvendor_oxid,oxtags,oxsearchkeys&spellcheck.q=flasen&mlt.mintf=1&facet.limit=-1&mlt=true&json.nl=map&hl.fl=oxtitle&hl.fl=oxshortdesc&hl.fl=oxlongdesc&hl.fl=oxtags&hl.fl=seodesc&hl.fl=seokeywords&wt=json&hl=true&rows=10&version=1.2&mlt.mindf=1&debugQuery=true&facet.sort=lex&start=0&q=flasen&facet.field=oxcat_oxid&facet.field=oxcat_oxidtitle&facet.field=oxprice&facet.field=oxmanu_oxid&facet.field=oxmanu_oxidtitle&facet.field=oxvendor_oxid&facet.field=oxvendor_oxidtitle&facet.field=attrgroup_oxid&facet.field=attrgroup_oxidtitle&facet.field=attrgroup_oxidvalue&facet.field=attrvalue_oxid&facet.field=attrvalue_oxidtitle&facet.field=attr2attrgroup_oxidtitle&qt=dismax&spellcheck.build=false} 
hits=2 status=500 QTime=14

01.09.2010 12:54:47 org.apache.solr.common.SolrException log
SCHWERWIEGEND: java.lang.NullPointerException
at org.apache.lucene.util.AttributeSource.cloneAttributes(AttributeSource.java:470)
at org.apache.lucene.analysis.synonym.SynonymFilter.incrementToken(SynonymFilter.java:128)
at org.apache.lucene.analysis.core.StopFilter.incrementToken(StopFilter.java:260)
at org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:336)
at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:62)
at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:380)
at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:127)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:465)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)

Thanks for any hint what I may be doing wrong! :)
Stefan



Re: Proximity search + Highlighting

2010-09-01 Thread Markus Jelsma
I think you need to enable usePhraseHighlighter in order to use the 
highlightMultiTerm parameter.

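I.e. something like this (a sketch; only the two hl flags changed from
your URL):

hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=<%2Fb>&hl.mergeContiguous=false
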
 On Wednesday 01 September 2010 12:12:11 Xavier Schepler wrote:
> Hi,
> 
> can the highlighting component highlight terms only if the distance
> between them matches the query ?
> I use those parameters :
> 
> hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=<%2Fb>&hl.mergeContiguous=false
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Proximity search + Highlighting

2010-09-01 Thread Xavier Schepler

Hi,

can the highlighting component highlight terms only if the distance 
between them matches the query ?

I use those parameters :

hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=<%2Fb>&hl.mergeContiguous=false


Re: Deploying Solr 1.4.1 in JbossAs 6

2010-09-01 Thread Grijesh.singh

1-extract the solr.war
2-edit the web.xml to set the solr/home param
3-re-create the solr.war
4-set up the solr home directory
5-copy the solr.war to the JBossAs 6 deploy directory
6-start the jboss server
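
A rough command-line sketch of the above (paths are illustrative):

jar xf solr.war WEB-INF/web.xml
# edit WEB-INF/web.xml: point the solr/home env-entry at your Solr home
jar uf solr.war WEB-INF/web.xml
mkdir -p /opt/solr/home/conf    # solrconfig.xml and schema.xml live here
cp solr.war $JBOSS_HOME/server/default/deploy/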
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Deploying-Solr-1-4-1-in-JbossAs-6-tp1392539p1398859.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Implementing Memcache for Solr

2010-09-01 Thread Grijesh.singh

In my experience, memcache was not so good.
Finally I configured Solr's built-in cache for the best performance.
With memcache we were caching queries, but Solr already provides that.
You can take a call after load testing with and without memcache.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Implementing-Memcache-for-Solr-tp1398625p1398823.html
Sent from the Solr - User mailing list archive at Nabble.com.