Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

2012-09-05 Thread guenter.hip...@unibas.ch
Hoss, I'm so happy you were able to reproduce the problem, because I was quite
worried about it!!


Let me know if I can provide support with testing it.
For the last two days I was busy migrating a bunch of hosts, which
should (hopefully) be finished today.

Then I will again have the infrastructure for running tests.

Günter

On 09/05/2012 11:19 PM, Chris Hostetter wrote:

: Subject: Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

Günter, This is definitely strange

The good news is, i can reproduce your problem.
The bad news is, i can reproduce your problem - and i have no idea what's
causing it.

I've opened SOLR-3793 to try to get to the bottom of this, and included
some basic steps to demonstrate the bug using the Solr 4.0-BETA example
data, but i'm really not sure what the problem might be...

https://issues.apache.org/jira/browse/SOLR-3793


-Hoss



--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: +41 (0)61 267 31 12  Fax: +41 61 267 3103
e-mail: guenter.hip...@unibas.ch
URL: www.swissbib.org / http://www.ub.unibas.ch/



Re: Searching of Chinese characters and English

2012-09-05 Thread waynelam

Thank you Lance.
I just found the problem; posting it here in case somebody else comes across this.
It turns out that Tomcat does not accept UTF-8 in URLs by default.


http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

I have no idea why that is the default, but I followed the instructions in
the document above.
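
For reference, the change on that wiki page amounts to adding
URIEncoding="UTF-8" to the HTTP connector in Tomcat's conf/server.xml
(the other attributes below are just the stock defaults):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>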


Problem solved!!

Thanks so much for your help!


Wayne


On 6/9/2012 11:19, Lance Norskog wrote:

I believe that you should remove the Analyzer class name from the field type. I think
it overrides the stack of tokenizers/token filters. Other <fieldType>
declarations do not have both an Analyzer class and Tokenizers.

  <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer">

should be:

  <analyzer>

This may not help with your searching problem.

- Original Message -
| From: "waynelam" 
| To: solr-user@lucene.apache.org
| Sent: Wednesday, September 5, 2012 8:07:36 PM
| Subject: Re: Searching of Chinese characters and English
|
| Any thoughts?
|
| It is weird: I can see the words being cut correctly in Field
| Analysis. Almost every website I checked recommends either
| CJKAnalyzer, IKAnalyzer or SmartChineseAnalyzer. But if I can see the
| words being cut, then it should not be a problem with the settings of a
| particular Analyzer. Am I correct?
|
| Anyone have an idea or hints?
|
| Thanks so much
|
| Wayne
|
|
|
| On 4/9/2012 13:03, waynelam wrote:
| > Hi all,
| >
| > I tried to modify the schema.xml and solrconfig.xml that come with the
| > Drupal "search_api_solr" module, so that they are suitable for a CJK
| > environment. I can see Chinese words cut up into two-character tokens
| > in "Field Analysis". If I use the following query
| >
| > 
my_ip_address:8080/solr/select?indent=on&version=2.2&fq=t_title:"Find"&start=0&rows=10&fl=t_title
| >
| >
| > I can see it returning results. The problem is when I change the
| > search keyword for one of my fields (e.g. t_title) to Chinese
| > characters. It always shows
| >
| > 
| >
| > in the results. It is strange because if a title contains both Chinese
| > and English (e.g. testing ??), when I search just the English part
| > (e.g. fq=t_title:"testing"), I can find the result perfectly. It only
| > becomes a problem when searching Chinese characters.
| >
| >
| > I would much appreciate it if you could show me which part I did wrong.
| >
| > Thanks
| >
| > Wayne
| >
| > *My Settings:*
| > Java : 1.6.0_24
| > Solr : version 3.6.1
| > tomcat: version 6.0.35
| >
| > *My schema.xml* (i highlighted the place i changed from default)
| >
| > * stored="true" multiValued="true">**
| > **   class="org.apache.lucene.analysis.cjk.CJKAnalyzer">**
| > ** class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>**
| > ** generateWordParts="1" generateNumberParts="1" catenateWords="1"
| > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>**
| > ****
| > ** language="English" protected="protwords.txt"/>**
| > ** class="solr.RemoveDuplicatesTokenFilterFactory"/>**
| > ** version="icu4j" composed="false" remove_diacritics="true"
| > remove_modifiers="true" fold="true"/>**
| > ****
| > **  **
| > **   class="org.apache.lucene.analysis.cjk.CJKAnalyzer">**
| > ** class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>**
| > ** generateWordParts="1" generateNumberParts="1" catenateWords="0"
| > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>**
| > ****
| > ** language="English" protected="protwords.txt"/>**
| > ** class="solr.RemoveDuplicatesTokenFilterFactory"/>**
| > ** version="icu4j" composed="false" remove_diacritics="true"
| > remove_modifiers="true" fold="true"/>**
| > ****
| > **  **
| > ***
| >
| >  indexed="true"
| > stored="true" sortMissingLast="true" omitNorms="true">
| >   
| >
| > 
| >
| > 
| > 
| >   
| > 
| >
| >  indexed="true" />
| >
| >  class="solr.StrField" />
| >  
| >  
| >
| >stored="true"
| > required="true" />
| >stored="true"
| > required="true" />
| >stored="true"
| > required="true" />
| >
| >
| >stored="true"
| > multiValued="true"/>
| >
| >
| > * autoGeneratePhraseQueries="false"/>*
| >
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| >
| >
| >
| >  
| >
| >  id
| >  
| >
| > 
| >
|
|
| --
| -
| Wayne Lam
| Assistant Librarian II
| Systems Development & Support
| Fong Sum Wood Library
| Lingnan University
| 8 Castle Peak Road
| Tuen Mun, New Territories
| Hong Kong SAR
| China
| Phone:   +852 26168576
| Email:   wayne...@ln.edu.hk
| Website: http://www.library.ln.edu.hk
|
|



--
---

Re: Searching of Chinese characters and English

2012-09-05 Thread Lance Norskog
I believe that you should remove the Analyzer class name from the field type. I
think it overrides the stack of tokenizers/token filters. Other <fieldType>
declarations do not have both an Analyzer class and Tokenizers.

  <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer">

should be:

  <analyzer>

This may not help with your searching problem.

- Original Message -
| From: "waynelam" 
| To: solr-user@lucene.apache.org
| Sent: Wednesday, September 5, 2012 8:07:36 PM
| Subject: Re: Searching of Chinese characters and English
| 
| Any thoughts?
| 
| It is weird: I can see the words being cut correctly in Field
| Analysis. Almost every website I checked recommends either
| CJKAnalyzer, IKAnalyzer or SmartChineseAnalyzer. But if I can see the
| words being cut, then it should not be a problem with the settings of a
| particular Analyzer. Am I correct?
| 
| Anyone have an idea or hints?
| 
| Thanks so much
| 
| Wayne
| 
| 
| 
| On 4/9/2012 13:03, waynelam wrote:
| > Hi all,
| >
| > I tried to modify the schema.xml and solrconfig.xml that come with the
| > Drupal "search_api_solr" module, so that they are suitable for a CJK
| > environment. I can see Chinese words cut up into two-character tokens
| > in "Field Analysis". If I use the following query
| >
| > 
my_ip_address:8080/solr/select?indent=on&version=2.2&fq=t_title:"Find"&start=0&rows=10&fl=t_title
| >
| >
| > I can see it returning results. The problem is when I change the
| > search keyword for one of my fields (e.g. t_title) to Chinese
| > characters. It always shows
| >
| > 
| >
| > in the results. It is strange because if a title contains both Chinese
| > and English (e.g. testing ??), when I search just the English part
| > (e.g. fq=t_title:"testing"), I can find the result perfectly. It only
| > becomes a problem when searching Chinese characters.
| >
| >
| > I would much appreciate it if you could show me which part I did wrong.
| >
| > Thanks
| >
| > Wayne
| >
| > *My Settings:*
| > Java : 1.6.0_24
| > Solr : version 3.6.1
| > tomcat: version 6.0.35
| >
| > *My schema.xml* (i highlighted the place i changed from default)
| >
| > * stored="true" multiValued="true">**
| > **   class="org.apache.lucene.analysis.cjk.CJKAnalyzer">**
| > ** class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>**
| > ** generateWordParts="1" generateNumberParts="1" catenateWords="1"
| > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>**
| > ****
| > ** language="English" protected="protwords.txt"/>**
| > ** class="solr.RemoveDuplicatesTokenFilterFactory"/>**
| > ** version="icu4j" composed="false" remove_diacritics="true"
| > remove_modifiers="true" fold="true"/>**
| > ****
| > **  **
| > **   class="org.apache.lucene.analysis.cjk.CJKAnalyzer">**
| > ** class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>**
| > ** generateWordParts="1" generateNumberParts="1" catenateWords="0"
| > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>**
| > ****
| > ** language="English" protected="protwords.txt"/>**
| > ** class="solr.RemoveDuplicatesTokenFilterFactory"/>**
| > ** version="icu4j" composed="false" remove_diacritics="true"
| > remove_modifiers="true" fold="true"/>**
| > ****
| > **  **
| > ***
| >
| >  indexed="true"
| > stored="true" sortMissingLast="true" omitNorms="true">
| >   
| >
| > 
| >
| > 
| > 
| >   
| > 
| >
| >  indexed="true" />
| >
| >  class="solr.StrField" />
| >  
| >  
| >
| >stored="true"
| > required="true" />
| >stored="true"
| > required="true" />
| >stored="true"
| > required="true" />
| >
| >
| >stored="true"
| > multiValued="true"/>
| >
| >
| > * autoGeneratePhraseQueries="false"/>*
| >
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| > termVectors="true" />
| >
| >
| >
| >  
| >
| >  id
| >  
| >
| > 
| >
| 
| 
| --
| -
| Wayne Lam
| Assistant Librarian II
| Systems Development & Support
| Fong Sum Wood Library
| Lingnan University
| 8 Castle Peak Road
| Tuen Mun, New Territories
| Hong Kong SAR
| China
| Phone:   +852 26168576
| Email:   wayne...@ln.edu.hk
| Website: http://www.library.ln.edu.hk
| 
| 


Re: Document Processing

2012-09-05 Thread Lance Norskog
There is another way to do this: crawl the mobile site! 

The Fennec browser from Mozilla identifies itself as Android. I often use it to get
pagecrap off my screen.

- Original Message -
| From: "Lance Norskog" 
| To: solr-user@lucene.apache.org
| Sent: Wednesday, August 29, 2012 7:37:37 PM
| Subject: Re: Document Processing
| 
| I've seen the JSoup HTML parser library used for this. It worked
| really well. The Boilerpipe library may be what you want. Its
| schwerpunkt (*) is to separate boilerplate from wanted text in an
| HTML
| page. I don't know what fine-grained control it has.
| 
| * raison d'être. There is no English word for this concept.
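| 
| A minimal sketch of the kind of pipeline discussed here (the field names and
| the choice of ArticleExtractor are assumptions, not something from this thread):
| 
| import org.apache.solr.common.SolrInputDocument;
| import de.l3s.boilerpipe.extractors.ArticleExtractor;
| 
| public class PageToDoc {
|     // Turn a fetched page into a SolrJ document, keeping only the main text.
|     public static SolrInputDocument toDoc(String url, String html) throws Exception {
|         String mainText = ArticleExtractor.INSTANCE.getText(html); // strips boilerplate
|         SolrInputDocument doc = new SolrInputDocument();
|         doc.addField("id", url);           // hypothetical field names
|         doc.addField("content", mainText);
|         return doc;
|     }
| }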
| 
| On Tue, Dec 6, 2011 at 1:39 PM, Tommaso Teofili
|  wrote:
| > Hello Michael,
| >
| > I can help you with using the UIMA UpdateRequestProcessor [1]; the
| > current
| > implementation uses in-memory execution of UIMA pipelines but since
| > I was
| > planning to add the support for higher scalability (with UIMA-AS
| > [2]) that
| > may help you as well.
| >
| > Tommaso
| >
| > [1] :
| > 
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java
| > [2] : http://uima.apache.org/doc-uimaas-what.html
| >
| > 2011/12/5 Michael Kelleher 
| >
| >> Hello Erik,
| >>
| >> I will take a look at both:
| >>
| >> org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor
| >>
| >> and
| >>
| >> org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessor
| >>
| >>
| >> and figure out what I need to extend to handle processing in the
| >> way I am
| >> looking for.  I am assuming that "component" configuration is
| >> handled in a
| >> standard way such that I can configure my new UpdateProcessor in
| >> the same
| >> way I would configure any other UpdateProcessor "component"?
| >>
| >> Thanks for the suggestion.
| >>
| >>
| >> 1 more question:  given that I am probably going to convert the
| >> HTML to
| >> XML so I can use XPath expressions to "extract" my content, do you
| >> think
| >> that this kind of processing will overload Solr?  This Solr
| >> instance will
| >> be used solely for indexing, and will only ever have a single
| >> ManifoldCF
| >> crawling job feeding it documents at one time.
| >>
| >> --mike
| >>
| 
| 
| 
| --
| Lance Norskog
| goks...@gmail.com
| 


Re: Searching of Chinese characters and English

2012-09-05 Thread waynelam

Any thoughts?

It is weird: I can see the words being cut correctly in Field
Analysis. Almost every website I checked recommends either
CJKAnalyzer, IKAnalyzer or SmartChineseAnalyzer. But if I can see the
words being cut, then it should not be a problem with the settings of a
particular Analyzer. Am I correct?


Anyone have an idea or hints?

Thanks so much

Wayne



On 4/9/2012 13:03, waynelam wrote:

Hi all,

I tried to modify the schema.xml and solrconfig.xml that come with the
Drupal "search_api_solr" module, so that they are suitable for a CJK
environment. I can see Chinese words cut up into two-character tokens in
"Field Analysis". If I use the following query


my_ip_address:8080/solr/select?indent=on&version=2.2&fq=t_title:"Find"&start=0&rows=10&fl=t_title 



I can see it returning results. The problem is when I change the
search keyword for one of my fields (e.g. t_title) to Chinese
characters. It always shows



in the results. It is strange because if a title contains both Chinese
and English (e.g. testing ??), when I search just the English part
(e.g. fq=t_title:"testing"), I can find the result perfectly. It only
becomes a problem when searching Chinese characters.

I would much appreciate it if you could show me which part I did wrong.

Thanks

Wayne

*My Settings:*
Java : 1.6.0_24
Solr : version 3.6.1
tomcat: version 6.0.35

*My schema.xml* (i highlighted the place i changed from default)

*stored="true" multiValued="true">**
**  class="org.apache.lucene.analysis.cjk.CJKAnalyzer">**
**class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>**
**generateWordParts="1" generateNumberParts="1" catenateWords="1" 
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>**

****
**language="English" protected="protwords.txt"/>**

****
**version="icu4j" composed="false" remove_diacritics="true" 
remove_modifiers="true" fold="true"/>**

****
**  **
**  class="org.apache.lucene.analysis.cjk.CJKAnalyzer">**
**class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>**
**generateWordParts="1" generateNumberParts="1" catenateWords="0" 
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>**

****
**language="English" protected="protwords.txt"/>**

****
**version="icu4j" composed="false" remove_diacritics="true" 
remove_modifiers="true" fold="true"/>**

****
**  **
***

stored="true" sortMissingLast="true" omitNorms="true">

  





  




class="solr.StrField" />

 
 

   required="true" />
   required="true" />
   required="true" />


   
   multiValued="true"/>

   

*autoGeneratePhraseQueries="false"/>*

   
   termVectors="true" />
   termVectors="true" />
   termVectors="true" />
   termVectors="true" />
   termVectors="true" />
   termVectors="true" />
   termVectors="true" />
   termVectors="true" />
   termVectors="true" />
   termVectors="true" />
   termVectors="true" />
   termVectors="true" />

   
   
   
 

 id
 






--
-
Wayne Lam
Assistant Librarian II
Systems Development & Support
Fong Sum Wood Library
Lingnan University
8 Castle Peak Road
Tuen Mun, New Territories
Hong Kong SAR
China
Phone:   +852 26168576
Email:   wayne...@ln.edu.hk
Website: http://www.library.ln.edu.hk



Re: Problem with verifying signature ?

2012-09-05 Thread Kiran Jayakumar
Thank you Hoss. I imported the KEYS file using *gpg --import KEYS.txt*.
Then I did the *--verify* again. This time I get an output like this:

gpg: Signature made 08/06/12 19:52:21 Pacific Daylight Time using RSA key
ID 322D7ECA
gpg: Good signature from "Robert Muir (Code Signing Key) "
*gpg: WARNING: This key is not certified with a trusted signature!*
gpg:  There is no indication that the signature belongs to the
owner.
Primary key fingerprint: 6661 9BA3 C030 DD55 3625  1303 817A E1DD 322D 7ECA

Is this acceptable?

Thanks

On Wed, Sep 5, 2012 at 5:38 PM, Chris Hostetter wrote:

> : I download solr 4.0 beta and the .asc file. I use gpg4win and type this
> in
> : the command line:
> :
> : >gpg --verify file.zip file.asc
> :
> : I get a message like this:
> :
> : *gpg: Can't check signature: No public key*
>
> you can verify the asc sig file using the public KEYS file hosted on the
> main apache download site (do not trust asc or KEYS from a download
> mirror, that defeats the point)
>
>
> https://www.apache.org/dist/lucene/solr/KEYS
>
>
>
> -Hoss
>


deletedPkQuery not work in solr 3.3

2012-09-05 Thread jun Wang
I have a data-config.xml with two entities, like

<entity name="..." ...>
...
</entity>

and

<entity name="delta_build" ...>
...
</entity>
The entity delta_build is for delta import; the query is

?command=full-import&entity=delta_build&clean=false

and I want to use deletedPkQuery to delete from the index. So I have added these to
the entity "delta_build":

deltaQuery="select -1 as ID from dual"

deltaImportQuery="select * from product where a.id='${dataimporter.delta.ID}' "

deletedPKQuery="select product_id as ID from modified_product where
gmt_create > to_date('${dataimporter.last_index_time}','-mm-dd
hh24:mi:ss') and modification = 'deleted'"

deltaQuery and deltaImportQuery are deliberately trivial, to keep the delta import
itself from importing any records, because delta import has been implemented via
full import; I only want to use delta for deleting from the index.

But when I hit the query

?command=delta-import

deltaQuery and deltaImportQuery can be found in the log, but deletedPKQuery
cannot. Is there anything wrong in the config file?
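
For comparison, the DataImportHandler wiki spells the attribute deletedPkQuery,
with a lowercase "k"; if attribute names are matched case-sensitively, a sketch
of the corrected entity would look like this (whether the casing is the culprit
here is only an assumption):

<entity name="delta_build" ...
        deltaQuery="select -1 as ID from dual"
        deltaImportQuery="select * from product where a.id='${dataimporter.delta.ID}'"
        deletedPkQuery="select product_id as ID from modified_product where
            gmt_create > to_date('${dataimporter.last_index_time}','yyyy-mm-dd hh24:mi:ss')
            and modification = 'deleted'">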

-- 
from Jun Wang


Re: Problem with verifying signature ?

2012-09-05 Thread Chris Hostetter
: I download solr 4.0 beta and the .asc file. I use gpg4win and type this in
: the command line:
: 
: >gpg --verify file.zip file.asc
: 
: I get a message like this:
: 
: *gpg: Can't check signature: No public key*

you can verify the asc sig file using the public KEYS file hosted on the 
main apache download site (do not trust asc or KEYS from a download 
mirror, that defeats the point)


https://www.apache.org/dist/lucene/solr/KEYS
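
For example, something like this (the archive file names are just illustrative):

$ wget https://www.apache.org/dist/lucene/solr/KEYS
$ gpg --import KEYS
$ gpg --verify apache-solr-4.0.0-BETA.zip.asc apache-solr-4.0.0-BETA.zip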



-Hoss


Re: Solr not allowing persistent HTTP connections

2012-09-05 Thread Aleksey Vorona
Some extra information: if I use curl and force it to use HTTP 1.0, it
is easier to see that Solr doesn't allow persistent connections:


$ curl -v -0 'http://localhost:8983/solr/select?q=*:*' -H 'Connection: Keep-Alive'
* About to connect() to localhost port 8983 (#0)
*   Trying ::1... connected
> GET /solr/select?q=*:* HTTP/1.0
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 
OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3

> Host: localhost:8983
> Accept: */*
> Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Content-Type: application/xml; charset=UTF-8
* no chunk, no close, no size. Assume close to signal end
<


...removed the rest of the response body...
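
With HTTP/1.1, an easy cross-check is to give curl the same URL twice in one
invocation; against a server that honors keep-alive, the second fetch logs
"Re-using existing connection! (#0)":

$ curl -v 'http://localhost:8983/solr/select?q=*:*' 'http://localhost:8983/solr/select?q=*:*'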

-- Aleksey

On 12-09-05 03:54 PM, Aleksey Vorona wrote:

Hi,

Running the example Solr from the 3.6.1 distribution, I cannot make it
keep persistent HTTP connections:

$ ab -c 1 -n 100 -k 'http://localhost:8983/solr/select?q=*:*' | grep
Keep-Alive
Keep-Alive requests:0

What should I change to fix that?

P.S. We have the same issue in production with Jetty 7, but I thought it
would be better to ask about Solr example, since it is easier for anyone
to reproduce the issue.

-- Aleksey





Solr not allowing persistent HTTP connections

2012-09-05 Thread Aleksey Vorona

Hi,

Running the example Solr from the 3.6.1 distribution, I cannot make it
keep persistent HTTP connections:


$ ab -c 1 -n 100 -k 'http://localhost:8983/solr/select?q=*:*' | grep 
Keep-Alive

Keep-Alive requests:0

What should I change to fix that?

P.S. We have the same issue in production with Jetty 7, but I thought it 
would be better to ask about Solr example, since it is easier for anyone 
to reproduce the issue.


-- Aleksey


Re: Delete all documents in the index

2012-09-05 Thread Mark Mandel
Thanks for posting this!

I ran into exactly this issue yesterday, and ended up deleting the files to
get around it.

Mark

Sent from my mobile doohickey.
On Sep 6, 2012 4:13 AM, "Rohit Harchandani"  wrote:

> Thanks everyone. Adding the _version_ field in the schema worked.
> Deleting the data directory works for me, but was not sure why deleting
> using curl was not working.
>
> On Wed, Sep 5, 2012 at 1:49 PM, Michael Della Bitta <
> michael.della.bi...@appinions.com> wrote:
>
> > Rohit:
> >
> > If it's easy, the easiest thing to do is to turn off your servlet
> > container, rm -r * inside of the data directory, and then restart the
> > container.
> >
> > Michael Della Bitta
> >
> > 
> > Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> > www.appinions.com
> > Where Influence Isn’t a Game
> >
> >
> > On Wed, Sep 5, 2012 at 12:56 PM, Jack Krupansky  >
> > wrote:
> > > Check to make sure that you are not stumbling into SOLR-3432:
> > "deleteByQuery
> > > silently ignored if updateLog is enabled, but {{_version_}} field does
> > not
> > > exist in schema".
> > >
> > > See:
> > > https://issues.apache.org/jira/browse/SOLR-3432
> > >
> > > This could happen if you kept the new 4.0 solrconfig.xml, but copied in
> > your
> > > pre-4.0 schema.xml.
> > >
> > > -- Jack Krupansky
> > >
> > > -Original Message- From: Rohit Harchandani
> > > Sent: Wednesday, September 05, 2012 12:48 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Delete all documents in the index
> > >
> > >
> > > Hi,
> > > I am having difficulty deleting documents from the index using curl.
> The
> > > urls i tried were:
> > > curl "http://localhost:9020/solr/core1/update/?stream.body=
> > > *:*&commit=true"
> > > curl "http://localhost:9020/solr/core1/update/?commit=true"; -H
> > > "Content-Type: text/xml" --data-binary 'id:[* TO
> > > *]'
> > > curl "http://localhost:9020/solr/core1/update/?commit=true"; -H
> > > "Content-Type: text/xml" --data-binary
> > '*:*'
> > > I also tried:
> > > curl "
> > >
> >
> http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true
> > > "
> > > as suggested on some forums. I get a response with status=0 in all
> cases,
> > > but none of the above seem to work.
> > > When I run
> > > curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml";
> > > I still get a value for "numFound".
> > >
> > > I am currently using solr 4.0 beta version.
> > >
> > > Thanks for your help in advance.
> > > Regards,
> > > Rohit
> >
>


Duplicates in the suggester.

2012-09-05 Thread sharath jagannath
Not sure whether this is a duplicate question; I did try to browse the
archive and did not find anything specific to what I was looking for.
I see duplicates in the dictionary if I update documents concurrently.

I am using Solr 3.6.1 with the following configurations for suggester:

Solr Config:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_auto_suggest</str>
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">name_auto</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
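
A request against that handler would then look something like this (the exact
parameters depend on the defaults above):

http://localhost:8983/solr/suggest?q=foo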

Schema:

(the name_auto field and its field type definitions were stripped of their tags
in the archive)

Example text I would be indexing for suggester:
foo_bar %|4%|1%|food

%| - used as a combiner,
Part 1: foo_bar, Name of the entity
Part 2: number of activities(application specific) on the entity.
Part 3: id of the document.
Part 4: food, category of the entity.

As I mentioned earlier, I saw duplicates in the spellcheck index documents
when I updated them concurrently.


foo_bar %|4%|1%|food
foo_bar %|1%|1%|food
foo_bar %|2%|1%|food
foo_bar %|3%|1%|food


I do not see duplicates when I update the documents sequentially. I strongly
suspect this is happening because of the way I am combining multiple
fields using %|.
Would appreciate if somebody could suggest any suitable changes that would
help me with this issue.


-- 
Thanks,
Sharath


Re: Solr 4.0 BETA Replication problems on Tomcat

2012-09-05 Thread Ravi Solr
The replication finally worked after I removed the compression setting
from the solrconfig.xml on the slave. Thanks for providing the
workaround.

Ravi Kiran

On Wed, Sep 5, 2012 at 10:23 AM, Ravi Solr  wrote:
> Wow, That was quick. Thank you very much Mr. Siren. I shall remove the
> compression node in the solrconfig.xml and let you know how it went.
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
> On Wed, Sep 5, 2012 at 2:54 AM, Sami Siren  wrote:
>> I opened SOLR-3789. As a workaround you can remove <str name="compression">internal</str> from the config and it should work.
>>
>> --
>>  Sami Siren
>>
>> On Wed, Sep 5, 2012 at 5:58 AM, Ravi Solr  wrote:
>>> Hello,
>>> I have a very simple setup one master and one slave configured
>>> as below, but replication keeps failing with stacktrace as shown
>>> below. Note that 3.6 works fine on the same machines so I am thinking
>>> that Iam missing something in configuration with regards to solr
>>> 4.0...can somebody kindly let me know if Iam missing something ? I am
>>> running SOLR 4.0 on Tomcat-7.0.29 with Java6. FYI I never has any
>>> problem with SOLR on glassfish, this is the first time Iam using it on
>>> Tomcat
>>>
>>> On Master
>>>
>>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>>   <lst name="master">
>>>     <str name="replicateAfter">commit</str>
>>>     <str name="replicateAfter">optimize</str>
>>>     <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
>>>     <str name="commitReserveDuration">00:00:10</str>
>>>   </lst>
>>> </requestHandler>
>>>
>>> On Slave
>>>
>>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>>   <lst name="slave">
>>>     <str name="masterUrl">http://testslave:8080/solr/mycore/replication</str>
>>>     <str name="pollInterval">00:00:50</str>
>>>     <str name="compression">internal</str>
>>>     <str name="httpConnTimeout">5000</str>
>>>     <str name="httpReadTimeout">1</str>
>>>   </lst>
>>> </requestHandler>
>>>
>>>
>>> Error
>>>
>>> 22:44:10WARNING SnapPuller  Error in fetching packets
>>>
>>> java.util.zip.ZipException: unknown compression method
>>> at 
>>> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
>>> at 
>>> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
>>> at 
>>> org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
>>> at 
>>> org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:124)
>>> at 
>>> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:149)
>>> at 
>>> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
>>> at 
>>> org.apache.solr.handler.SnapPuller$FileFetcher.fetchPackets(SnapPuller.java:1024)
>>> at 
>>> org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:985)
>>> at 
>>> org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
>>> at 
>>> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
>>> at 
>>> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
>>> at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175)
>>> at 
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>> at 
>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>> at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>> at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>> at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> at java.lang.Thread.run(Thread.java:662)
>>>
>>> 22:44:10SEVERE  ReplicationHandler  SnapPull failed
>>> :org.apache.solr.common.SolrException: Unable to download
>>> _3_Lucene40_0.tip completely. Downloaded 0!=170 at
>>> org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1115)
>>> at 
>>> org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:999)
>>> at 
>>> org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
>>> at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
>>> at 
>>> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
>>> at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175) at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>> at 
>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>> at 
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>> at 
>>> java.u

Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

2012-09-05 Thread Chris Hostetter

: Subject: Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

Günter, This is definitely strange

The good news is, i can reproduce your problem. 
The bad news is, i can reproduce your problem - and i have no idea what's 
causing it.

I've opened SOLR-3793 to try to get to the bottom of this, and included 
some basic steps to demonstrate the bug using the Solr 4.0-BETA example 
data, but i'm really not sure what the problem might be...

https://issues.apache.org/jira/browse/SOLR-3793


-Hoss

Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Chris Hostetter

: Actually, I didn't technically "upgrade". I downloaded the new
: version, grabbed the example, and pasted in the fields from my schema
: into the new one. So the only two files I changed from the example are
: schema.xml and solr.xml.

ok -- so with the fix for SOLR-3432, anyone who tries similar steps with 
4.0-final will get a clear error on startup -- that was my main concern.  
thanks for clarifying.


-Hoss


Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Jack Krupansky
And when you pasted your 3.5 fields into the 4.0 schema, did you delete the 
existing fields (including _version_) at the same time?


-- Jack Krupansky

-Original Message- 
From: Paul

Sent: Wednesday, September 05, 2012 4:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Still see document after delete with commit in solr 4.0

Actually, I didn't technically "upgrade". I downloaded the new
version, grabbed the example, and pasted in the fields from my schema
into the new one. So the only two files I changed from the example are
schema.xml and solr.xml.

Then I reindexed everything from scratch so there was no old index
involved, either.

On Wed, Sep 5, 2012 at 2:42 PM, Chris Hostetter
 wrote:


: That was exactly it. I added the following line to schema.xml and it now works.
:
: <field name="_version_" type="long" indexed="true" stored="true"/>

Just to be clear: how exactly did you "upgraded to solr 4.0 from solr 3.5"
-- did you throw out your old solrconfig.xml and use the example
solrconfig.xml from 4.0, but keep your 3.5 schema.xml?  Do you in fact
> have an <updateLog/> in your solrconfig.xml?

(if so: then this is all known as part of SOLR-3432, and won't affect any
users of 4.0-final -- but i want to be absolutely sure there isn't some
other edge case of this bug)


-Hoss 




Re: EdgeNgramTokenFilter and positions

2012-09-05 Thread Jack Krupansky
I don't see a Jira for it, but I do see the bad behavior in both Solr 3.6 
and 4.0-BETA in Solr admin analysis.


Interestingly, the screen shot for LUCENE-3642 does in fact show the 
(improperly) incremented positions for successive ngrams.


See:
https://issues.apache.org/jira/browse/LUCENE-3642

I'm surprised that nobody noticed the bogus positions back then.

Technically, this is a Lucene issue.

-- Jack Krupansky

-Original Message- 
From: Walter Underwood

Sent: Wednesday, September 05, 2012 1:51 PM
To: solr-user@lucene.apache.org
Subject: EdgeNgramTokenFilter and positions

In the analysis page, the n-grams produced by EdgeNgramTokenFilter are at 
sequential positions. This seems wrong, because an n-gram is associated with 
a source token at a specific position. It also really messes up phrase 
matches.


With the source text "fleen", these positions and tokens are generated:

1,fl
2,fle
3,flee
4,fleen

Is this a known bug? Fixed? I'm running 3.3.

wunder
--
Walter Underwood
Search Guy
wun...@chegg.com





Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Paul
Actually, I didn't technically "upgrade". I downloaded the new
version, grabbed the example, and pasted in the fields from my schema
into the new one. So the only two files I changed from the example are
schema.xml and solr.xml.

Then I reindexed everything from scratch so there was no old index
involved, either.

On Wed, Sep 5, 2012 at 2:42 PM, Chris Hostetter
 wrote:
>
> : That was exactly it. I added the following line to schema.xml and it now 
> works.
> :
> : 
>
> Just to be clear: how exactly did you "upgraded to solr 4.0 from solr 3.5"
> -- did you throw out your old solrconfig.xml and use the example
> solrconfig.xml from 4.0, but keep your 3.5 schema.xml?  Do you in fact
> have an <updateLog/> in your solrconfig.xml?
>
> (if so: then this is all known as part of SOLR-3432, and won't affect any
> users of 4.0-final -- but i want to be absolutely sure there isn't some
> other edge case of this bug)
>
>
> -Hoss


Re: Setting up two cores in solr.xml for Solr 4.0

2012-09-05 Thread Chris Hostetter

: I don't think I changed by solrconfig.xml file from the default that
: was provided in the example folder for solr 4.0.

ok ... well the Solr 4.0-BETA example solrconfig.xml has this in it...

  <dataDir>${solr.data.dir:}</dataDir>

So if you want to override the dataDir using a "property" like your second 
example, it should be something like...

   <core name="..." instanceDir="...">
     <property name="solr.data.dir" value="..."/>
   </core>

...the property name used in the solrconfig.xml has to match the property 
name you use when declaring the core, or it won't get used and you'll get 
the default behavior.  "solr.data.dir" isn't special here -- you could use
any number of properties in your solrconfig.xml, and declare them when
defining your individual cores.
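
A fuller sketch of that idea (core names and paths here are made up):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="core1">
      <property name="solr.data.dir" value="/var/solr/data/core1"/>
    </core>
    <core name="core2" instanceDir="core2">
      <property name="solr.data.dir" value="/var/solr/data/core2"/>
    </core>
  </cores>
</solr>

Each core's solrconfig.xml then picks up its own value through
<dataDir>${solr.data.dir:}</dataDir>.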

that's very different from your other example...

   <core name="..." instanceDir="..." dataDir="..."/>

...which doesn't use properties at all, and says "this is what the dataDir
should be", regardless of what the <dataDir>...</dataDir> looks like in the
solrconfig.xml


(at least: i'm pretty sure that's how it works)



-Hoss


Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Chris Hostetter

: That was exactly it. I added the following line to schema.xml and it now works.
:
: <field name="_version_" type="long" indexed="true" stored="true"/>

Just to be clear: how exactly did you "upgraded to solr 4.0 from solr 3.5" 
-- did you throw out your old solrconfig.xml and use the example 
solrconfig.xml from 4.0, but keep your 3.5 schema.xml?  Do you in fact 
have an <updateLog/> in your solrconfig.xml?

(if so: then this is all known as part of SOLR-3432, and won't affect any 
users of 4.0-final -- but i want to be absolutely sure there isn't some 
other edge case of this bug)


-Hoss


Re: Solr index on Amazon S3

2012-09-05 Thread Erik Hatcher
Nicolas -

Can you elaborate on your use and configuration of Solr on NFS? What lock
factory are you using? (You had to change from the default, right?)

And how are you coordinating updates/commits to the other servers?   Where does 
indexing occur and then how are commits sent to the NFS mounted servers?

Thanks for sharing anything you can about this.

Erik

On Sep 5, 2012, at 13:26 , Nicolas de Saint-Aubert wrote:

> Hi,
> 
> We currently share a single solr read index on an nfs accessed by
> various solr instances from various devices which gives us a high
> performant cluster framework. We would like to migrate to Amazon or
> other cloud. Is there any way (compatibility) to have solr index on
> Amazon S3 file cloud system, so that we could access a single index
> form various solr as we currently do ?
> 
> Thanks for helping !



Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Paul
That was exactly it. I added the following line to schema.xml and it now works.

<field name="_version_" type="long" indexed="true" stored="true"/>

On Wed, Sep 5, 2012 at 10:13 AM, Jack Krupansky  wrote:
> Check to make sure that you are not stumbling into SOLR-3432: "deleteByQuery
> silently ignored if updateLog is enabled, but {{_version_}} field does not
> exist in schema".
>
> See:
> https://issues.apache.org/jira/browse/SOLR-3432
>
> -- Jack Krupansky
>
> -Original Message- From: Paul
> Sent: Wednesday, September 05, 2012 10:05 AM
> To: solr-user
> Subject: Still see document after delete with commit in solr 4.0
>
>
> I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete
> statement used to work, but now it doesn't seem to be deleting. I've
> been experimenting around, and it seems like this should be the URL
> for deleting the document with the uri of "network_24".
>
> In a browser, I first go here:
>
> http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true
>
> I get this response:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">5</int>
>   </lst>
> </response>
>
> And this is in the log file:
>
> (timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start
> commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
> (timestamp) org.apache.solr.search.SolrIndexSearcher 
> INFO: Opening Searcher@646dd60e main
> (timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> (timestamp) org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener sending requests to Searcher@646dd60e
> main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
> (timestamp) org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener done.
> S(timestamp) org.apache.solr.core.SolrCore registerSearcher
> INFO: [MYCORE] Registered new searcher Searcher@646dd60e
> main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
> (timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: [MYCORE] webapp=/solr path=/update
> params={commit=true&stream.body=<delete><query>uri:network_24</query></delete>}
> {deleteByQuery=uri:network_24,commit=} 0 5
>
> But if I then go to this URL:
>
> http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml
>
> I get this response:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1</int>
>     <lst name="params">
>       <str name="wt">xml</str>
>       <str name="q">uri:network_24</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="1" start="0">
>     <doc>
>       <str name="...">network24</str>
>       <str name="uri">network_24</str>
>     </doc>
>   </result>
> </response>
>
> Why didn't that document disappear?


Re: Solr index on Amazon S3

2012-09-05 Thread Michael Della Bitta
Amazon doesn't have a prebuilt network filesystem that's mountable on
multiple hosts out of the box. The closest thing would be setting up
NFS among your hosts yourself, but at that point it'd probably be
easier to set up Solr replication.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Sep 5, 2012 at 1:26 PM, Nicolas de Saint-Aubert
 wrote:
> Hi,
>
> We currently share a single solr read index on an nfs accessed by
> various solr instances from various devices which gives us a high
> performant cluster framework. We would like to migrate to Amazon or
> other cloud. Is there any way (compatibility) to have solr index on
> Amazon S3 file cloud system, so that we could access a single index
> form various solr as we currently do ?
>
> Thanks for helping !


Re: Delete all documents in the index

2012-09-05 Thread Rohit Harchandani
Thanks everyone. Adding the _version_ field in the schema worked.
Deleting the data directory works for me, but I was not sure why deleting
using curl was not working.
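
For reference, the definition from the 4.0 example schema is:

<field name="_version_" type="long" indexed="true" stored="true"/>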

On Wed, Sep 5, 2012 at 1:49 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Rohit:
>
> If it's easy, the easiest thing to do is to turn off your servlet
> container, rm -r * inside of the data directory, and then restart the
> container.
>
> Michael Della Bitta
>
> 
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn’t a Game
>
>
> On Wed, Sep 5, 2012 at 12:56 PM, Jack Krupansky 
> wrote:
> > Check to make sure that you are not stumbling into SOLR-3432:
> "deleteByQuery
> > silently ignored if updateLog is enabled, but {{_version_}} field does
> not
> > exist in schema".
> >
> > See:
> > https://issues.apache.org/jira/browse/SOLR-3432
> >
> > This could happen if you kept the new 4.0 solrconfig.xml, but copied in
> your
> > pre-4.0 schema.xml.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Rohit Harchandani
> > Sent: Wednesday, September 05, 2012 12:48 PM
> > To: solr-user@lucene.apache.org
> > Subject: Delete all documents in the index
> >
> >
> > Hi,
> > I am having difficulty deleting documents from the index using curl. The
> > urls i tried were:
> > curl "http://localhost:9020/solr/core1/update/?stream.body=
> > *:*&commit=true"
> > curl "http://localhost:9020/solr/core1/update/?commit=true"; -H
> > "Content-Type: text/xml" --data-binary 'id:[* TO
> > *]'
> > curl "http://localhost:9020/solr/core1/update/?commit=true"; -H
> > "Content-Type: text/xml" --data-binary
> '*:*'
> > I also tried:
> > curl "
> >
> http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true
> > "
> > as suggested on some forums. I get a response with status=0 in all cases,
> > but none of the above seem to work.
> > When I run
> > curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml";
> > I still get a value for "numFound".
> >
> > I am currently using solr 4.0 beta version.
> >
> > Thanks for your help in advance.
> > Regards,
> > Rohit
>


EdgeNgramTokenFilter and positions

2012-09-05 Thread Walter Underwood
In the analysis page, the n-grams produced by EdgeNgramTokenFilter are at 
sequential positions. This seems wrong, because an n-gram is associated with a 
source token at a specific position. It also really messes up phrase matches.

With the source text "fleen", these positions and tokens are generated:

1,fl
2,fle
3,flee
4,fleen
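
If positions were preserved (a position increment of zero for each gram after
the first), all four would instead share the source token's position:

1,fl
1,fle
1,flee
1,fleen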

Is this a known bug? Fixed? I'm running 3.3.

wunder
--
Walter Underwood
Search Guy
wun...@chegg.com





Solr index on Amazon S3

2012-09-05 Thread Nicolas de Saint-Aubert
Hi,

We currently share a single solr read index on an nfs accessed by
various solr instances from various devices which gives us a high
performant cluster framework. We would like to migrate to Amazon or
other cloud. Is there any way (compatibility) to have solr index on
Amazon S3 file cloud system, so that we could access a single index
form various solr as we currently do ?

Thanks for helping !


Re: Delete all documents in the index

2012-09-05 Thread Michael Della Bitta
Rohit:

If it's easy, the easiest thing to do is to turn off your servlet
container, rm -r * inside of the data directory, and then restart the
container.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Sep 5, 2012 at 12:56 PM, Jack Krupansky  wrote:
> Check to make sure that you are not stumbling into SOLR-3432: "deleteByQuery
> silently ignored if updateLog is enabled, but {{_version_}} field does not
> exist in schema".
>
> See:
> https://issues.apache.org/jira/browse/SOLR-3432
>
> This could happen if you kept the new 4.0 solrconfig.xml, but copied in your
> pre-4.0 schema.xml.
>
> -- Jack Krupansky
>
> -Original Message- From: Rohit Harchandani
> Sent: Wednesday, September 05, 2012 12:48 PM
> To: solr-user@lucene.apache.org
> Subject: Delete all documents in the index
>
>
> Hi,
> I am having difficulty deleting documents from the index using curl. The
> urls i tried were:
> curl "http://localhost:9020/solr/core1/update/?stream.body=
> *:*&commit=true"
> curl "http://localhost:9020/solr/core1/update/?commit=true"; -H
> "Content-Type: text/xml" --data-binary 'id:[* TO
> *]'
> curl "http://localhost:9020/solr/core1/update/?commit=true"; -H
> "Content-Type: text/xml" --data-binary '*:*'
> I also tried:
> curl "
> http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true
> "
> as suggested on some forums. I get a response with status=0 in all cases,
> but none of the above seem to work.
> When I run
> curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml";
> I still get a value for "numFound".
>
> I am currently using solr 4.0 beta version.
>
> Thanks for your help in advance.
> Regards,
> Rohit


Re: Delete all documents in the index

2012-09-05 Thread Jack Krupansky
Check to make sure that you are not stumbling into SOLR-3432: "deleteByQuery 
silently ignored if updateLog is enabled, but {{_version_}} field does not 
exist in schema".


See:
https://issues.apache.org/jira/browse/SOLR-3432

This could happen if you kept the new 4.0 solrconfig.xml, but copied in your 
pre-4.0 schema.xml.


-- Jack Krupansky

-Original Message- 
From: Rohit Harchandani

Sent: Wednesday, September 05, 2012 12:48 PM
To: solr-user@lucene.apache.org
Subject: Delete all documents in the index

Hi,
I am having difficulty deleting documents from the index using curl. The
urls i tried were:
curl "http://localhost:9020/solr/core1/update/?stream.body=
*:*&commit=true"
curl "http://localhost:9020/solr/core1/update/?commit=true"; -H
"Content-Type: text/xml" --data-binary 'id:[* TO
*]'
curl "http://localhost:9020/solr/core1/update/?commit=true"; -H
"Content-Type: text/xml" --data-binary '*:*'
I also tried:
curl "
http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true
"
as suggested on some forums. I get a response with status=0 in all cases,
but none of the above seem to work.
When I run
curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml";
I still get a value for "numFound".

I am currently using solr 4.0 beta version.

Thanks for your help in advance.
Regards,
Rohit 



Re: Website (crawler for) indexing

2012-09-05 Thread Rafał Kuć
Hello!

You can implement your own crawler using Droids
(http://incubator.apache.org/droids/) or use Apache Nutch
(http://nutch.apache.org/), which is very easy to integrate with Solr
and is a very powerful crawler.
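
With Nutch 1.x the whole fetch/parse/index cycle can be driven from a single
command, along these lines (the seed directory, depth and topN values are just
illustrative):

bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 1000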

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> This may be a bit off topic: How do you index an existing website
> and control the data going into the index?

> We already have Java code to process the HTML (or XHTML) and turn
> it into a SolrJ Document (removing tags and other things we do not
> want in the index). We use SolrJ for indexing.
> So I guess the question is essentially which Java crawler could be useful.

> We used to use wget on the command line in our publishing process, but we no
> longer want to do that.

> Thanks,
> Alexander



RE: Website (crawler for) indexing

2012-09-05 Thread Markus Jelsma
Please take a look at the Apache Nutch project.  
http://nutch.apache.org/
 
-Original message-
> From:Lochschmied, Alexander 
> Sent: Wed 05-Sep-2012 17:09
> To: solr-user@lucene.apache.org
> Subject: Website (crawler for) indexing
> 
> This may be a bit off topic: How do you index an existing website and control
> the data going into the index?
> 
> We already have Java code to process the HTML (or XHTML) and turn it into a 
> SolrJ Document (removing tags and other things we do not want in the index). 
> We use SolrJ for indexing.
> So I guess the question is essentially which Java crawler could be useful.
> 
> We used to use wget on the command line in our publishing process, but we no
> longer want to do that.
> 
> Thanks,
> Alexander
> 
> 


Website (crawler for) indexing

2012-09-05 Thread Lochschmied, Alexander
This may be a bit off topic: How do you index an existing website and control
the data going into the index?

We already have Java code to process the HTML (or XHTML) and turn it into a 
SolrJ Document (removing tags and other things we do not want in the index). We 
use SolrJ for indexing.
So I guess the question is essentially which Java crawler could be useful.

We used to use wget on the command line in our publishing process, but we no
longer want to do that.

Thanks,
Alexander



Re: Solr 4.0 BETA Replication problems on Tomcat

2012-09-05 Thread Ravi Solr
Wow, That was quick. Thank you very much Mr. Siren. I shall remove the
compression node in the solrconfig.xml and let you know how it went.

Thanks,

Ravi Kiran Bhaskar

On Wed, Sep 5, 2012 at 2:54 AM, Sami Siren  wrote:
> I opened SOLR-3789. As a workaround you can remove <str name="compression">internal</str> from the config and it should work.
>
> --
>  Sami Siren
>
> On Wed, Sep 5, 2012 at 5:58 AM, Ravi Solr  wrote:
>> Hello,
>> I have a very simple setup one master and one slave configured
>> as below, but replication keeps failing with stacktrace as shown
>> below. Note that 3.6 works fine on the same machines so I am thinking
>> that Iam missing something in configuration with regards to solr
>> 4.0...can somebody kindly let me know if Iam missing something ? I am
>> running SOLR 4.0 on Tomcat-7.0.29 with Java6. FYI I never has any
>> problem with SOLR on glassfish, this is the first time Iam using it on
>> Tomcat
>>
>> On Master
>>
>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>   <lst name="master">
>>     <str name="replicateAfter">commit</str>
>>     <str name="replicateAfter">optimize</str>
>>     <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
>>     <str name="commitReserveDuration">00:00:10</str>
>>   </lst>
>> </requestHandler>
>>
>> On Slave
>>
>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>   <lst name="slave">
>>     <str name="masterUrl">http://testslave:8080/solr/mycore/replication</str>
>>     <str name="pollInterval">00:00:50</str>
>>     <str name="compression">internal</str>
>>     <str name="httpConnTimeout">5000</str>
>>     <str name="httpReadTimeout">1</str>
>>   </lst>
>> </requestHandler>
>>
>>
>> Error
>>
>> 22:44:10WARNING SnapPuller  Error in fetching packets
>>
>> java.util.zip.ZipException: unknown compression method
>> at 
>> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:147)
>> at 
>> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
>> at 
>> org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
>> at 
>> org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:124)
>> at 
>> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:149)
>> at 
>> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:144)
>> at 
>> org.apache.solr.handler.SnapPuller$FileFetcher.fetchPackets(SnapPuller.java:1024)
>> at 
>> org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:985)
>> at 
>> org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
>> at 
>> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
>> at 
>> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
>> at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175)
>> at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>> at 
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>>
>> 22:44:10SEVERE  ReplicationHandler  SnapPull failed
>> :org.apache.solr.common.SolrException: Unable to download
>> _3_Lucene40_0.tip completely. Downloaded 0!=170 at
>> org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1115)
>> at 
>> org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:999)
>> at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:627)
>> at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:331)
>> at 
>> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:297)
>> at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:175) at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>> at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)


RE: exception in highlighter when using phrase search

2012-09-05 Thread Yoni Amir
I think I found the cause of this. It is partially my fault, because I sent
Solr a field with an empty value, but it is also a configuration problem.

https://issues.apache.org/jira/browse/SOLR-3792


-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com] 
Sent: Tuesday, September 04, 2012 3:53 PM
To: solr-user@lucene.apache.org
Subject: exception in highlighter when using phrase search

I got this problem with solr 4 beta and the highlighting component.

When I search for a phrase, such as "foo bar", everything works ok.
When I add highlighting, I get this exception below.
You can see according to the first log line that I am searching only one field  
(all_text), but what is not visible in the log is that I am highlighting on all 
fields in the document, with hl.requireFieldMatch=false and hl.fl=*.

INFO  (SolrCore.java:1670) - [rcmCore] webapp=/solr path=/select 
params={fq={!edismax}module:"Alerts"+and+bu:"abcd+Region1"&qf=attachment&qf=all_text&version=2&rows=20&wt=javabin&start=0&q="foo
 bar"} hits=103 status=500 QTime=38 ERROR (SolrException.java:104) - 
null:java.lang.NullPointerException
   at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:191)
   at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:152)
   at org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:209)
   at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:50)
   at org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter.incrementToken(RemoveDuplicatesTokenFilter.java:54)
   at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
   at org.apache.solr.highlight.TokenOrderingFilter.incrementToken(DefaultSolrHighlighter.java:629)
   at org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:78)
   at org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:50)
   at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:225)
   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:510)
   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
   at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
   at java.lang.Thread.run(Thread.java:736)

Any idea?

Thanks,
Yoni


Re: Still see document after delete with commit in solr 4.0

2012-09-05 Thread Jack Krupansky
Check to make sure that you are not stumbling into SOLR-3432: "deleteByQuery 
silently ignored if updateLog is enabled, but {{_version_}} field does not 
exist in schema".


See:
https://issues.apache.org/jira/browse/SOLR-3432
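
If that is what you hit, the fix is to define the _version_ field in your
schema.xml; roughly as in the 4.0 example schema:

<field name="_version_" type="long" indexed="true" stored="true"/>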

-- Jack Krupansky

-Original Message- 
From: Paul

Sent: Wednesday, September 05, 2012 10:05 AM
To: solr-user
Subject: Still see document after delete with commit in solr 4.0

I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete
statement used to work, but now it doesn't seem to be deleting. I've
been experimenting around, and it seems like this should be the URL
for deleting the document with the uri of "network_24".

In a browser, I first go here:

http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true
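
(URL-decoded, that stream.body is <delete><query>uri:network_24</query></delete>.)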

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
  </lst>
</response>

And this is in the log file:

(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}

(timestamp) org.apache.solr.search.SolrIndexSearcher 
INFO: Opening Searcher@646dd60e main
(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@646dd60e
main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
(timestamp) org.apache.solr.core.SolrCore registerSearcher
INFO: [MYCORE] Registered new searcher Searcher@646dd60e
main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [MYCORE] webapp=/solr path=/update
params={commit=true&stream.body=uri:network_24}
{deleteByQuery=uri:network_24,commit=} 0 5

But if I then go to this URL:

http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="wt">xml</str>
      <str name="q">uri:network_24</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str>network24</str>
      <str name="uri">network_24</str>
    </doc>
  </result>
</response>

Why didn't that document disappear? 



Still see document after delete with commit in solr 4.0

2012-09-05 Thread Paul
I've recently upgraded to solr 4.0 from solr 3.5 and I think my delete
statement used to work, but now it doesn't seem to be deleting. I've
been experimenting around, and it seems like this should be the URL
for deleting the document with the uri of "network_24".

In a browser, I first go here:

http://localhost:8983/solr/MYCORE/update?stream.body=%3Cdelete%3E%3Cquery%3Euri%3Anetwork_24%3C%2Fquery%3E%3C%2Fdelete%3E&commit=true

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
  </lst>
</response>

And this is in the log file:

(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
(timestamp) org.apache.solr.search.SolrIndexSearcher 
INFO: Opening Searcher@646dd60e main
(timestamp) org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@646dd60e
main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
(timestamp) org.apache.solr.core.SolrCore registerSearcher
INFO: [MYCORE] Registered new searcher Searcher@646dd60e
main{StandardDirectoryReader(segments_2v:447 _4p(4.0.0.1):C3244)}
(timestamp) org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [MYCORE] webapp=/solr path=/update
params={commit=true&stream.body=uri:network_24}
{deleteByQuery=uri:network_24,commit=} 0 5

But if I then go to this URL:

http://localhost:8983/solr/MYCORE/select?q=uri%3Anetwork_24&wt=xml

I get this response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="wt">xml</str>
      <str name="q">uri:network_24</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str>network24</str>
      <str name="uri">network_24</str>
    </doc>
  </result>
</response>

Why didn't that document disappear?


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-09-05 Thread Ahmet Arslan
> i want to search with title and empname both. 

I know, I gave that URL just to get the idea across.
If you try
suggest/?q="michael b"&df=title&defType=lucene&fl=title
you will see that the titles you are interested in will be in the results
section, not in the spellcheck section.

> or title or song...). Here (*suggest/?q="michael
> b"&df=title&defType=lucene*) we are specifying the
> title type search. 

q=title:"michael b" OR empname:"michael b"&fl=title,empname would do the trick.
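
For example, with URL-encoding for the spaces and quotes (host and port are
placeholders):

http://localhost:8983/solr/suggest/?q=title%3A%22michael+b%22+OR+empname%3A%22michael+b%22&fl=title,empname&defType=lucene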


> I removed said configurations in solrconfig.xml file, got
> result like below.

If you removed it, then there shouldn't be a spellcheck response. And you are
still looking for the results in the wrong place.


Re: Setting up two cores in solr.xml for Solr 4.0

2012-09-05 Thread Paul
I don't think I changed my solrconfig.xml file from the default that
was provided in the example folder for solr 4.0.
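
If I remember right, the stock example solrconfig.xml sets
<dataDir>${solr.data.dir:}</dataDir>, so the dataDir property from solr.xml
would only take effect if I changed that line to reference the variable,
e.g. (a sketch):

<dataDir>${dataDir}</dataDir>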

On Tue, Sep 4, 2012 at 3:40 PM, Chris Hostetter
 wrote:
>
> : <core name="MYCORE_test" instanceDir="MYCORE" dataDir="MYCORE_test"/>
>
> I'm pretty sure what you have above tells solr that core MYCORE_test
> should use the instanceDir MYCORE but ignore the <dataDir> in that
> solrconfig.xml and use the one you specified.
>
> This on the other hand...
>
> : > <core name="MYCORE_test" instanceDir="MYCORE">
> : >   <property name="dataDir" value="MYCORE_test"/>
> : > </core>
>
> ...tells solr that the MYCORE_test SolrCore should use the instanceDir
> MYCORE, and when parsing that solrconfig.xml file it should set the
> variable ${dataDir} to be "MYCORE_test" -- but if your solrconfig.xml file
> does not ever refer to the ${dataDir} variable, it won't have any effect.
>
> so the question becomes -- what does your solrconfig.xml look like?
>
>
> -Hoss


Solr Cloud partitioning

2012-09-05 Thread dan sutton
Hi,

At the moment, partitioning with solrcloud is hash based on uniqueid.
What I'd like to do is have custom partitioning, e.g. based on date
(shard_MMYY).

I'm aware of https://issues.apache.org/jira/browse/SOLR-2592, but
after a cursory look it seems that with the latest patch, one might
end up with multiple partitions in the same shard, perhaps all (e.g.
if 2 or more partition hash values end up in the same range), which
I'd not want.

Has anyone else implemented custom shard partitioning for solrcloud ?

I think the answer is to make the partition class itself pluggable
(defaulting to the hash of the uniqueKey, as now), but I'm not sure how to
pass the pluggable partition class from solrConfig through to ClusterState
(which is in solrj, not core). Any advice?
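
To make it concrete, this is the sort of plugin I have in mind (a
hypothetical sketch; neither the interface nor the class exists in Solr
today, and the "date" field name is illustrative):

import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.solr.common.SolrInputDocument;

// Hypothetical plugin point: Solr would call this instead of hashing
// the uniqueKey when routing a document to a shard.
interface ShardPartitioner {
  String shardFor(SolrInputDocument doc);
}

// Date-based partitioning: every document lands in shard_MMYY.
class DatePartitioner implements ShardPartitioner {
  public String shardFor(SolrInputDocument doc) {
    Date d = (Date) doc.getFieldValue("date");
    // SimpleDateFormat is not thread-safe, so create one per call here.
    SimpleDateFormat fmt = new SimpleDateFormat("MMyy");
    return "shard_" + fmt.format(d); // e.g. shard_0912 for September 2012
  }
}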

Cheers,
Dan


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-09-05 Thread aniljayanti
HI,

Thanks,

I want to search with title and empname both. For example, when we use any
search engine like Google or Yahoo, we do not specify a type (name,
title, song, ...). Here (suggest/?q="michael
b"&df=title&defType=lucene) we are specifying a title-only search.

I removed the said configuration from the solrconfig.xml file and got the
result below.

<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="michael">
        <int name="numFound">10</int>
        <int name="startOffset">1</int>
        <int name="endOffset">8</int>
        <arr name="suggestion">
          <str>michael</str>
          <str>michael</str>
          <str>michael "</str>
          <str>michael j</str>
          <str>michael ja</str>
          <str>michael jac</str>
          <str>michael jack</str>
          <str>michael jacks</str>
          <str>michael jackso</str>
          <str>michael jackson</str>
        </arr>
      </lst>
      <lst name="b">
        <int name="numFound">10</int>
        <int name="startOffset">9</int>
        <int name="endOffset">10</int>
        <arr name="suggestion">
          <str>b</str>
          <str>b</str>
          <str>ba</str>
          <str>bab</str>
          <str>bar</str>
          <str>barb</str>
          <str>be</str>
          <str>ben</str>
          <str>bi</str>
          <str>bl</str>
        </arr>
      </lst>
      <str name="collation">"michael b"</str>
    </lst>
  </lst>
</response>
I sent my schema.xml and solrconfig.xml configurations. Please check.

Aniljayanti



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4005545.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication lag after cache optimizations

2012-09-05 Thread Damien Dudognon
Thanks for all the information.

> I'm not sure how exactly you are measuring/defining "replication lag" but 
> if you mean "lag in how long until the newly replicated documents are 
> visible in searches"

That is exactly what I wanted to say.

I've attached the cache statistics.

If you are interested, a few more details on our use case:
Actually, we have only a few hits on Solr (about 2 req/s), but we will quickly
reach more than 50 req/s. The requests are mainly facet requests. The index
holds about 1.5M documents and we expect around 15M documents within one year.

Best regards,
Damien


CACHE

name:queryResultCache  
class:   org.apache.solr.search.FastLRUCache  
version: 1.0  
description: Concurrent LRU Cache(maxSize=16384, initialSize=4096, 
minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, 
regenerator=org.apache.solr.search.SolrIndexSearcher$3@3d762027)  
stats:  lookups : 4 
hits : 4 
hitratio : 1.00 
inserts : 0 
evictions : 0 
size : 1024 
warmupTime : 20 
cumulative_lookups : 1003454 
cumulative_hits : 894365 
cumulative_hitratio : 0.89 
cumulative_inserts : 120343 
cumulative_evictions : 0 

name:fieldCache  
class:   org.apache.solr.search.SolrFieldCacheMBean  
insanity_count : 0 
name:documentCache  
class:   org.apache.solr.search.FastLRUCache  
version: 1.0  
description: Concurrent LRU Cache(maxSize=16384, initialSize=4096, 
minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, 
regenerator=null)  
stats:  lookups : 80 
hits : 60 
hitratio : 0.75 
inserts : 20 
evictions : 0 
size : 20 
warmupTime : 0 
cumulative_lookups : 10844723 
cumulative_hits : 8318341 
cumulative_hitratio : 0.76 
cumulative_inserts : 2526382 
cumulative_evictions : 0 

name:fieldValueCache  
class:   org.apache.solr.search.FastLRUCache  
version: 1.0  
description: Concurrent LRU Cache(maxSize=16384, initialSize=16384, 
minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, 
regenerator=org.apache.solr.search.SolrIndexSearcher$1@38bdc9b3)  
stats:  lookups : 2 
hits : 2 
hitratio : 1.00 
inserts : 0 
evictions : 0 
size : 1 
warmupTime : 1369 
cumulative_lookups : 485281 
cumulative_hits : 485276 
cumulative_hitratio : 0.99 
cumulative_inserts : 2 
cumulative_evictions : 0 
item_tags : 
{field=tags,memSize=5804302,tindexSize=36148,time=1369,phase1=1357,nTerms=118241,bigTerms=0,termInstances=448772,uses=2}
 

name:filterCache  
class:   org.apache.solr.search.FastLRUCache  
version: 1.0  
description: Concurrent LRU Cache(maxSize=16384, initialSize=4096, 
minSize=14745, acceptableSize=15564, cleanupThread=false, autowarmCount=1024, 
regenerator=org.apache.solr.search.SolrIndexSearcher$2@340523df)  
stats:  lookups : 21 
hits : 21 
hitratio : 1.00 
inserts : 0 
evictions : 0 
size : 1024 
warmupTime : 1305 
cumulative_lookups : 5956615 
cumulative_hits : 5868136 
cumulative_hitratio : 0.98 
cumulative_inserts : 88479 
cumulative_evictions : 0
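
For reference, reconstructed from the stats above, the cache declarations in
our solrconfig.xml look like this:

<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
<queryResultCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
<documentCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>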

RE: Solr Cloud Implementation with Apache Tomcat

2012-09-05 Thread bsargurunathan
Hi Markus,

Can you please tell me the exact file name in the Tomcat folder?
I mean, where do I have to set the properties?
I am using a Windows machine and I have Tomcat 6.


Thanks,
Guru



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Implementation-with-Apache-Tomcat-tp4005209p4005535.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-09-05 Thread Ahmet Arslan
Hi,

You are trying to use two different approaches at the same time.

1) Remove

<arr name="components">
  <str>suggest</str>
  <str>query</str>
</arr>

from your requestHandler.

2) Execute this query URL : suggest/?q="michael b"&df=title&defType=lucene

And you will see my point.

--- On Wed, 9/5/12, aniljayanti  wrote:

> From: aniljayanti 
> Subject: Re: AW: AW: auto completion search with solr using NGrams in SOLR
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 5, 2012, 7:29 AM
> Hi,
> 
> thanks,
> 
> I m sending my whole configurations in schema and
> solrconfig.xml files.
> 
> 
> schema.xml
> ---
> <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100"
> omitNorms="true">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="\s+"
> replacement=" " replace="all"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> maxGramSize="15" side="front" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="\s+"
> replacement=" " replace="all"/>
>   </analyzer>
> </fieldType>
> 
> <field name="title" type="edgytext" indexed="true" stored="true" />
> <field name="empname" type="edgytext" indexed="true" stored="true" />
> 
> <field name="autocomplete_text" type="edgytext" indexed="true"
> stored="false" multiValued="true" omitNorms="true"
> omitTermFreqAndPositions="false" />
> 
> <copyField source="title" dest="autocomplete_text"/>
> <copyField source="empname" dest="autocomplete_text"/>
> solrconfig.xml
> -
> <searchComponent class="solr.SpellCheckComponent" name="suggest">
>   <lst name="spellchecker">
>     <str name="name">suggest</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
>     <str name="storeDir">suggest</str>
>     <str name="field">autocomplete_text</str>
>     <str name="exactMatchFirst">true</str>
>     <float name="threshold">0.005</float>
>     <str name="buildOnCommit">true</str>
>     <str name="buildOnOptimize">true</str>
>   </lst>
>   <lst name="spellchecker">
>     <str name="name">jarowinkler</str>
>     <str name="field">lowerfilt</str>
>     <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
>     <str name="spellcheckIndexDir">spellchecker</str>
>   </lst>
>   <str name="queryAnalyzerFieldType">edgytext</str>
> </searchComponent>
> 
> <requestHandler class="org.apache.solr.handler.component.SearchHandler"
> name="/suggest" startup="lazy">
>   <lst name="defaults">
>     <str name="spellcheck">true</str>
>     <str name="spellcheck.dictionary">suggest</str>
>     <str name="spellcheck.onlyMorePopular">true</str>
>     <str name="spellcheck.count">5</str>
>     <str name="spellcheck.collate">false</str>
>     <str name="spellcheck.maxCollations">5</str>
>     <str name="spellcheck.maxCollationTries">1000</str>
>     <str name="spellcheck.collateExtendedResults">true</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>     <str>query</str>
>   </arr>
> </requestHandler>
> 
> URL : suggest/?q="michael b"
> -
> Response : 
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">3</int>
>   </lst>
>   <lst name="spellcheck">
>     <lst name="suggestions">
>       <lst name="michael">
>         <int name="numFound">10</int>
>         <int name="startOffset">1</int>
>         <int name="endOffset">8</int>
>         <arr name="suggestion">
>           <str>michael bully herbig</str>
>           <str>michael bolton</str>
>           <str>michael bolton: arias</str>
>           <str>michael falch</str>
>           <str>michael holm</str>
>           <str>michael jackson</str>
>           <str>michael neale</str>
>           <str>michael penn</str>
>           <str>michael salgado</str>
>           <str>michael w. smith</str>
>         </arr>
>       </lst>
>       <lst name="b">
>         <int name="numFound">10</int>
>         <int name="startOffset">9</int>
>         <int name="endOffset">10</int>
>         <arr name="suggestion">
>           <str>b in the mix - the remixes</str>
>           <str>b2k</str>
>           <str>backstreet boys</str>
>           <str>backyard babies</str>
>           <str>banda maguey</str>
>           <str>barbra streisand</str>
>           <str>barry manilow</str>
>           <str>benny goodman</str>
>           <str>beny more</str>
>           <str>beyonce</str>
>         </arr>
>       </lst>
>       <str name="collation">"michael bully herbig b in the mix - the remixes"</str>
>     </lst>
>   </lst>
> </response>
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4005490.html
> Sent from the Solr - User mailing list archive at
> Nabble.com.
>


RE: Solr Cloud Implementation with Apache Tomcat

2012-09-05 Thread Markus Jelsma
Set the -DzkHost= property in some Tomcat configuration as per the wiki page 
and point it to the Zookeeper(s). On Debian systems you can use 
/etc/default/tomcat6 to configure your properties.
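
On Windows the equivalent is to put the property in Tomcat's
bin\setenv.bat (create the file if it doesn't exist; catalina.bat picks it
up automatically). The ZooKeeper addresses below are placeholders:

set JAVA_OPTS=%JAVA_OPTS% -DzkHost=zkhost1:2181,zkhost2:2181,zkhost3:2181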

 
 
-Original message-
> From:bsargurunathan 
> Sent: Wed 05-Sep-2012 10:40
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud Implementation with Apache Tomcat
> 
> Hi Rafal,
> 
> I worked with standalone zookeeper, which is starting.
> But the next step is, I want to configure the zookeeper with my solr cloud
> using Apache Tomcat.
> How is it really possible? Can you please tell me the steps I have to
> follow to implement Solr Cloud with Apache Tomcat. Thanks in advance.
> 
> Thanks,
> Guru
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Implementation-with-Apache-Tomcat-tp4005209p4005528.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: Solr Cloud Implementation with Apache Tomcat

2012-09-05 Thread bsargurunathan
Hi Rafal,

I worked with standalone zookeeper, which is starting.
But the next step is, I want to configure the zookeeper with my solr cloud
using Apache Tomcat.
How is it really possible? Can you please tell me the steps I have to
follow to implement Solr Cloud with Apache Tomcat. Thanks in advance.

Thanks,
Guru



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Implementation-with-Apache-Tomcat-tp4005209p4005528.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting on mutivalued fields still impossible?

2012-09-05 Thread Toke Eskildsen
On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote:
> Imagine you have two entries, aardvark and emu in your
> multiValued field. How should that document sort relative to
> another doc with camel and zebra? Any heuristic
> you apply will be wrong for someone else

I see two obvious choices here:

1) Sort by the value that is ordered first by the comparator function.
Doc1: aardvark, (emu)
Doc2: camel, (zebra)
This is what Uwe wants to do and it is normally done by preprocessing
and collapsing to a single value (see the sketch below).
It could be implemented with an ordered multi-valued field cache by
comparing on the first (or last, in the case of reverse sort) entry for
each matching document.

2) Make duplicate entries in the result set, one for each value.
Doc1: aardvark, (emu)
Doc2: camel, (zebra)
Doc1: (aardvark), emu
Doc2: (camel), zebra
I have a hard time coming up with a real world use case for this.
It could be implemented by using a multi-valued field cache as above and
putting the same document ID into the sliding window sorter once for
each field value.

Collapsing this into a single algorithm:
Step through all IDs. For each ID, give access to the list of field
values and provide a callback for adding one or more (value, ID)-pairs
to the sliding window sorter.
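
A minimal sketch of choice 1 over plain per-document value lists
(illustration only; this is not an existing Lucene/Solr API):

import java.util.Comparator;
import java.util.List;

// Orders documents by their comparator-first (lexicographically smallest)
// value, so {aardvark, emu} sorts before {camel, zebra}.
final class FirstValueComparator implements Comparator<List<String>> {
  public int compare(List<String> a, List<String> b) {
    return min(a).compareTo(min(b));
  }
  private static String min(List<String> values) {
    String best = values.get(0);
    for (String v : values) {
      if (v.compareTo(best) < 0) {
        best = v;
      }
    }
    return best;
  }
}

For reverse sort, compare on the largest value instead, matching the
"first (or last) entry" note above.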


Are there some other realistic heuristics that I have missed?