Re: Best practices for multiple languages?

2011-01-18 Thread Paul Libbrecht

But for this, you need a skillfully designed:
- set of fields
- multiplexing analyzer
- query expansion
In one of my projects, we do not split languages by field and it's a pain:
I keep running into issues in one direction or the other.
- the "die" example that Otis mentioned is a good one: a stop word in German,
an essential verb in English
- I recently had issues with the word Fourier (as in Fourier series): the
English stemmer leaves it as fourier, the French stemmer reduces it to fouri.
So: if the resource is contributed in French, the indexed value is fouri and
English seekers won't find it; if the resource is contributed in English,
French seekers won't find it.
So my latest lesson: always keep a whitespace-tokenized, lowercased, unstemmed
field at hand as well, and prefer it over the others in your query expansion.
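
To make the last point concrete, here is a minimal sketch of such a query
expansion, written in plain Java so it runs without Lucene on the classpath.
The field names (text_exact, text_en, text_fr) and the boost value are
assumptions made up for illustration; nothing in this thread names them. The
sketch only builds a string in the classic Lucene query syntax
(field:term^boost), and the per-language stems are passed in by hand, whereas
in a real setup each field's analyzer would produce them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch: expand one user term into a multi-field Lucene query string. */
final class QueryExpansionSketch {

    // Hypothetical name of the whitespace-tokenized, lowercased, unstemmed field.
    static final String EXACT_FIELD = "text_exact";

    // Illustrative boost so that unstemmed matches outrank stemmed ones.
    static final float EXACT_BOOST = 4.0f;

    /**
     * @param userTerm     a single word typed by the user; assumed to contain no
     *                     query-syntax characters that would need escaping
     * @param stemsByField hypothetical per-language field names mapped to the
     *                     stem that field's analyzer would produce for userTerm
     */
    static String expand(String userTerm, Map<String, String> stemsByField) {
        List<String> clauses = new ArrayList<>();
        // The unstemmed field comes first and carries the boost.
        String exact = userTerm.toLowerCase(Locale.ROOT);
        clauses.add(EXACT_FIELD + ":" + exact + "^" + EXACT_BOOST);
        // Then one unboosted clause per language-specific stemmed field.
        for (Map.Entry<String, String> e : stemsByField.entrySet()) {
            clauses.add(e.getKey() + ":" + e.getValue());
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        Map<String, String> stems = new LinkedHashMap<>();
        stems.put("text_en", "fourier"); // English stemmer leaves the word alone
        stems.put("text_fr", "fouri");   // French stemmer strips the ending
        System.out.println(expand("Fourier", stems));
    }
}
```

With the unstemmed clause in place, a document contributed in French still
matches an English seeker typing "fourier", because text_exact holds the
lowercased surface form regardless of the contribution language.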

A wiki page should probably be made.

paul


On 19 Jan 2011, at 07:53, Vinaya Kumar Thimmappa wrote:
> I think we should be using Lucene with the Snowball JARs, which means one index
> for all languages (of course, index size is always a concern).
> 
> Hope this helps.
> -vinaya
> 
> On Tuesday 18 January 2011 11:23 PM, Clemens Wyss wrote:
>> What is the "best practice" to support multiple languages, i.e. 
>> Lucene-Documents that have multiple language content/fields?
>> Should
>> a) each language be indexed in a separate index/directory or should
>> b) the Documents (in a single directory) hold the diverse localized fields?
>> 
>> We most often will be searching "language dependent" which (at least 
>> performance wise) mandates one-directory-per-language...
>> 
>> Any (lucene specific) white papers on this topic?
>> 
>> Thx in advance
>> Clemens
>> 


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Best practices for multiple languages?

2011-01-18 Thread Clemens Wyss
> 1) Docs in different languages -- every document is one language
> 2) Each document has fields in different languages
We mainly have case 1): each document is in a single language.

Clemens

> -----Original Message-----
> From: Shai Erera [mailto:ser...@gmail.com]
> Sent: Tuesday, 18 January 2011 20:28
> To: java-user@lucene.apache.org
> Subject: Re: Best practices for multiple languages?
> 
> Hi
> 
> There are two types of multi-language docs:
> 1) Docs in different languages -- every document is one language
> 2) Each document has fields in different languages
> 
> I've dealt with both, and there are different solutions to each. Which of them
> is yours?
> 
> Shai
> 
> On Tue, Jan 18, 2011 at 7:53 PM, Clemens Wyss 
> wrote:
> 
> > What is the "best practice" to support multiple languages, i.e.
> > Lucene-Documents that have multiple language content/fields?
> > Should
> > a) each language be indexed in a separate index/directory or should
> > b) the Documents (in a single directory) hold the diverse localized fields?
> >
> > We most often will be searching "language dependent" which (at least
> > performance wise) mandates one-directory-per-language...
> >
> > Any (lucene specific) white papers on this topic?
> >
> > Thx in advance
> > Clemens
> >



Re: Best practices for multiple languages?

2011-01-18 Thread Vinaya Kumar Thimmappa
I think we should be using Lucene with the Snowball JARs, which means one
index for all languages (of course, index size is always a concern).


Hope this helps.
-vinaya

On Tuesday 18 January 2011 11:23 PM, Clemens Wyss wrote:

> What is the "best practice" to support multiple languages, i.e.
> Lucene-Documents that have multiple language content/fields?
> Should
> a) each language be indexed in a separate index/directory or should
> b) the Documents (in a single directory) hold the diverse localized fields?
>
> We most often will be searching "language dependent" which (at least
> performance wise) mandates one-directory-per-language...
>
> Any (lucene specific) white papers on this topic?
>
> Thx in advance
> Clemens




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Anshum
Where do you get your Lucene/Solr downloads from?

[X] ASF Mirrors (linked in our release announcements or via the Lucene
website)

[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[X] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a
downstream project)
--
Anshum Gupta
http://ai-cafe.blogspot.com


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Otis Gospodnetic
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> downstream project)




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Fabiano Nunes
[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Mattmann, Chris A (388J)
On Jan 18, 2011, at 2:24 PM, Glen Newton wrote:

> Where do you get your Lucene/Solr downloads from?
> 
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
> 
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> 
> [] I/we build them from source via an SVN/Git checkout.


Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++





Lucene Revolution 2011 is Coming - May 25 & 26 - Save The Date and Call For Papers

2011-01-18 Thread Michael Bohlig
Mark your calendars today! The largest worldwide conference dedicated to Lucene
and Solr will take place in the San Francisco/Bay Area May 25-26.

The 2011 conference will build on the success of last year's Lucene Revolution 
in Boston. Sponsored by Lucid Imagination with additional support from 
community and other commercial co-sponsors, we'll be adding new sessions, new 
speakers, and new training sessions to the agenda. 

Lucid Imagination is the commercial entity exclusively dedicated to Apache 
Lucene/Solr open source search technology. 

Registration will begin shortly - so make sure to save-the-date. 

In the meantime, the Call For Participation (CFP) is now open for Lucene 
Revolution 2011. If you have a great Solr or Lucene talk, this is a fantastic 
opportunity to share it with the community. 

To submit a proposal for a 45-minute presentation, please complete the form at: 
http://www.lucidimagination.com/revolution/2011/cfp 

Topics of interest include: 
- Lucene and Solr in the Enterprise (case studies, implementation, return on 
investment, etc.) 
- Use of LucidWorks Enterprise 
- “How We Did It” development case studies 
- Lucene/Solr technology deep dives: features, how to use, etc. 
- Spatial/Geo/local search 
- Lucene and Solr in the Cloud 
- Scalability and performance tuning 
- Large Scale Search 
- Real Time Search (or NRT search) 
- Data Integration/Data Management 
- Lucene & Solr for Mobile Applications 
- Associated technologies: Mahout, Nutch, NLP, etc. 

All accepted speakers will get complimentary conference passes. Financial 
assistance is available for speakers that qualify. 

Submissions must be received by Wednesday, March 2, 2011, 12 Midnight PST.


Key Dates: 
January 18, 2011: Call For Participation opens; form available for completion
at: http://www.lucidimagination.com/revolution/2011/cfp
March 2, 2011: Call For Participation closes
March 9, 2011: Speaker Acceptance Notification
May 23-24, 2011: Lucene and Solr Training Sessions
May 25-26, 2011: Lucene Revolution Conference Sessions

If you have more than one topic that you would like to propose, please complete 
an additional online form. 

To be considered, proposals must be received by 12 Midnight PST, March 2, 2011.

Interested in registration or other conference news? Want to be added to the 
conference mailing list? Is your organization interested in sponsorship 
opportunities? Please send an email to: i...@lucenerevolution.org 

We look forward to seeing you in the San Francisco/Bay Area! 

Regards, 
Mike 





Michael Bohlig | Lucid Imagination 
Enterprise Marketing 
p +1 650 353 4057 x132 
m +1 650 703 8383
www.lucidimagination.com 




[POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Sudarsan, Sithu D.
 


Sincerely,
Sithu D Sudarsan

Grant Ingersoll  wrote:

> Where do you get your Lucene/Solr downloads from?
> 
> [x] ASF Mirrors (linked in our release announcements or via the Lucene 
> website)
> 
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> 
> [x] I/we build them from source via an SVN/Git checkout.
> 
> [] Other (someone in your company mirrors them internally or via a downstream 
> project)




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread BrightMinds Dev

[] ASF Mirrors (linked in our release announcements or via the Lucene website)

[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a downstream 
project)





Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Bill Janssen
Grant Ingersoll  wrote:

> Where do you get your Lucene/Solr downloads from?
> 
> [x] ASF Mirrors (linked in our release announcements or via the Lucene 
> website)
> 
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> 
> [x] I/we build them from source via an SVN/Git checkout.
> 
> [] Other (someone in your company mirrors them internally or via a downstream 
> project)




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Chris Male
>
>
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> downstream project)
>


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Tommaso Teofili
[X] ASF Mirrors (linked in our release announcements or via the Lucene
website)

[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a
downstream project)

2011/1/18 Grant Ingersoll 

> As devs of Lucene/Solr, due to the way ASF mirrors, etc. work, we really
> don't have a good sense of how people get Lucene and Solr for use in their
> application.  Because of this, there has been some talk of dropping Maven
> support for Lucene artifacts (or at least make them external).  Before we do
> that, I'd like to conduct an informal poll of actual users out there and see
> how you get Lucene or Solr.
>
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> downstream project)
>
> Please put an X in the box that applies to you.  Multiple selections are OK
> (for instance, if one project uses a mirror and another uses Maven)
>
> Please do not turn this thread into a discussion on Maven and its
> (de)merits, I simply want to know, informally, where people get their JARs
> from.  In other words, no discussion is necessary (we already have that
> going on d...@lucene.apache.org which you are welcome to join.)
>
> Thanks,
> Grant
>
>


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Cristian Vat
> [X] ASF Mirrors (linked in our release announcements or via the Lucene 
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a downstream 
> project)
>




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Christopher St John
On Tue, Jan 18, 2011 at 3:04 PM, Grant Ingersoll  wrote:
>
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
>
> [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a downstream 
> project)
>




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Ahmet Arslan
> [] ASF Mirrors (linked in our release announcements or via
> the Lucene website)
> 
> [X] Maven repository (whether you use Maven, Ant+Ivy,
> Buildr, etc.)
> 
> [] I/we build them from source via an SVN/Git checkout.
> 
> [] Other (someone in your company mirrors them internally
> or via a downstream project)


  




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Lukas Kahwe Smith

On 18.01.2011, at 22:04, Grant Ingersoll wrote:

> As devs of Lucene/Solr, due to the way ASF mirrors, etc. work, we really
> don't have a good sense of how people get Lucene and Solr for use in their 
> application.  Because of this, there has been some talk of dropping Maven 
> support for Lucene artifacts (or at least make them external).  Before we do 
> that, I'd like to conduct an informal poll of actual users out there and see 
> how you get Lucene or Solr.
> 
> Where do you get your Lucene/Solr downloads from?
> 
> [X] ASF Mirrors (linked in our release announcements or via the Lucene 
> website)
> 
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> 
> [X] I/we build them from source via an SVN/Git checkout.
> 
> [] Other (someone in your company mirrors them internally or via a downstream 
> project)


regards,
Lukas Kahwe Smith
m...@pooteeweet.org







Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Ryan McKinley
>
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Ian Lea
> Where do you get your Lucene/Solr downloads from?
>
> [X] ASF Mirrors (linked in our release announcements or via the Lucene 
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a downstream 
> project)
>
> Please put an X in the box that applies to you.  Multiple selections are OK 
> (for instance, if one project uses a mirror and another uses Maven)




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Beatriz Nombela
Where do you get your Lucene/Solr downloads from?

[] ASF Mirrors (linked in our release announcements or via the Lucene
website)

[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a
downstream project)

>
>

-- 
Beatriz Nombela Escobar
bea...@gmail.com


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread robo -
 [x] ASF Mirrors (linked in our release announcements or via the Lucene website)

 [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

 [] I/we build them from source via an SVN/Git checkout.


On Tue, Jan 18, 2011 at 1:24 PM, Glen Newton  wrote:
> Where do you get your Lucene/Solr downloads from?
>
> [x] ASF Mirrors (linked in our release announcements or via the Lucene 
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
>
> -Glen Newton
>
>



Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Hardy Ferentschik
On Tue, 18 Jan 2011 22:04:01 +0100, Grant Ingersoll   
wrote:


 [] ASF Mirrors (linked in our release announcements or via the Lucene  
website)


 [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

 [] I/we build them from source via an SVN/Git checkout.

 [] Other (someone in your company mirrors them internally or via a




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Earl Hood
> Where do you get your Lucene/Solr downloads from?
>
> [X] ASF Mirrors (linked in our release announcements or via the Lucene 
> website)

--ewh




RE: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Steven A Rowe
> [x] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
> 
> [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> 
> [x] I/we build them from source via an SVN/Git checkout.


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Philip Puffinburger
> Where do you get your Lucene/Solr downloads from?
> 
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
> 
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> 
> [] I/we build them from source via an SVN/Git checkout.
> 
> [] Other (someone in your company mirrors them internally or via a downstream 
> project)
> 
> Please put an X in the box that applies to you.  Multiple selections are OK 
> (for instance, if one project uses a mirror and another uses Maven)




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Markus Jelsma

> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
> 
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> 
> [X] I/we build them from source via an SVN/Git checkout.
> 
> [] Other (someone in your company mirrors them internally or via a
> downstream project)




Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Glen Newton
Where do you get your Lucene/Solr downloads from?

[x] ASF Mirrors (linked in our release announcements or via the Lucene website)

[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[] I/we build them from source via an SVN/Git checkout.


-Glen Newton





RE: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Ryan Aylward
[] ASF Mirrors (linked in our release announcements or via the Lucene
website)

[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a
downstream project)






Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Patrick Samborski
[X] ASF Mirrors (linked in our release announcements or via the Lucene
website)

[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a
downstream project)


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Grant Ingersoll
And here's mine:

On Jan 18, 2011, at 4:04 PM, Grant Ingersoll wrote:
> 
> Where do you get your Lucene/Solr downloads from?
> 
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
> 
> [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> 
> [x] I/we build them from source via an SVN/Git checkout.





Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Luka Stojanovic


Where do you get your Lucene/Solr downloads from?

[] ASF Mirrors (linked in our release announcements or via the Lucene
 website)

[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a
 downstream project)


--
Luka Stojanovic
lu...@vast.com
Platform Engineering




[POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Grant Ingersoll
As devs of Lucene/Solr, due to the way ASF mirrors, etc. work, we really don't
have a good sense of how people get Lucene and Solr for use in their
applications.  Because of this, there has been some talk of dropping Maven
support for Lucene artifacts (or at least making them external).  Before we do
that, I'd like to conduct an informal poll of actual users out there to see
how you get Lucene or Solr.

Where do you get your Lucene/Solr downloads from?

[] ASF Mirrors (linked in our release announcements or via the Lucene website)

[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a downstream 
project)

Please put an X in the box that applies to you.  Multiple selections are OK 
(for instance, if one project uses a mirror and another uses Maven)

Please do not turn this thread into a discussion of Maven and its (de)merits;
I simply want to know, informally, where people get their JARs from.  In other
words, no discussion is necessary (we already have that going on at
d...@lucene.apache.org, which you are welcome to join).

Thanks,
Grant



Re: Best practices for multiple languages?

2011-01-18 Thread Shai Erera
Hi

There are two types of multi-language docs:
1) Docs in different languages -- every document is one language
2) Each document has fields in different languages

I've dealt with both, and there are different solutions to each. Which of
them is yours?

Shai

On Tue, Jan 18, 2011 at 7:53 PM, Clemens Wyss  wrote:

> What is the "best practice" to support multiple languages, i.e.
> Lucene-Documents that have multiple language content/fields?
> Should
> a) each language be indexed in a separate index/directory or should
> b) the Documents (in a single directory) hold the diverse localized fields?
>
> We most often will be searching "language dependent" which (at least
> performance wise) mandates one-directory-per-language...
>
> Any (lucene specific) white papers on this topic?
>
> Thx in advance
> Clemens
>
>
>


Re: Best practices for multiple languages?

2011-01-18 Thread Otis Gospodnetic
Hi Clemens,

If you will be searching individual languages, go with language-specific 
indices.  Wunder likes to give an example of "die" in German vs. English. :)
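
As a rough illustration of the language-specific-indices layout, the sketch
below routes a document to an index directory chosen by its language tag. The
class name, the "indexes" root, and the "und" fallback directory are all
assumptions for illustration, not anything prescribed by Lucene; the code only
computes a path and leaves opening the directory and choosing a per-language
analyzer to the application.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Sketch: pick one index directory per language under a common root. */
final class LanguageIndexRouter {

    // Hypothetical directory name for documents whose language is unknown.
    static final String UNKNOWN = "und";

    /**
     * Maps a BCP 47 language tag such as "de-CH" to root/de, so that all
     * regional variants of a language share one index.
     */
    static Path indexDirFor(Path root, String languageTag) {
        String tag = (languageTag == null) ? "" : languageTag;
        String lang = Locale.forLanguageTag(tag).getLanguage();
        return root.resolve(lang.isEmpty() ? UNKNOWN : lang);
    }

    public static void main(String[] args) {
        Path root = Paths.get("indexes");
        System.out.println(indexDirFor(root, "de-CH"));
        System.out.println(indexDirFor(root, "en"));
        System.out.println(indexDirFor(root, ""));
    }
}
```

With this layout a German-only search opens only the German index, and "die"
can be a stop word there while remaining an ordinary indexed verb in the
English one.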

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Clemens Wyss 
> To: "java-user@lucene.apache.org" 
> Sent: Tue, January 18, 2011 12:53:57 PM
> Subject: Best practices for multiple languages?
> 
> What is the "best practice" to support multiple languages, i.e.
> Lucene-Documents that have multiple language content/fields?
> Should
> a) each language be indexed in a separate index/directory or should
> b) the Documents (in a single directory) hold the diverse localized fields?
>
> We most often will be searching "language dependent" which (at least
> performance wise) mandates one-directory-per-language...
>
> Any (lucene specific) white papers on this topic?
>
> Thx in advance
> Clemens
> 



Best practices for multiple languages?

2011-01-18 Thread Clemens Wyss
What is the "best practice" for supporting multiple languages, i.e.
Lucene Documents that have content/fields in multiple languages?
Should
a) each language be indexed in a separate index/directory, or should
b) the Documents (in a single directory) hold the various localized fields?

We will most often be searching per language, which (at least
performance-wise) mandates one directory per language...

Any (Lucene-specific) white papers on this topic?

Thx in advance
Clemens




Re: Large .frq file

2011-01-18 Thread dan sutton
Hi Shai,

What I really wanted to do was reduce the .frq file size.

Oddly (when tokenizing 3 separate fields), the WhitespaceTokenizer
produces more terms than the CJK analyzer, yet the CJK .frq file is
much larger. Examples below:

with WhitespaceTokenizer:
 89M  _0.tis
1.4M  _0.tii
  71  _0.fnm
5.8M  _0.fdx
741K  _0.fdt
  20  segments.gen
 293  segments_2
119M  _0.frq

with CJKTokenizer:
 31M  _0.tis
633K  _0.tii
  71  _0.fnm
5.8M  _0.fdx
741K  _0.fdt
  20  segments.gen
 293  segments_2
166M  _0.frq

Also, I believe Solr calls addDocument with payloads turned off. I'm
not sure why the size is so much larger.

Cheers,
Dan

On Tue, Jan 18, 2011 at 12:41 PM, Shai Erera  wrote:
> If I understand correctly, you are comparing the size of the .frq file when
> WhitespaceTokenizer is used vs. the CJK one?
>
> I'd bet this is because WhitespaceTokenizer creates far fewer terms than the
> CJK one. Whitespace tokenizes the text by splitting on whitespace, while
> CJK does a sort of n-gram tokenization, which usually produces many more
> terms. This affects the .frq file because many more posting lists are
> created, and those are stored in the .frq file.
>
> Check whether the .tii and .tis files differ, and whether their difference is
> of the same order as the .frq difference (e.g. if they are 2x larger with CJK,
> the .frq should be larger by about the same factor). If so, I believe this is
> the reason.
>
> Shai
>
> On Tue, Jan 18, 2011 at 2:13 PM, dan sutton  wrote:
>
>> Hi,
>>
>> We're trying to create a large index via solr for trends and notice
>> that we have a large '.frq' file after doing the following:
>>
>>
>> make all text fields index="true", stored="false",
>> omitTermFreqAndPositions="true" omitNorms="true" termPositions="false"
>> termOffsets="false" termVectors="false"
>>
>> We are using a variation on org.apache.lucene.analysis.cjk and notice
>> that the .frq is about 4 times larger than with, for example, the
>> WhiteSpaceTokenizer.
>>
>>
>> Considering that with omitTermFreqAndPositions="true" for the text
>> fields I'd have thought this should be : "If omitTf were true it would
>> be this sequence of VInts instead:"
>> (http://lucene.apache.org/java/2_9_1/fileformats.html#Frequencies)
>>
>>
>> Can anyone suggest how I can reduce the size of this file?
>>
>>
>> Many thanks,
>> Dan
>>
>> Lucene Specification Version: 2.9.1
>> Solr Specification Version: 1.4.0.2010.09.10.17.10.36
>>



Re: Lucene Ranking Problem

2011-01-18 Thread Lahiru Samarakoon
Hi Ian & Umesh,

This is what I was looking for.
Thanks a lot.

Regards,
Lahiru


Re: Large .frq file

2011-01-18 Thread Shai Erera
If I understand correctly, you are comparing the size of the .frq when
WhitespaceTokenizer is used vs. when the CJK tokenizer is used?

I'd bet this is because WhitespaceTokenizer creates far fewer terms than the
CJK one. Whitespace tokenizes the text by splitting on whitespace, while
CJK does a sort of n-gram tokenization, which usually creates many more
terms. This affects the .frq file because many more posting lists are
created, and those are stored in the .frq file.
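
The effect described above can be sketched in plain Java, without Lucene
on the classpath. This is only a rough emulation under stated assumptions:
the class and method names are hypothetical, the "bigram" method simply
emits overlapping character pairs inside each whitespace-delimited run, and
the real CJK tokenizer (or a variation of it) has more rules than this. The
sample text is a short run of CJK characters written as Unicode escapes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class TokenCountSketch {

    // Two whitespace-separated runs of CJK characters (6 and 5 characters).
    public static final String SAMPLE =
            "\u6771\u4eac\u90fd\u306b\u884c\u304f \u4eac\u90fd\u306b\u884c\u304f";

    // Split on runs of whitespace, roughly what a whitespace tokenizer does.
    public static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Emit overlapping character bigrams inside each whitespace-delimited run.
    // A run of a single character is emitted as-is.
    public static List<String> bigramTokens(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String run : whitespaceTokens(text)) {
            if (run.length() == 1) {
                tokens.add(run);
                continue;
            }
            for (int i = 0; i + 2 <= run.length(); i++) {
                tokens.add(run.substring(i, i + 2));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> ws = whitespaceTokens(SAMPLE);
        List<String> bi = bigramTokens(SAMPLE);
        System.out.println("whitespace: " + ws.size() + " tokens, "
                + new LinkedHashSet<String>(ws).size() + " distinct");
        System.out.println("bigram:     " + bi.size() + " tokens, "
                + new LinkedHashSet<String>(bi).size() + " distinct");
    }
}
```

On this sample the whitespace split yields 2 tokens (2 distinct) and the
bigram split yields 9 tokens (5 distinct). More tokens per document means
more postings, which is the growth the .frq file would reflect.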

See if the .tii and .tis files differ too. If their difference is of the
same order as the .frq difference (e.g. if they are 2x larger w/ CJK, the
.frq should be larger by about the same factor), then I believe this is
the reason.

Shai

On Tue, Jan 18, 2011 at 2:13 PM, dan sutton  wrote:

> Hi,
>
> We're trying to create a large index via solr for trends and notice
> that we have a large '.frq' file after doing the following:
>
>
> make all text fields index="true", stored="false",
> omitTermFreqAndPositions="true" omitNorms="true" termPositions="false"
> termOffsets="false" termVectors="false"
>
> We are using a variation on org.apache.lucene.analysis.cjk and notice
> that the .frq is about 4 times larger than with, for example, the
> WhitespaceTokenizer.
>
>
> Considering that with omitTermFreqAndPositions="true" for the text
> fields I'd have thought this should be : "If omitTf were true it would
> be this sequence of VInts instead:"
> (http://lucene.apache.org/java/2_9_1/fileformats.html#Frequencies)
>
>
> Can anyone suggest how I can reduce the size of this file?
>
>
> Many thanks,
> Dan
>
> Lucene Specification Version: 2.9.1
> Solr Specification Version: 1.4.0.2010.09.10.17.10.36
>


Re: Lucene Ranking Problem

2011-01-18 Thread Umesh Prasad
Hi Lahiru,
   Comments are inline:


On Tue, Jan 18, 2011 at 5:42 PM, Lahiru Samarakoon wrote:

> Dear All,
>
>  I have two documents. The analyzed and the tokenized contents are
> mentioned
> below.
>
>  *Document 1 :*
>
>  *when*, null_1, *my*, null_1, money,
>
> fund, amount, payment, creditcard, credit,
>
> card, *bank, account*, debit, deduct,
>
> *charge*, null_1, my, mobile, usage,
>
> *service*, connection
>
>
>  *Document 2:*
>
>  *when*, what, time, what, day,
>
> null_1, money, fund, cash, payment,
>
> null_1, i, do, you, i,
>
> null_1, deduct, *charge*, reduce, debit,
>
> from, *my*, *bank, account*, credit,
>
> card, null_1, *adsl*, adsl1, adsl-2,
>
> adsl-1, adsl2, adsl, 1, adsl,
>
> 2, usage, connection, *service*
>
>
>  Then, I searched for the following text.
>
>  *Query:* when my bank account charge adsl service
>
>  *Scores*
>
> Document 1 = 0.74406385
>
> Document 2 = 0.66067594
>

Please read the documentation on Lucene scoring:
http://lucene.apache.org/java/2_9_1/scoring.html
That will help you understand the bigger picture.


>  I was expecting Document 2 to be the top-ranked document. But I get
> Document 1 as the top-ranked one, even though it does not contain the term “adsl”.
>
>  The word order of Document 1 matches the query very well. Could that
> be the reason?
>

Word order doesn't matter. However, tf/idf, norms and other factors do
matter, as described in the link above.

You can see how each document was assigned its score by using

IndexSearcher.explain(query, docId); as described in
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Searcher.html#explain%28org.apache.lucene.search.Query,%20int%29


> If it is, how can I ignore word order when searching? (I am not using
> phrase queries.)
>
> My search code is shown below; it is very simple.
>
>
>     QueryParser parser = new QueryParser(Version.LUCENE_30,
>             "pattern",
>             new StandardAnalyzer(Version.LUCENE_30));
>     org.apache.lucene.search.Query query1 =
>             parser.parse(this.query.getQuestion());
>     TopDocs hits = is.search(query1, 10);
>
>  Please advise
>
>
> Thanks,
>
> Lahiru
>



-- 
---
Thanks & Regards
Umesh Prasad


Re: Lucene Ranking Problem

2011-01-18 Thread Ian Lea
See what Searcher.explain() says for each hit. I don't think word order
matters for the query you give. Several factors go into scoring: see
org.apache.lucene.search.Similarity, or google "lucene scoring".
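
To make a few of those factors concrete, here is a minimal plain-Java
sketch of the formulas that, to my understanding, the default Similarity
in Lucene 2.9/3.0 uses for tf, idf, length normalization and coord. The
class name is hypothetical, and the token counts (22 and 39) are simply
counted from the two token lists in the question, assuming every listed
token was indexed. It does not reproduce the reported scores: the real
score also involves query normalization, boosts and a lossy one-byte
encoding of the norm, so Searcher.explain() remains the authoritative
source.

```java
public class ScoringFactorsSketch {

    // Term-frequency factor: grows with the square root of the frequency.
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: rarer terms weigh more.
    public static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Length normalization: fields with more terms get a smaller factor.
    public static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Coordination factor: fraction of the query terms found in the document.
    public static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    public static void main(String[] args) {
        // Document 1: 22 tokens, matches 6 of the 7 query terms (no "adsl").
        // Document 2: 39 tokens, matches all 7 query terms.
        System.out.println("doc1 lengthNorm=" + lengthNorm(22) + " coord=" + coord(6, 7));
        System.out.println("doc2 lengthNorm=" + lengthNorm(39) + " coord=" + coord(7, 7));
    }
}
```

The point of the comparison: the shorter document gets a noticeably larger
length normalization factor, which can outweigh the penalty for missing one
query term.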

Or have a play with Luke: invaluable for investigating things with
lucene and will tell you everything about your index.


--
Ian.


On Tue, Jan 18, 2011 at 12:12 PM, Lahiru Samarakoon  wrote:
> Dear All,
>
>  I have two documents. The analyzed and the tokenized contents are mentioned
> below.
>
>  *Document 1 :*
>
>  *when*, null_1, *my*, null_1, money,
>
> fund, amount, payment, creditcard, credit,
>
> card, *bank, account*, debit, deduct,
>
> *charge*, null_1, my, mobile, usage,
>
> *service*, connection
>
>
>  *Document 2:*
>
>  *when*, what, time, what, day,
>
> null_1, money, fund, cash, payment,
>
> null_1, i, do, you, i,
>
> null_1, deduct, *charge*, reduce, debit,
>
> from, *my*, *bank, account*, credit,
>
> card, null_1, *adsl*, adsl1, adsl-2,
>
> adsl-1, adsl2, adsl, 1, adsl,
>
> 2, usage, connection, *service*
>
>
>  Then, I searched for the following text.
>
>  *Query:* when my bank account charge adsl service
>
>  *Scores*
>
> Document 1 = 0.74406385
>
> Document 2 = 0.66067594
>
>  I was expecting Document 2 to be the top-ranked document. But I get
> Document 1 as the top-ranked one, even though it does not contain the term “adsl”.
>
>  The word order of Document 1 matches the query very well. Could that
> be the reason?
>
> If it is, how can I ignore word order when searching? (I am not using
> phrase queries.)
>
> My search code is shown below; it is very simple.
>
>
>     QueryParser parser = new QueryParser(Version.LUCENE_30,
>             "pattern",
>             new StandardAnalyzer(Version.LUCENE_30));
>     org.apache.lucene.search.Query query1 =
>             parser.parse(this.query.getQuestion());
>     TopDocs hits = is.search(query1, 10);
>
>  Please advise
>
>
> Thanks,
>
> Lahiru
>




Large .frq file

2011-01-18 Thread dan sutton
Hi,

We're trying to create a large index via Solr for trends, and we notice
that we have a large '.frq' file after doing the following:


make all text fields index="true", stored="false",
omitTermFreqAndPositions="true" omitNorms="true" termPositions="false"
termOffsets="false" termVectors="false"

We are using a variation on org.apache.lucene.analysis.cjk and notice
that the .frq is about 4 times larger than with, for example, the
WhitespaceTokenizer.


With omitTermFreqAndPositions="true" set for the text fields, I'd have
expected the file to use the compact encoding from the file-format docs:
"If omitTf were true it would be this sequence of VInts instead:"
(http://lucene.apache.org/java/2_9_1/fileformats.html#Frequencies)
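
For reference, the encoding that the quoted sentence refers to can be
sketched in plain Java as below. This is a simplified illustration based on
my reading of the 2.9 file-format page, not Lucene's actual writer: the
class and method names are hypothetical, and skip data and everything else
in the file are ignored. It returns the VInt values for one term's postings
and shows that omitting term frequencies drops the frequency entries but
still leaves one entry per document the term occurs in.

```java
import java.util.ArrayList;
import java.util.List;

public class FrqEncodingSketch {

    // Returns the sequence of VInt values (not bytes) for one term's postings.
    // docIds must be ascending; freqs[i] is the term frequency in docIds[i].
    public static List<Integer> encode(int[] docIds, int[] freqs, boolean omitTf) {
        List<Integer> out = new ArrayList<Integer>();
        int lastDoc = 0;
        for (int i = 0; i < docIds.length; i++) {
            int delta = docIds[i] - lastDoc;
            lastDoc = docIds[i];
            if (omitTf) {
                out.add(delta);              // doc delta only
            } else if (freqs[i] == 1) {
                out.add((delta << 1) | 1);   // low bit set: frequency is one
            } else {
                out.add(delta << 1);         // low bit clear: frequency follows
                out.add(freqs[i]);
            }
        }
        return out;
    }

    // Number of bytes a non-negative value occupies as a VInt (7 bits per byte).
    public static int vIntBytes(int value) {
        int n = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int[] docs = {7, 11};
        int[] freqs = {1, 3};
        System.out.println(encode(docs, freqs, false)); // prints [15, 8, 3]
        System.out.println(encode(docs, freqs, true));  // prints [7, 4]
    }
}
```

The sample (once in document 7, three times in document 11) is the example
used on the file-format page. Under this encoding the file size still
scales with the number of (term, document) pairs.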


Can anyone suggest how I can reduce the size of this file?


Many thanks,
Dan

Lucene Specification Version: 2.9.1
Solr Specification Version: 1.4.0.2010.09.10.17.10.36




Lucene Ranking Problem

2011-01-18 Thread Lahiru Samarakoon
Dear All,

 I have two documents. Their analyzed and tokenized contents are listed
below.

 *Document 1 :*

 *when*, null_1, *my*, null_1, money,

fund, amount, payment, creditcard, credit,

card, *bank, account*, debit, deduct,

*charge*, null_1, my, mobile, usage,

*service*, connection


 *Document 2:*

 *when*, what, time, what, day,

null_1, money, fund, cash, payment,

null_1, i, do, you, i,

null_1, deduct, *charge*, reduce, debit,

from, *my*, *bank, account*, credit,

card, null_1, *adsl*, adsl1, adsl-2,

adsl-1, adsl2, adsl, 1, adsl,

2, usage, connection, *service*


 Then, I searched for the following text.

 *Query:* when my bank account charge adsl service

 *Scores*

Document 1 = 0.74406385

Document 2 = 0.66067594

 I was expecting Document 2 to be the top-ranked document. But I get
Document 1 as the top-ranked one, even though it does not contain the term “adsl”.

 The word order of Document 1 matches the query very well. Could that
be the reason?

If it is, how can I ignore word order when searching? (I am not using
phrase queries.)

My search code is shown below; it is very simple.


    QueryParser parser = new QueryParser(Version.LUCENE_30,
            "pattern",
            new StandardAnalyzer(Version.LUCENE_30));
    org.apache.lucene.search.Query query1 =
            parser.parse(this.query.getQuestion());
    TopDocs hits = is.search(query1, 10);

 Please advise


Thanks,

Lahiru