How to compose a query from multiple HTTP URL parameters?

2010-03-24 Thread Conal Tuohy
I would like to be able to specify a query over multiple fields using 
just an HTML form with different parameters for the different fields.


Is it possible to configure Solr to accept a URL of this form:

select?Species=Pseudonaja+textilis&Hospital=Griffith+Base+Hospital

... instead of:

q=Species:'Pseudonaja+textilis'+Hospital:'Griffith+Base+Hospital'

I have read some information about parameter indirection or parameter 
dereferencing, but I haven't been able to get it to work.
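One way to get close to this combines parameter dereferencing with nested queries in a request handler's defaults. The handler name below is invented and the config is a sketch under those assumptions, not a tested recipe:

```xml
<!-- Sketch: a handler whose default query pulls its clause values from
     the Species and Hospital request parameters via $param dereferencing
     inside local params. -->
<requestHandler name="/bySpeciesHospital" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q">_query_:"{!field f=Species v=$Species}" AND _query_:"{!field f=Hospital v=$Hospital}"</str>
  </lst>
</requestHandler>
```

A request such as /bySpeciesHospital?Species=Pseudonaja+textilis&Hospital=Griffith+Base+Hospital would then fill in both clauses.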





Re: Field Collapsing SOLR-236

2010-03-24 Thread Dennis Gearon
Boy, I hope that field collapsing works! I'm planning on using it heavily.
Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 3/24/10, blargy  wrote:

> From: blargy 
> Subject: Field Collapsing SOLR-236
> To: solr-user@lucene.apache.org
> Date: Wednesday, March 24, 2010, 12:17 PM
> 
> Has anyone had any luck with the field collapsing patch
> (SOLR-236) with Solr
> 1.4? I tried patching my version of 1.4 with no such luck.
> 
> Thanks
> -- 
> View this message in context: 
> http://old.nabble.com/Field-Collapsing-SOLR-236-tp28019949p28019949.html
> Sent from the Solr - User mailing list archive at
> Nabble.com.
> 
> 


Fwd: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-03-24 Thread Yonik Seeley
Forwarding to solr only - the big cross-post caused my gmail filters
to "file" it.
-Yonik

-- Forwarded message --
From: Grant Ingersoll 
Date: Wed, Mar 24, 2010 at 8:03 PM
Subject: Apache Lucene EuroCon Call For Participation: Prague, Czech
Republic May 20 & 21, 2010
To: Lucene mailing list ,
solr-user@lucene.apache.org, java-u...@lucene.apache.org,
mahout-u...@lucene.apache.org, nutch-u...@lucene.apache.org,
openrelevance-u...@lucene.apache.org, tika-u...@lucene.apache.org,
pylucene-u...@lucene.apache.org, connectors-...@incubator.apache.org,
lucene-net-...@lucene.apache.org


Apache Lucene EuroCon Call For Participation - Prague, Czech Republic
May 20 & 21, 2010

All submissions must be received by Tuesday, April 13, 2010, 12
Midnight CET/6 PM US EDT

The first European conference dedicated to Lucene and Solr is coming
to Prague from May 18-21, 2010. Apache Lucene EuroCon is run on a
not-for-profit basis, with net proceeds donated back to the Apache
Software Foundation. The conference is sponsored by Lucid Imagination
with additional support from community and other commercial
co-sponsors.

Key Dates:
24 March 2010: Call For Participation Open
13 April 2010: Call For Participation Closes
16 April 2010: Speaker Acceptance/Rejection Notification
18-19 May 2010: Lucene and Solr Pre-conference Training Sessions
20-21 May 2010: Apache Lucene EuroCon

This conference creates a new opportunity for the Apache Lucene/Solr
community and marketplace, providing the chance to gather, learn and
collaborate on the latest in Apache Lucene and Solr search
technologies and what's happening in the community and ecosystem.
There will be two days of Lucene and Solr training offered May 18 &
19, followed by two days packed with leading-edge Lucene and Solr
Open Source Search content and talks by search and open source thought
leaders.

We are soliciting 45-minute presentations for the conference, 20-21
May 2010 in Prague. The conference and all presentations will be in
English.

Topics of interest include:
- Lucene and Solr in the Enterprise (case studies, implementation,
return on investment, etc.)
- “How We Did It”  Development Case Studies
- Spatial/Geo search
- Lucene and Solr in the Cloud
- Scalability and Performance Tuning
- Large Scale Search
- Real Time Search
- Data Integration/Data Management
- Tika, Nutch and Mahout
- Lucene Connectors Framework
- Faceting and Categorization
- Relevance in Practice
- Lucene & Solr for Mobile Applications
- Multi-language Support
- Indexing and Analysis Techniques
- Advanced Topics in Lucene & Solr Development

All accepted speakers will qualify for discounted conference
admission. Financial assistance is available for speakers that
qualify.

To submit a 45-minute presentation proposal, please send an email to
c...@lucene-eurocon.org containing the following information in plain
text:

1. Your full name, title, and organization

2. Contact information, including your address, email, phone number

3. The name of your proposed session (keep your title simple and
relevant to the topic)

4. A 75-200 word overview of your presentation (in English); in
addition to the topic, describe whether your presentation is intended
as a tutorial, a description of an implementation, a
theoretical/academic discussion, etc.

5. A 100-200-word speaker bio that includes prior conference speaking
or related experience (in English)

To be considered, proposals must be received by 12 Midnight CET
Tuesday, 13 April 2010 (Tuesday 13 April 6 PM US Eastern time, 3 PM US
Pacific Time).

Please email any questions regarding the conference to
i...@lucene-eurocon.org. To be added to the conference mailing list,
please email sig...@lucene-eurocon.org. If your organization is
interested in sponsorship opportunities, email
spon...@lucene-eurocon.org

Key Dates

24 March 2010: Call For Participation Open
13 April 2010: Call For Participation Closes
16 April 2010: Speaker Acceptance/Rejection Notification
18-19 May 2010: Lucene and Solr Pre-conference Training Sessions
20-21 May 2010: Apache Lucene EuroCon

We look forward to seeing you in Prague!

Grant Ingersoll
Apache Lucene EuroCon Program Chair
www.lucene-eurocon.org


Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-03-24 Thread Grant Ingersoll
Apache Lucene EuroCon Call For Participation - Prague, Czech Republic May 20 & 
21, 2010
 
All submissions must be received by Tuesday, April 13, 2010, 12 Midnight CET/6 
PM US EDT

The first European conference dedicated to Lucene and Solr is coming to Prague 
from May 18-21, 2010. Apache Lucene EuroCon is run on a not-for-profit 
basis, with net proceeds donated back to the Apache Software Foundation. The 
conference is sponsored by Lucid Imagination with additional support from 
community and other commercial co-sponsors.

Key Dates:
24 March 2010: Call For Participation Open
13 April 2010: Call For Participation Closes
16 April 2010: Speaker Acceptance/Rejection Notification
18-19 May 2010: Lucene and Solr Pre-conference Training Sessions
20-21 May 2010: Apache Lucene EuroCon

This conference creates a new opportunity for the Apache Lucene/Solr community 
and marketplace, providing the chance to gather, learn and collaborate on the 
latest in Apache Lucene and Solr search technologies and what's happening in 
the community and ecosystem. There will be two days of Lucene and Solr training 
offered May 18 & 19, followed by two days packed with leading-edge Lucene 
and Solr Open Source Search content and talks by search and open source thought 
leaders.

We are soliciting 45-minute presentations for the conference, 20-21 May 2010 in 
Prague. The conference and all presentations will be in English.

Topics of interest include: 
- Lucene and Solr in the Enterprise (case studies, implementation, return on 
investment, etc.)
- “How We Did It”  Development Case Studies
- Spatial/Geo search
- Lucene and Solr in the Cloud
- Scalability and Performance Tuning
- Large Scale Search
- Real Time Search
- Data Integration/Data Management
- Tika, Nutch and Mahout
- Lucene Connectors Framework
- Faceting and Categorization
- Relevance in Practice
- Lucene & Solr for Mobile Applications
- Multi-language Support
- Indexing and Analysis Techniques
- Advanced Topics in Lucene & Solr Development

All accepted speakers will qualify for discounted conference admission. 
Financial assistance is available for speakers that qualify.

To submit a 45-minute presentation proposal, please send an email to 
c...@lucene-eurocon.org containing the following information in plain text:

1. Your full name, title, and organization

2. Contact information, including your address, email, phone number

3. The name of your proposed session (keep your title simple and relevant to 
the topic)

4. A 75-200 word overview of your presentation (in English); in addition to the 
topic, describe whether your presentation is intended as a tutorial, a 
description of an implementation, a theoretical/academic discussion, etc.

5. A 100-200-word speaker bio that includes prior conference speaking or 
related experience (in English)

To be considered, proposals must be received by 12 Midnight CET Tuesday, 13 
April 2010 (Tuesday 13 April 6 PM US Eastern time, 3 PM US Pacific Time).

Please email any questions regarding the conference to i...@lucene-eurocon.org. 
To be added to the conference mailing list, please email 
sig...@lucene-eurocon.org. If your organization is interested in sponsorship 
opportunities, email
spon...@lucene-eurocon.org

Key Dates

24 March 2010: Call For Participation Open
13 April 2010: Call For Participation Closes
16 April 2010: Speaker Acceptance/Rejection Notification
18-19 May 2010: Lucene and Solr Pre-conference Training Sessions
20-21 May 2010: Apache Lucene EuroCon

We look forward to seeing you in Prague!

Grant Ingersoll
Apache Lucene EuroCon Program Chair
www.lucene-eurocon.org

Seattle Hadoop/Scalability/NoSQL Meetup Wednesday, March 31st. w/ LinkedIn's Jake Mannix

2010-03-24 Thread Bradford Stephens
Greetings,

Don't forget that the Hadoop/Scalability/NoSQL meetup is next
Wednesday, March 31st at 6:45pm! We're going to have a very exciting
guest: Jake Mannix from LinkedIn will talk about machine learning on
Hadoop. He's a well-decorated engineer across many disciplines, and
even knows quite a bit about distributed search with Lucene.

We may also hear from Sarah Killcoyne from Systems Biology. She'll be
talking about Big Data in the biology / research fields.

Check out the details here:
http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/

Hope to see you there!

Cheers,
Bradford

-- 
http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


multicore embedded swap / reload etc.

2010-03-24 Thread Nagelberg, Kallin
Hi,

I've got a situation where I need to reindex a core once a day. To do this I 
was thinking of having two cores, one 'live' and one 'staging'. The app is 
always serving 'live', but when the daily index happens it goes into 'staging', 
then staging is swapped into 'live'. I can see how to do this sort of thing 
over HTTP, but I'm using an embedded Solr setup via SolrJ. Any suggestions on 
how to proceed? I could just have two SolrServers built from different 
CoreContainers and then swap the references when I'm ready, but I wonder if 
there is a better approach. Maybe grab hold of the CoreAdminHandler?

Thanks,
Kallin Nagelberg
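For what it's worth, a two-core setup like the one described is normally declared in solr.xml; the core names and paths below are assumptions:

```xml
<!-- Sketch: solr.xml declaring a "live" and a "staging" core. A
     CoreContainer built from this file can swap the two cores by name
     (the same operation the HTTP CoreAdmin SWAP command performs). -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="live" instanceDir="live"/>
    <core name="staging" instanceDir="staging"/>
  </cores>
</solr>
```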


Re: Encoding problem with ExtractRequestHandler for HTML indexing

2010-03-24 Thread Teruhiko Kurosaka
I suppose you mean Extract_ing_RequestHandler.

Out of curiosity, I sent in a Japanese HTML file in EUC-JP encoding,
and it was converted to Unicode properly and the index has the correct
Japanese words.

Do your HTML files have a META tag for Content-Type with the value
having charset= ? For example, this is what I have:

<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">

On Mar 21, 2010, at 9:45 AM, Ukyo Virgden wrote:

> Hi,
> 
> I'm trying to index HTML documents with different encodings. My HTML files are
> either in win-12XX, ISO-8859-X or UTF-8 encoding. The handler correctly parses
> all HTML in their respective encodings and indexes it. However, on the web
> interface I'm developing I enter query terms in UTF-8, which naturally do
> not match content in different encodings. Also, the results I see in
> my web app are not UTF-8 encoded as expected.
> 
> My question, is there any filter I can use to convert all content extracted
> by the handler to UTF-8 prior to indexing?
> 
> Does it make sense to write a filter which would convert tokens to UTF-8, or
> even is it possible with multiple encodings?
> 
> Thanks in advance.
> Ukyo


Teruhiko "Kuro" Kurosaka
RLP + Lucene & Solr = powerful search for global contents



RE: lowercasing for sorting

2010-03-24 Thread Binkley, Peter
Thanks - wouldn't want to get you into trouble! It's handy when selling
the idea of using Solr in the Canadian academic world to be able to drop
names like the Globe and Mail, though. If I do I'll keep my source
confidential.

Peter 

> -Original Message-
> From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] 
> Sent: Tuesday, March 23, 2010 12:11 PM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: lowercasing for sorting
> 
> Thanks, and my cover is apparently blown :P
> 
> We're looking at solr for a number of applications, from 
> taking the load off the database, to user searching etc. I 
> don't think I'll get fired for saying that :P
> 
> Thanks,
> Kallin Nagelberg
> 
> -Original Message-
> From: Binkley, Peter [mailto:peter.bink...@ualberta.ca]
> Sent: Tuesday, March 23, 2010 2:09 PM
> To: solr-user@lucene.apache.org
> Subject: RE: lowercasing for sorting
> 
> Solr makes this easy:
> 
> <fieldType name="lowercase" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> You can populate this field from another field using 
> copyField, if you also need to be able to search or display 
> the original values.
> 
> Just out of curiosity, can you tell us anything about what 
> the Globe and Mail is using Solr for? (assuming the question 
> is work-related)
> 
> Peter
> 
> 
> > -Original Message-
> > From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com]
> > Sent: Tuesday, March 23, 2010 11:07 AM
> > To: 'solr-user@lucene.apache.org'
> > Subject: lowercasing for sorting
> > 
> > I'm trying to perform a case-insensitive sort on a field in 
> my index 
> > that contains values like
> > 
> > aaa
> > bbb
> > AA
> > BB
> > 
> > And I get them sorted like:
> > 
> > aaa
> > bbb
> > AA
> > BB
> > 
> > When I would like them:
> > 
> > AA
> > aaa
> > BB
> > bbb
> > 
> > To do this I'm trying to set up a fieldType whose sole purpose is to 
> > lowercase a value on query and index. I don't want to tokenize the 
> > value, just lowercase it. Any ideas?
> > 
> > Thanks,
> > Kallin Nagelberg
> > 
> 
> 
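A sketch of how the lowercased sort field discussed in this thread might be wired up in schema.xml; the field and type names are illustrative:

```xml
<!-- Sketch: an untokenized, lowercased copy of "title" used only for sorting. -->
<fieldType name="lowercase" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_sort" type="lowercase" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
```

Sorting with sort=title_sort+asc then gives the case-insensitive order asked for, while the original field stays available for search and display.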


Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-24 Thread Alexey Serba
You should add this component (suggest or spellcheck, depending on how
you name it) to a request handler, i.e. add

  <requestHandler name="/suggest" class="solr.SearchHandler">
    <arr name="last-components">
      <str>suggest</str>
    </arr>
  </requestHandler>

And then you can hit the following url and get your suggestions

http://localhost:8983/solr/suggest/?spellcheck=true&spellcheck.dictionary=suggest&spellcheck.build=true&spellcheck.extendedResults=true&spellcheck.count=10&q=prefix

On Wed, Mar 24, 2010 at 8:09 PM, stocki  wrote:
>
> hey.
>
> i got it =)
>
> i checked out with lucene and the build from solr. with ant -verbose
> example.
>
> now, when i put this line into solrconfig.xml: <str
> name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> no exception occurs =) juhu
>
> but how does this component work ?? sorry for a new stupid question ^^
>
>
> stocki wrote:
>>
>> okay, thx
>>
>> so i checked out but i cannot make a build.
>>
>> i got 100 errors ...
>>
>> D:\cygwin\home\stock\trunk_\solr\common-build.xml:424: The following error
>> occurred while executing this line:
>> D:\cygwin\home\stock\trunk_\solr\common-build.xml:281: The following error
>> occurred while executing this line:
>> D:\cygwin\home\stock\trunk_\solr\contrib\clustering\build.xml:69: The
>> following error occurred while executing this line:
>> D:\cygwin\home\stock\trunk_\solr\build.xml:155: The following error
>> occurred while executing this line:
>> D:\cygwin\home\stock\trunk_\solr\common-build.xml:221: Compile failed; see
>> the compiler error output for details.
>>
>>
>>
>> Lance Norskog-2 wrote:
>>>
>>> You need 'ant' to do builds.  At the top level, do:
>>> ant clean
>>> ant example
>>>
>>> These will build everything and set up the example/ directory. After
>>> that, run:
>>> ant test-core
>>>
>>> to run all of the unit tests and make sure that the build works. If
>>> the autosuggest patch has a test, this will check that the patch went
>>> in correctly.
>>>
>>> Lance
>>>
>>> On Tue, Mar 23, 2010 at 7:42 AM, stocki  wrote:

 okay,
 i do this..

 but one file was not updated correctly:
 Index: trunk/src/java/org/apache/solr/util/HighFrequencyDictionary.java
 (from the suggest.patch)

 i checked it out from eclipse, applied the patch, made a new solr.war ... is
 that the right way ??
 i thought that by making a war i didn't need to make a build.

 how do i make a build ?




 Alexey-34 wrote:
>
>> Error loading class 'org.apache.solr.spelling.suggest.Suggester'
> Are you sure you applied the patch correctly?
> See http://wiki.apache.org/solr/HowToContribute#Working_With_Patches
>
> Checkout Solr trunk source code (
> http://svn.apache.org/repos/asf/lucene/solr/trunk ), apply patch,
> verify that everything went smoothly, build solr and use built version
> for your tests.
>
> On Mon, Mar 22, 2010 at 9:42 PM, stocki  wrote:
>>
>> i patched a nightly build of solr.
>> the patch runs, classes are in the correct folder, but when i replace
>> spellcheck
>> with this spellcheck config like in the comments, solr cannot find the
>> classes
>> =(
>>
>> <searchComponent name="suggest" class="solr.SpellCheckComponent">
>>   <lst name="spellchecker">
>>     <str name="name">suggest</str>
>>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>>     <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
>>     <str name="field">text</str>
>>     <str name="sourceLocation">american-english</str>
>>   </lst>
>> </searchComponent>
>>
>>
>> --> SCHWERWIEGEND: org.apache.solr.common.SolrException: Error loading
>> class 'org.apache.solr.spelling.suggest.Suggester'
>>
>>
>> why is it so ??  i think no one has so many trouble to run a patch
>> like
>> me =( :D
>>
>>
>> Andrzej Bialecki wrote:
>>>
>>> On 2010-03-19 13:03, stocki wrote:

 hello..

 i try to implement autosuggest component from these link:
 http://issues.apache.org/jira/browse/SOLR-1316

 but i have no idea how to do this !?? can anyone get me some tipps ?
>>>
>>> Please follow the instructions outlined in the JIRA issue, in the
>>> comment that shows fragments of XML config files.
>>>
>>>
>>> --
>>> Best regards,
>>> Andrzej Bialecki     <><
>>>   ___. ___ ___ ___ _ _   __
>>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>>> http://www.sigram.com  Contact: info at sigram dot com
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/SOLR-1316-How-To-Implement-this-autosuggest-component-tp27950949p27990809.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>

 --
 View this message in context:
 http://old.nabble.com/SOLR-1316-How-To-Implement-this-patch-autoComplete-tp27950949p28001938.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: If you could have one feature in Solr...

2010-03-24 Thread Teruhiko Kurosaka
First of all, I am not really concerned with "per field"
(or per-column in DB term) portion of the original request.
Most documents are monolingual.

How languages are identified depends on your application,
and database support of language tagging is not necessary.

The database schema designer may have created a field that 
stores the language information, for example.

If you are indexing documents that live in a file system,
the directory hierarchy or the name of the documents might
tell the language, assuming you have set up some standard
naming convention.

HTML documents may have the META tag for Content-Language.  
If it is from an HTTP feed, there may be Content-Language header.

And if all else fails, or the information is not reliable, the language 
can be determined by analyzing the document statistically by software
such as Nutch's Language Identifier, or commercial language identifier
software like my employer, Basis Technology, sells.

> Most databases only RECENTLY have set up languages per column. Languages per 
> ENTRY in a column? I don't think any support that yet. How would you get that 
> information from a database with the corresponding language attribute?
> 
> 
> Dennis Gearon
> 
> Signature Warning
> 
> EARTH has a Right To Life,
>  otherwise we all die.
> 
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php
> 
> 
> --- On Wed, 3/24/10, Teruhiko Kurosaka  wrote:
> 
>> From: Teruhiko Kurosaka 
>> Subject: Re: If you could have one feature in Solr...
>> To: "solr-user@lucene.apache.org" 
>> Date: Wednesday, March 24, 2010, 11:36 AM
>> (Sorry for very late response on this
>> topic.)
>> 
>> On Feb 28, 2010, at 5:47 AM, Adrien Specq wrote:
>> 
>>> - langage attribute for each field
>> 
>> I was thinking about it and it was one of my wishes.
>> Currently, Solr practically requires that we have
>> a field for each natural language that an application
>> supports.  If the app needs to support English, French
>> and
>> German, we would have to have title_en, title_fr, and
>> title_de
>> (suffixes are ISO 2-letter lang codes) instead of just 
>> a title field.  This isn't pretty.  
>> 
>> What if we want to support 15 languages?  It would be
>> much 
>> better if we can have just one title field and language 
>> information associated with the value.  
>> 
>> But after I thought about it a bit deeper, I think the
>> current ugly solution is actually practical.  This is
>> because 
>> most users want to find documents of the languages they 
>> understand.  So if a user indicate they understand
>> English and 
>> German only, we just need to search title_en and title_de.
>> 
>> Maybe I'm missing something...
>> 
>> 
>> Teruhiko "Kuro" Kurosaka, 415-227-9600 x122
>> RLP + Lucene & Solr = powerful search for global
>> contents
>> 
>> 


Teruhiko "Kuro" Kurosaka, 415-227-9600 x122
RLP + Lucene & Solr = powerful search for global contents


Re: update some index documents after indexing process is done with DIH

2010-03-24 Thread ANKITBHATNAGAR



: If you make your EventListener implement SolrCoreAware you can get
: hold of the core in inform(). Use that to get hold of the
: SolrIndexWriter 

Implementing SolrCoreAware I can get hold of the core and easily get hold of a
SolrIndexSearcher, and so a reader. But I can't see a way to get hold of a
SolrIndexWriter just holding the core...
===
This is how you get one:

  SolrIndexWriter writer = new SolrIndexWriter("SolrCore.initIndex",
      getIndexDir(), true, schema, mainIndexConfig);


Ankit
-- 
View this message in context: 
http://old.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p28020805.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: wikipedia and teaching kids search engines

2010-03-24 Thread Grant Ingersoll

On Mar 24, 2010, at 1:53 PM, Andrzej Bialecki wrote:

> On 2010-03-24 16:15, Markus Jelsma wrote:
>> A bit off-topic, but how about Nutch grabbing some content and having it indexed
>> in Solr?
> 
> The problem is not with collecting and submitting the documents, the problem 
> is with parsing the Wikimedia markup embedded in XML. WikipediaTokenizer from 
> Lucene contrib/ is a quick and perhaps acceptable solution ...

Yeah, the WikipediaTokenizer does a pretty decent job, but still has a few bugs 
that need fixing. It handles most of the syntax and can also assign types to 
the tokens based on what kind of token each is. It can also "roll up" tokens for 
things like categories into a single token (useful for faceting).

-Grant

Field Collapsing SOLR-236

2010-03-24 Thread blargy

Has anyone had any luck with the field collapsing patch (SOLR-236) with Solr
1.4? I tried patching my version of 1.4 with no such luck.

Thanks
-- 
View this message in context: 
http://old.nabble.com/Field-Collapsing-SOLR-236-tp28019949p28019949.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: If you could have one feature in Solr...

2010-03-24 Thread Dennis Gearon
Most databases only RECENTLY have set up languages per column. Languages per 
ENTRY in a column? I don't think any support that yet. How would you get that 
information from a database with the corresponding language attribute?


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 3/24/10, Teruhiko Kurosaka  wrote:

> From: Teruhiko Kurosaka 
> Subject: Re: If you could have one feature in Solr...
> To: "solr-user@lucene.apache.org" 
> Date: Wednesday, March 24, 2010, 11:36 AM
> (Sorry for very late response on this
> topic.)
> 
> On Feb 28, 2010, at 5:47 AM, Adrien Specq wrote:
> 
> > - langage attribute for each field
> 
> I was thinking about it and it was one of my wishes.
> Currently, Solr practically requires that we have
> a field for each natural language that an application
> supports.  If the app needs to support English, French
> and
> German, we would have to have title_en, title_fr, and
> title_de
> (suffixes are ISO 2-letter lang codes) instead of just 
> a title field.  This isn't pretty.  
> 
> What if we want to support 15 languages?  It would be
> much 
> better if we can have just one title field and language 
> information associated with the value.  
> 
> But after I thought about it a bit deeper, I think the
> current ugly solution is actually practical.  This is
> because 
> most users want to find documents of the languages they 
> understand.  So if a user indicate they understand
> English and 
> German only, we just need to search title_en and title_de.
> 
> Maybe I'm missing something...
> 
> 
> Teruhiko "Kuro" Kurosaka, 415-227-9600 x122
> RLP + Lucene & Solr = powerful search for global
> contents
> 
>


Re: wikipedia and teaching kids search engines

2010-03-24 Thread Chris Hostetter

: My goal is to index wikipedia in order to demonstrate search to a class of
: middle school kids that I've volunteered to teach for a couple of hours.
: Which brings me to my next question...

twitter data is a little easier to ingest than the wikipedia markup 
(the json based streaming API gives you each tweet on its own line in a 
way that's really trivial to convert into CSV with a perl script) and 
might seem more interesting to kids than wikipedia, while still having 
some interesting metadata (user, post date, hash tags) and lexicographic 
challenges (synonyms, abbreviations, @ and # markup, etc...)


: One idea I have is to bring some actual "documents", say a poster board with a
: sentence written largely on it, have the students physically *tokenize* the
: document by cutting it up and lexicographically building the term dictionary.
: Thoughts on taking it further welcome!

cutting up a paper document is a great way to teach textual analysis, but 
i think the real key is having two copies of multiple documents (3 would 
be enough) ... cut up one copy of each doc to build the term dictionary 
and tape all of those to the wall; then tape the second copy of every doc 
on the wall around them and draw lines from each term to the documents 
it's in (using a different color of paper for each doc 
would be an easy way to spot which term is in which doc, and what the term 
frequency is).


-Hoss



Re: If you could have one feature in Solr...

2010-03-24 Thread Teruhiko Kurosaka
(Sorry for very late response on this topic.)

On Feb 28, 2010, at 5:47 AM, Adrien Specq wrote:

> - langage attribute for each field

I was thinking about it and it was one of my wishes.
Currently, Solr practically requires that we have
a field for each natural language that an application
supports.  If the app needs to support English, French and
German, we would have to have title_en, title_fr, and title_de
(suffixes are ISO 2-letter lang codes) instead of just 
a title field.  This isn't pretty.  

What if we want to support 15 languages?  It would be much 
better if we can have just one title field and language 
information associated with the value.  

But after I thought about it a bit deeper, I think the
current ugly solution is actually practical.  This is because 
most users want to find documents of the languages they 
understand.  So if a user indicates they understand English and 
German only, we just need to search title_en and title_de.

Maybe I'm missing something...


Teruhiko "Kuro" Kurosaka, 415-227-9600 x122
RLP + Lucene & Solr = powerful search for global contents
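The per-language field scheme described above looks roughly like this in schema.xml; the field and type names are illustrative:

```xml
<!-- Sketch: one title field per supported language, each analyzed with a
     language-appropriate chain (text_en, text_fr, text_de are assumed types). -->
<field name="title_en" type="text_en" indexed="true" stored="true"/>
<field name="title_fr" type="text_fr" indexed="true" stored="true"/>
<field name="title_de" type="text_de" indexed="true" stored="true"/>
```

A query for a user who reads only English and German would then target title_en and title_de, e.g. q=title_en:(snake) OR title_de:(schlange).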



Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
Thanks a lot Ahmet. Now I'm gonna learn a new thing: how to apply a new patch
:)

Cheers.

2010/3/24 Ahmet Arslan 

> > Yes, that's what I was expecting. Actually, I'd like
> > to highlight phrases
> > containing stopwords, like Terrain à sehloul
>
> Lucene's FastVectorHighlighter[1] can do that kind of phrase highlighting.
> It seems that the Solr integration [2] is finished. You need to apply the
> SOLR-1268 patch.
>
> [1]
> http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FastVectorHighlighter.html
>
> [2]http://issues.apache.org/jira/browse/SOLR-1268
>
>
>
>


Re: Issue w/ highlighting a String field

2010-03-24 Thread Ahmet Arslan
> Yes, that's what I was expecting. Actually, I'd like
> to highlight phrases
> containing stopwords, like Terrain à sehloul

Lucene's FastVectorHighlighter[1] can do that kind of phrase highlighting.
It seems that the Solr integration [2] is finished. You need to apply the SOLR-1268 
patch. 

[1]http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FastVectorHighlighter.html

[2]http://issues.apache.org/jira/browse/SOLR-1268
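One practical note: FastVectorHighlighter works off term vectors, so the field being highlighted has to be indexed with positions and offsets. Schematically (field name illustrative):

```xml
<!-- Sketch: the three termVector attributes FastVectorHighlighter requires. -->
<field name="description" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```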





Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
2010/3/24 Ahmet Arslan 

>
> > Thanks a lot Ahmet. In addition, I want to highlight phrases
> > containing stop
> > words. I guess that the best way is to use a tokenized type
> > without
> > stopwordFilter. Do you agree with me defining a new type
> > for this purpose ?
>
> I am not sure about that. Maybe solr.CommonGramsFilterFactory can do the
> job. I personally do not perform stop-word removal.
>
> > By the way, I wanted to highlight a phrase using a tokenized
> > field type, but
> > I got wrong result; I tried 2 cases (q=Terrain\
> > sehloul  and q="Terrain
> > sehloul"), and I got the following:
> > <em>Terrain</em> <em>sehloul</em>
>
> This is okay. Were you expecting this? : <em>Terrain sehloul</em>
>
> Yes, that's what I was expecting. Actually, I'd like to highlight phrases
containing stopwords, like Terrain à sehloul


Re: multiple binary documents into a single solr document - Vignette/OpenText integration

2010-03-24 Thread Andrzej Bialecki

On 2010-03-24 15:58, Fábio Aragão da Silva wrote:

hello there,
I'm working on the development of a piece of code that integrates Solr
with Vignette/OpenText Content Management, meaning Vignette content
instances will be indexed in solr when published and deleted from solr
when unpublished. I'm using solr 1.4, solrj and solr cell.

I've implemented most of the code and I've run into only a single
issue so far: vignette content management supports the attachment of
multiple binary documents (such as .doc, .pdf or .xls files) to a
single content instance. I am mapping each content instance in
Vignette to a solr document, but now I have a content instance in
vignette with multiple binary files attached to it.

So my question is: is it possible to have more than one binary file
indexed into a single document in solr?

I'm a beginner in solr, but from what I understood I have two options
to index content using solrj: either to use UpdateRequest() and the
add() method to add a SolrInputDocument to the request (in case the
document doesn't represent a binary file), or to use
ContentStreamUpdateRequest() and the addFile() method to add a binary
file to the content stream request.

I don't see a way, though, to say "this document is comprised of two
files, a word and a pdf, so index them as one document in solr using
content1 and content2 fields - or merge their content into a single
'content' field)".

I tried calling addFile() twice (one call for each file) and got no
error, but nothing was indexed either.

ContentStreamUpdateRequest req = new
ContentStreamUpdateRequest("/update/extract");
req.addFile(new File("file1.doc"));
req.addFile(new File("file2.pdf"));
req.setParam("literal.id", "multiple_files_test");
req.setParam("uprefix", "attr_");
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(req);

Any thoughts on this would be greatly appreciated.


Write your own RequestHandler that uses the existing 
ExtractingRequestHandler to actually parse the streams, and then you 
combine the results arbitrarily in your handler, eventually sending an 
AddUpdateCommand to the update processor. You can obtain both the update 
processor and SolrCell instance from req.getCore().



--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Issue w/ highlighting a String field

2010-03-24 Thread Ahmet Arslan

> Thanks a lot, Ahmet. In addition, I want to highlight phrases
> containing stop
> words. I guess that the best way is to use a tokenized type
> without
> stopwordFilter. Do you agree with me defining a new type
> for this purpose ?

I am not sure about that. Maybe solr.CommonGramsFilterFactory can do the job. 
I personally do not perform stop-word removal.
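For reference, CommonGrams is enabled per field type in schema.xml; a sketch of what that could look like (the type name and words file here are assumptions, not from this thread):

```xml
<fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- emits word pairs ("the_dog") alongside the original tokens for any
         word listed in stopwords.txt, so phrases with stop words stay
         matchable instead of the stop words being dropped -->
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```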

> By the way, I wanted to highlight a phrase using a tokenized
> field type, but
> I got wrong result; I tried 2 cases (q=Terrain\
> sehloul  and q="Terrain
> sehloul"), and I got the following:
> Terrain sehloul

This is okay. Were you expecting this? : Terrain sehloul





Re: wikipedia and teaching kids search engines

2010-03-24 Thread Andrzej Bialecki

On 2010-03-24 16:15, Markus Jelsma wrote:

A bit off-topic, but how about Nutch grabbing some content and having it indexed
in Solr?


The problem is not with collecting and submitting the documents, the 
problem is with parsing the Wikimedia markup embedded in XML. 
WikipediaTokenizer from Lucene contrib/ is a quick and perhaps 
acceptable solution ...


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: wikipedia and teaching kids search engines

2010-03-24 Thread Walter Underwood
This is brilliant. I love it!

Is a computer game a document? How about each level, each room, each player?

If you want some fancy linguistics besides stemming, try compounding or what I 
call "one word or two?" English loves to glom words together.

schoolroom or school room?
babysitter, baby-sitter, or baby sitter?
Ghost Busters or Ghostbusters? 

Note: the poster and movie titles for Ghostbusters disagree, I have screenshots 
of that.

wunder

On Mar 24, 2010, at 9:53 AM, Erick Erickson wrote:

> Erik:
> 
> In a former incarnation, I thought I was going to teach 6th graders. Until I
> found out I can't deal with 25 kids for 6 hours at a stretch for years on
> end
> 
> My thoughts, presented in a "feel free to ignore but this is what I'd do"
> spirit.
> There are some random thoughts below, but here's what I'd think about...
> 
> Do a bit of an intro to the game. 10 minutes tops.
> 
> Make a game of sorts out of it. Some teams are the "indexers" and some are
> the "searchers". Give them some simple rules to follow, perhaps different
> ones for different pairs. Make sure some get surprising results (e.g. have
> one indexing team stem, the paired search team not stem). The searchers
> should rank the documents, you'll get some really surprising results.
> Emphasize that the game isn't pass/fail, it's to show the kinds of things we
> have to deal with.
> 
> Find some random near age-mates and try it once or twice before you present,
> you'll undoubtedly change something. Maybe run it by a teacher or two.
> 
> Use that as a basis to discuss the fact that people who write the programs
> that index/search have to cope with all the stuff they did, and the rules
> are imperfect. And each decision is made to serve a need, and when the user
> needs something *else*, it probably isn't a good match. And how horrible
> things happen when one part of the team assumes something different than the
> other part. And how end users don't care about all the internal stuff, they
> just care about how well their needs were served
> 
> ***here're my random musings, they may even be useful***
> Outline what you want to cover. Then cut out 75% of it. Really. Forget
> running SOLR, the kids don't care. Think about questions like "what's a
> word?" "How is a stupid computer going to figure out what *you* want?"
> "what's a document?"
> 
> Certainly do the exercise of presenting sentences and asking what they'd
> expect, e.g.
> "The dog is running", would you expect "run" to be a hit? ran? the? You can
> work tokenizing in here, perhaps under the guise of "what's important when
> searching?" Maybe even before the game above if you decide to do that.
> 
> Why or why not? Perhaps ask/talk about how a really stupid computer program
> is supposed to figure stuff like this out.
> 
> Back up and tell them what a document is. How hard that is to define. Chris
> M. is right on when he talks about hooking what they're interested in.
> 
> Maybe come up with some examples of really surprising results from searches,
> and do a really *simple* explanation of how it got that way.
> 
> If you decide to go into scoring, stick with simplicity. Like "the more
> times a word appears in a document, the more relevant it is". Can you even
> guarantee that they'd understand phrasing this in terms of percentages?
> 
> FWIW
> Erick
> 
> On Wed, Mar 24, 2010 at 10:40 AM, Erik Hatcher wrote:
> 
>> I've got a couple of questions for the community...
>> 
>> * what's the simplest way to get Solr up and running with a relatively
>> richly schema'd index of a Wikipedia dump?
>> 
>> What I'm looking for is something as easy as something along these lines:
>> 
>> java -Dsolr.solr.home=./wikipedia_solr_home -jar start.jar
>> 
>> cat wikipedia.bz2 | wikipedia_solr_indexer
>> 
>> My goal is to index wikipedia in order to demonstrate search to a class of
>> middle school kids that I've volunteered to teach for a couple of hours.
>> Which brings me to my next question...
>> 
>> * anyone have ideas on some basic hands-on ways of teaching search engine
>> fundamentals?
>> 
>> One idea I have is to bring some actual "documents", say a poster board
>> with a sentence written largely on it, have the students physically
>> *tokenize* the document by cutting it up and lexicographically building the
>> term dictionary.  Thoughts on taking it further welcome!
>> 
>> Thanks all.
>> 
>>   Erik
>> 






Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-24 Thread stocki

hey.

i got it =) 

i checked out solr together with lucene and ran the build with ant -verbose
example.

now, when i put this line into solrconfig.xml:
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
no exception occurs =) juhu

but how works this component ?? sorry for a new stupid question ^^
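From the SOLR-1316 comments, the suggester behaves like the spellcheck component: you point a search handler at it and query that handler. A sketch of such wiring (the handler name and defaults here are assumptions):

```xml
<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- must match the name given to the suggester's spellchecker -->
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="last-components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

A request like http://localhost:8983/solr/suggest?q=sol should then return completions in the spellcheck section of the response.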


stocki wrote:
> 
> okay, thx
> 
> so i checked out but i cannot complete a build.
> 
> i got 100 errors ... 
> 
> D:\cygwin\home\stock\trunk_\solr\common-build.xml:424: The following error
> occur
> red while executing this line:
> D:\cygwin\home\stock\trunk_\solr\common-build.xml:281: The following error
> occur
> red while executing this line:
> D:\cygwin\home\stock\trunk_\solr\contrib\clustering\build.xml:69: The
> following
> error occurred while executing this line:
> D:\cygwin\home\stock\trunk_\solr\build.xml:155: The following error
> occurred whi
> le executing this line:
> D:\cygwin\home\stock\trunk_\solr\common-build.xml:221: Compile failed; see
> the c
> ompiler error output for details.
> 
> 
> 
> Lance Norskog-2 wrote:
>> 
>> You need 'ant' to do builds.  At the top level, do:
>> ant clean
>> ant example
>> 
>> These will build everything and set up the example/ directory. After
>> that, run:
>> ant test-core
>> 
>> to run all of the unit tests and make sure that the build works. If
>> the autosuggest patch has a test, this will check that the patch went
>> in correctly.
>> 
>> Lance
>> 
>> On Tue, Mar 23, 2010 at 7:42 AM, stocki  wrote:
>>>
>>> okay,
>>> i do this..
>>>
>>> but one file is not updated correctly:
>>> Index: trunk/src/java/org/apache/solr/util/HighFrequencyDictionary.java
>>> (from the suggest.patch)
>>>
>>> i checked it out from eclipse, applied the patch, and made a new solr.war
>>> ... is that the right way ??
>>> i thought that by making a war i didn't need to make a build.
>>>
>>> how do i make a build ?
>>>
>>>
>>>
>>>
>>> Alexey-34 wrote:

> Error loading class 'org.apache.solr.spelling.suggest.Suggester'
 Are you sure you applied the patch correctly?
 See http://wiki.apache.org/solr/HowToContribute#Working_With_Patches

 Checkout Solr trunk source code (
 http://svn.apache.org/repos/asf/lucene/solr/trunk ), apply patch,
 verify that everything went smoothly, build solr and use built version
 for your tests.

 On Mon, Mar 22, 2010 at 9:42 PM, stocki  wrote:
>
> i patch an nightly build from solr.
> patch runs, classes are in the correct folder, but when i replace
> spellcheck
> with this spellcheck like in the comments, solr cannot find the
> classes
> =(
>
> <searchComponent name="suggest" class="solr.SpellCheckComponent">
>   <lst name="spellchecker">
>     <str name="name">suggest</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
>     <str name="field">text</str>
>     <str name="sourceLocation">american-english</str>
>   </lst>
> </searchComponent>
>
>
> --> SCHWERWIEGEND: org.apache.solr.common.SolrException: Error loading
> class
> 'org.ap
> ache.solr.spelling.suggest.Suggester'
>
>
> why is it so ??  i think no one has so many trouble to run a patch
> like
> me =( :D
>
>
> Andrzej Bialecki wrote:
>>
>> On 2010-03-19 13:03, stocki wrote:
>>>
>>> hello..
>>>
>>> i try to implement autosuggest component from these link:
>>> http://issues.apache.org/jira/browse/SOLR-1316
>>>
>>> but i have no idea how to do this !?? can anyone get me some tipps ?
>>
>> Please follow the instructions outlined in the JIRA issue, in the
>> comment that shows fragments of XML config files.
>>
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>   ___. ___ ___ ___ _ _   __
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>>
>>
>
> --
> View this message in context:
> http://old.nabble.com/SOLR-1316-How-To-Implement-this-autosuggest-component-tp27950949p27990809.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/SOLR-1316-How-To-Implement-this-patch-autoComplete-tp27950949p28001938.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>> 
>> 
>> 
>> -- 
>> Lance Norskog
>> goks...@gmail.com
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/SOLR-1316-How-To-Implement-this-patch-autoComplete-tp27950949p28018196.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: wikipedia and teaching kids search engines

2010-03-24 Thread Erick Erickson
Erik:

In a former incarnation, I thought I was going to teach 6th graders. Until I
found out I can't deal with 25 kids for 6 hours at a stretch for years on
end

My thoughts, presented in a "feel free to ignore but this is what I'd do"
spirit.
There are some random thoughts below, but here's what I'd think about...

Do a bit of an intro to the game. 10 minutes tops.

Make a game of sorts out of it. Some teams are the "indexers" and some are
the "searchers". Give them some simple rules to follow, perhaps different
ones for different pairs. Make sure some get surprising results (e.g. have
one indexing team stem, the paired search team not stem). The searchers
should rank the documents, you'll get some really surprising results.
Emphasize that the game isn't pass/fail, it's to show the kinds of things we
have to deal with.

Find some random near age-mates and try it once or twice before you present,
you'll undoubtedly change something. Maybe run it by a teacher or two.

Use that as a basis to discuss the fact that people who write the programs
that index/search have to cope with all the stuff they did, and the rules
are imperfect. And each decision is made to serve a need, and when the user
needs something *else*, it probably isn't a good match. And how horrible
things happen when one part of the team assumes something different than the
other part. And how end users don't care about all the internal stuff, they
just care about how well their needs were served

***here're my random musings, they may even be useful***
Outline what you want to cover. Then cut out 75% of it. Really. Forget
running SOLR, the kids don't care. Think about questions like "what's a
word?" "How is a stupid computer going to figure out what *you* want?"
"what's a document?"

Certainly do the exercise of presenting sentences and asking what they'd
expect, e.g.
"The dog is running", would you expect "run" to be a hit? ran? the? You can
work tokenizing in here, perhaps under the guise of "what's important when
searching?" Maybe even before the game above if you decide to do that.

Why or why not? Perhaps ask/talk about how a really stupid computer program
is supposed to figure stuff like this out.

Back up and tell them what a document is. How hard that is to define. Chris
M. is right on when he talks about hooking what they're interested in.

Maybe come up with some examples of really surprising results from searches,
and do a really *simple* explanation of how it got that way.

If you decide to go into scoring, stick with simplicity. Like "the more
times a word appears in a document, the more relevant it is". Can you even
guarantee that they'd understand phrasing this in terms of percentages?
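If you want to show that rule live, a dozen lines of Java are enough. This is the toy classroom rule only, not how Lucene actually scores (Lucene also weighs term rarity, document length, and so on):

```java
import java.util.Arrays;
import java.util.List;

public class ToyScore {
    // The classroom rule: score = how many times the query word occurs,
    // after lowercasing and splitting on non-word characters.
    static int countTerm(String doc, String term) {
        int count = 0;
        for (String token : doc.toLowerCase().split("\\W+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "The dog is running. The dog barks.",
                "A cat sleeps while the dog runs.");
        for (String doc : docs) {
            System.out.println(countTerm(doc, "dog") + " : " + doc);
        }
        // Note: "run" does not match "running" -- which is exactly the
        // stemming question from the exercise above.
        System.out.println(countTerm(docs.get(0), "run"));
    }
}
```

Running it prints 2, then 1, then 0 -- a nice springboard for "should 'run' have matched 'running'?"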

FWIW
Erick

On Wed, Mar 24, 2010 at 10:40 AM, Erik Hatcher wrote:

> I've got a couple of questions for the community...
>
>  * what's the simplest way to get Solr up and running with a relatively
> richly schema'd index of a Wikipedia dump?
>
> What I'm looking for is something as easy as something along these lines:
>
>  java -Dsolr.solr.home=./wikipedia_solr_home -jar start.jar
>
>  cat wikipedia.bz2 | wikipedia_solr_indexer
>
> My goal is to index wikipedia in order to demonstrate search to a class of
> middle school kids that I've volunteered to teach for a couple of hours.
>  Which brings me to my next question...
>
>  * anyone have ideas on some basic hands-on ways of teaching search engine
> fundamentals?
>
> One idea I have is to bring some actual "documents", say a poster board
> with a sentence written largely on it, have the students physically
> *tokenize* the document by cutting it up and lexicographically building the
> term dictionary.  Thoughts on taking it further welcome!
>
> Thanks all.
>
>Erik
>
>


Re: Impossible Boost Query?

2010-03-24 Thread blargy

This sounds a little closer to what I want, but I don't want fully randomized
results. 

How exactly does this field work? Is it more than just a simple random sort
(order by rand())? What would be nice is if I could randomize documents
within a certain score percentage of each other. Is this available?

Thanks
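For context, the 'random' type Lance mentioned is solr.RandomSortField; in the example schema it looks roughly like this:

```xml
<!-- the sort order is derived from a hash of the internal docid and the
     field name, so each dynamic field name (random_1, random_2, ...) yields
     a different but repeatable ordering -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>
```

Sorting with sort=random_1234 asc then gives a stable shuffle, and changing the number in the field name reshuffles. As far as I know it is a plain pseudo-random sort, with no built-in way to randomize only within a band of similar scores.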



Lance Norskog-2 wrote:
> 
> Also, there is a 'random' type which generates random numbers. This
> might help you also.
> 
> On Tue, Mar 23, 2010 at 7:18 PM, Lance Norskog  wrote:
>> At this point (and for almost 3 years :) field collapsing is a source
>> patch. You have to check out the Solr trunk from the Apache subversion
>> server, apply the patch with the 'patch' command, and build the new
>> Solr with 'ant'.
>>
>> On Tue, Mar 23, 2010 at 4:13 PM, blargy  wrote:
>>>
>>> Thanks, but I'm not quite sure how to apply the patch. I just use the
>>> packaged solr-1.4.0.war in my deployment (no compiling, etc). Is there a
>>> way
>>> I can patch the war file?
>>>
>>> Any instructions would be greatly appreciated. Thanks
>>>
>>>
>>> Otis Gospodnetic wrote:

>>>> You'd likely want to get the latest patch and trunk and try applying.
>>>>
>>>> Otis
>>>>
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>> Hadoop ecosystem search :: http://search-hadoop.com/
>>>>
>>>>
>>>>
>>>> - Original Message 
>>>>> From: blargy 
>>>>> To: solr-user@lucene.apache.org
>>>>> Sent: Tue, March 23, 2010 6:10:22 PM
>>>>> Subject: Re: Impossible Boost Query?
>>>>>
>>>>> Maybe a better question is... how can I install this and will it work
>>>>> with 1.4?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> blargy wrote:
>>>>>>
>>>>>> Possibly.
>>>>>> How can I install this as a contrib or do I need to actually perform
>>>>>> the patch?
>>>>>>
>>>>>>
>>>>>> Otis Gospodnetic wrote:
>>>>>>>
>>>>>>> Would Field Collapsing from SOLR-236 do the job for you?
>>>>>>>
>>>>>>> Otis
>>>>>>>
>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>>>>> Hadoop ecosystem search :: http://search-hadoop.com/
>>>>>>>
>>>>>>>
>>>>>>> - Original Message 
>>>>>>>> From: blargy <zman...@hotmail.com>
>>>>>>>> To: solr-user@lucene.apache.org
>>>>>>>> Sent: Tue, March 23, 2010 2:39:48 PM
>>>>>>>> Subject: Impossible Boost Query?
>>>>>>>>
>>>>>>>> I was wondering if this is even possible. I'll try to explain what
>>>>>>>> I'm trying to do to the best of my ability.
>>>>>>>>
>>>>>>>> Ok, so our site has a bunch of products that are sold by any number
>>>>>>>> of sellers. Currently when I search for some product I get back all
>>>>>>>> products matching that search term but the problem is there may be
>>>>>>>> multiple products sold by the same seller that are all closely
>>>>>>>> related, therefore their scores are related. So basically the search
>>>>>>>> ends up with results that are all closely clumped together by the
>>>>>>>> same seller but I would much rather prefer to distribute these
>>>>>>>> results across sellers (given each seller a fair shot to sell their
>>>>>>>> goods).
>>>>>>>>
>>>>>>>> Is there any way to add some boost query for example that will start
>>>>>>>> weighing products lower when their seller has already been listed a
>>>>>>>> few times. For example, right now I have
>>>>>>>>
>>>>>>>> Product foo by Seller A
>>>>>>>> Product foo by Seller A
>>>>>>>> Product foo by Seller A
>>>>>>>> Product foo by Seller B
>>>>>>>> Product foo by Seller B
>>>>>>>> Product foo by Seller B
>>>>>>>> Product foo by Seller C
>>>>>>>> Product foo by Seller C
>>>>>>>> Product foo by Seller C
>>>>>>>>
>>>>>>>> where each result is very close in score. I would like something
>>>>>>>> like this
>>>>>>>>
>>>>>>>> Product foo by Seller A
>>>>>>>> Product foo by Seller B
>>>>>>>> Product foo by Seller C
>>>>>>>> Product foo by Seller A
>>>>>>>> Product foo by Seller B
>>>>>>>> Product foo by Seller C
>>>>>>>>
>>>>>>>> basically distributing the results over the sellers. Is something
>>>>>>>> like this possible? I don't care if the solution involves a boost
>>>>>>>> query or not. I just want some way to distribute closely related
>>>>>>>> documents.
>>>>>>>>
>>>>>>>> Thanks!!!
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://old.nabble.com/Impossible-Boost-Query--tp28005354p28005354.html

Re: How do I create a solr core with the data from an existing one?

2010-03-24 Thread gwk

Hi,

I'm not sure if it's the best option but you could use replication to 
copy the index (http://wiki.apache.org/solr/SolrReplication). As long as 
your core is configured as a master you can use the fetchindex command to 
do a one-time replication from the new core (see the HTTP API section in 
the wiki page).
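Concretely, the sequence could look like this (host, port, and core names are assumptions). The core you copy from declares itself a master in its solrconfig.xml:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- publish the index after each commit so slaves can pull it -->
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>
```

and the receiving core then pulls the index once with:
http://localhost:8983/solr/prep/replication?command=fetchindex&masterUrl=http://localhost:8983/solr/main/replication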


Regards,

gwk


On 3/24/2010 5:31 PM, Steve Dupree wrote:

*Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy
of the core, and then swapping it in for the main core. I tried following
these steps:

1. Create prep core:

http://localhost:8983/solr/admin/cores?action=CREATE&name=prep&instanceDir=main
2. Perform index update, then commit/optimize on prep core.
3. Swap main and prep core:
http://localhost:8983/solr/admin/cores?action=SWAP&core=main&other=prep
4. Unload prep core:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=prep

The problem I am having is, the core created in step 1 doesn't have any data
in it. If I am going to do a full index of everything and the kitchen sink,
that would be fine, but if I just want to update a (large) subset of the
documents - that's obviously not going to work.

(I could merge the cores, but part of what I'm trying to do is get rid of
any deleted documents without trying to make a list of them.)

Is there some flag to the CREATE action that I'm missing? The Solr Wiki page
for CoreAdmin (http://wiki.apache.org/solr/CoreAdmin) is a little sparse on
details.

Is this approach wrong? I found at least one message on this list that
stated that performing updates in a separate core on the same machine won't
help, given that they're both using the same CPU. Is that true?
thanks in advance
~stannius

   


How do I create a solr core with the data from an existing one?

2010-03-24 Thread Steve Dupree
*Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy
of the core, and then swapping it in for the main core. I tried following
these steps:

   1. Create prep core:
   
http://localhost:8983/solr/admin/cores?action=CREATE&name=prep&instanceDir=main
   2. Perform index update, then commit/optimize on prep core.
   3. Swap main and prep core:
   http://localhost:8983/solr/admin/cores?action=SWAP&core=main&other=prep
   4. Unload prep core:
   http://localhost:8983/solr/admin/cores?action=UNLOAD&core=prep

The problem I am having is, the core created in step 1 doesn't have any data
in it. If I am going to do a full index of everything and the kitchen sink,
that would be fine, but if I just want to update a (large) subset of the
documents - that's obviously not going to work.

(I could merge the cores, but part of what I'm trying to do is get rid of
any deleted documents without trying to make a list of them.)

Is there some flag to the CREATE action that I'm missing? The Solr Wiki page
for CoreAdmin (http://wiki.apache.org/solr/CoreAdmin) is a little sparse on
details.

Is this approach wrong? I found at least one message on this list that
stated that performing updates in a separate core on the same machine won't
help, given that they're both using the same CPU. Is that true?
thanks in advance
~stannius


Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
2010/3/24 Ahmet Arslan 

> > With this configuration, the title field is highlighted
> > only when there's a
> > perfect match, i.e., the quoted query equals the title
> > content (f.i.,
> > q="Terrain sehloul" allows highlighting the entire title
> > containing "Terrain
> > sehloul",
>
> Exactly. There should be a *perfect* match for string typed fields to
> return snippets.
>
> > but q=Terrain sehloul doesn't enable to highlight
> > this title. Is
> > there a solution to this problem?
>
> Escaping (using backslash) whitespace can solve this problem.
> q=Terrain\ sehloul
>
> Now i clearly understand you. You have a title field containing 'Terrain
> sehloul' and you want to get highlighting with the query Terrain. You cannot
> do that with type="string". You need a tokenized field type in your case.
>


Thanks a lot, Ahmet. In addition, I want to highlight phrases containing stop
words. I guess that the best way is to use a tokenized type without
stopwordFilter. Do you agree with me defining a new type for this purpose ?
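For completeness, such a type could look like this in schema.xml (the type name is an assumption); the point is simply that the analyzer chain has no StopFilterFactory:

```xml
<fieldType name="text_nostop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- no StopFilterFactory, so stop words are indexed and can be highlighted -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```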

By the way, I wanted to highlight a phrase using a tokenized field type, but
I got wrong result; I tried 2 cases (q=Terrain\ sehloul  and q="Terrain
sehloul"), and I got the following: Terrain sehloul

Any ideas?
Thanks


Re: wikipedia and teaching kids search engines

2010-03-24 Thread Markus Jelsma
A bit off-topic, but how about Nutch grabbing some content and having it indexed 
in Solr?

On Wednesday 24 March 2010 16:08:43 Christopher Laux wrote:
> Hi Erik,
> 
> I'm working on Wikipedia search and use Solr. Afaik it can't easily be
> done. The Wikipedia XML dump only provides the page title and author
> in terms of data one would search for. The rest requires parsing the
> Mediawiki markup, for which there is no good parser freely available
> (still writing my own). If you are happy with individual pages you
> could go with the HTML parser.
> 
> For the second part of your question, why don't you let them try
> competing tokenization strategies (with and w/o stemming etc.) and
> compare?
> 
> -Chris
> 
> On Wed, Mar 24, 2010 at 3:40 PM, Erik Hatcher  
wrote:
> > I've got a couple of questions for the community...
> >
> >  * what's the simplest way to get Solr up and running with a relatively
> > richly schema'd index of a Wikipedia dump?
> >
> > What I'm looking for is something as easy as something along these lines:
> >
> >  java -Dsolr.solr.home=./wikipedia_solr_home -jar start.jar
> >
> >  cat wikipedia.bz2 | wikipedia_solr_indexer
> >
> > My goal is to index wikipedia in order to demonstrate search to a class
> > of middle school kids that I've volunteered to teach for a couple of
> > hours. Which brings me to my next question...
> >
> >  * anyone have ideas on some basic hands-on ways of teaching search
> > engine fundamentals?
> >
> > One idea I have is to bring some actual "documents", say a poster board
> > with a sentence written largely on it, have the students physically
> > *tokenize* the document by cutting it up and lexicographically building
> > the term dictionary.  Thoughts on taking it further welcome!
> >
> > Thanks all.
> >
> >Erik
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: wikipedia and teaching kids search engines

2010-03-24 Thread Christopher Laux
Hi Erik,

I'm working on Wikipedia search and use Solr. Afaik it can't easily be
done. The Wikipedia XML dump only provides the page title and author
in terms of data one would search for. The rest requires parsing the
Mediawiki markup, for which there is no good parser freely available
(still writing my own). If you are happy with individual pages you
could go with the HTML parser.

For the second part of your question, why don't you let them try
competing tokenization strategies (with and w/o stemming etc.) and
compare?

-Chris


On Wed, Mar 24, 2010 at 3:40 PM, Erik Hatcher  wrote:
> I've got a couple of questions for the community...
>
>  * what's the simplest way to get Solr up and running with a relatively
> richly schema'd index of a Wikipedia dump?
>
> What I'm looking for is something as easy as something along these lines:
>
>  java -Dsolr.solr.home=./wikipedia_solr_home -jar start.jar
>
>  cat wikipedia.bz2 | wikipedia_solr_indexer
>
> My goal is to index wikipedia in order to demonstrate search to a class of
> middle school kids that I've volunteered to teach for a couple of hours.
>  Which brings me to my next question...
>
>  * anyone have ideas on some basic hands-on ways of teaching search engine
> fundamentals?
>
> One idea I have is to bring some actual "documents", say a poster board with
> a sentence written largely on it, have the students physically *tokenize*
> the document by cutting it up and lexicographically building the term
> dictionary.  Thoughts on taking it further welcome!
>
> Thanks all.
>
>        Erik
>
>


multiple binary documents into a single solr document - Vignette/OpenText integration

2010-03-24 Thread Fábio Aragão da Silva
hello there,
I'm working on the development of a piece of code that integrates Solr
with Vignette/OpenText Content Management, meaning Vignette content
instances will be indexed in solr when published and deleted from solr
when unpublished. I'm using solr 1.4, solrj and solr cell.

I've implemented most of the code and I've run into only a single
issue so far: vignette content management supports the attachment of
multiple binary documents (such as .doc, .pdf or .xls files) to a
single content instance. I am mapping each content instance in
Vignette to a solr document, but now I have a content instance in
vignette with multiple binary files attached to it.

So my question is: is it possible to have more than one binary file
indexed into a single document in solr?

I'm a beginner in solr, but from what I understood I have two options
to index content using solrj: either to use UpdateRequest() and the
add() method to add a SolrInputDocument to the request (in case the
document doesn't represent a binary file), or to use
ContentStreamUpdateRequest() and the addFile() method to add a binary
file to the content stream request.

I don't see a way, though, to say "this document is comprised of two
files, a word and a pdf, so index them as one document in solr using
content1 and content2 fields - or merge their content into a single
'content' field)".

I tried calling addFile() twice (one call for each file) and got no
error, but nothing was indexed either.

ContentStreamUpdateRequest req = new
ContentStreamUpdateRequest("/update/extract");
req.addFile(new File("file1.doc"));
req.addFile(new File("file2.pdf"));
req.setParam("literal.id", "multiple_files_test");
req.setParam("uprefix", "attr_");
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(req);

Any thoughts on this would be greatly appreciated.

greetings from Brazil,
Fábio.


Re: Issue w/ highlighting a String field

2010-03-24 Thread Ahmet Arslan
> With this configuration, the title field is highlighted
> only when there's a
> perfect match, i.e., the quoted query equals the title
> content (f.i.,
> q="Terrain sehloul" allows highlighting the entire title
> containing "Terrain
> sehloul", 

Exactly. There should be a *perfect* match for string typed fields to return 
snippets.

> but q=Terrain sehloul doesn't enable to highlight
> this title. Is
> there a solution to this problem?

Escaping (using backslash) whitespace can solve this problem. 
q=Terrain\ sehloul

Now i clearly understand you. You have a title field containing 'Terrain 
sehloul' and you want to get highlighting with the query Terrain. You cannot do 
that with type="string". You need a tokenized field type in your case. 


  


Re: wikipedia and teaching kids search engines

2010-03-24 Thread Mattmann, Chris A (388J)
Hey Erik,

One thing to think about (and I'm no expert at middle school kids) would be
to relate search somehow to a topic they are interested in. My 12 year old
nephew loves the NBA, so if I were to talk to him about search, I would try
and relate it to e.g., NBA.com, or understanding the difference between Kobe
(beef) say, and Kobe Bryant. Or trying to explain relevance in the context
of looking at Cars (the movie) versus looking for Cars (automobiles).

As far as interactivity, cutting up the document is a great idea. You may
also want to make a handout with some I don't want to call them "problems"
but let's say exercises that the kids can do involving using some of the
fundamentals that you cover with the cutting exercise to maybe then
identifying why (and most importantly how) the search engine can begin to
figure out if you were looking for a Kobe steak, versus Kobe the NBA star.

Just my 2 cents...

Cheers,
Chris



On 3/24/10 7:40 AM, "Erik Hatcher"  wrote:

> I've got a couple of questions for the community...
> 
>* what's the simplest way to get Solr up and running with a
> relatively richly schema'd index of a Wikipedia dump?
> 
> What I'm looking for is something as easy as something along these
> lines:
> 
>java -Dsolr.solr.home=./wikipedia_solr_home -jar start.jar
> 
>cat wikipedia.bz2 | wikipedia_solr_indexer
> 
> My goal is to index wikipedia in order to demonstrate search to a
> class of middle school kids that I've volunteered to teach for a
> couple of hours.  Which brings me to my next question...
> 
>   * anyone have ideas on some basic hands-on ways of teaching search
> engine fundamentals?
> 
> One idea I have is to bring some actual "documents", say a poster
> board with a sentence written largely on it, have the students
> physically *tokenize* the document by cutting it up and
> lexicographically building the term dictionary.  Thoughts on taking it
> further welcome!
> 
> Thanks all.
> 
> Erik
> 
> 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++




wikipedia and teaching kids search engines

2010-03-24 Thread Erik Hatcher

I've got a couple of questions for the community...

  * what's the simplest way to get Solr up and running with a  
relatively richly schema'd index of a Wikipedia dump?


What I'm looking for is something as easy as something along these  
lines:


  java -Dsolr.solr.home=./wikipedia_solr_home -jar start.jar

  cat wikipedia.bz2 | wikipedia_solr_indexer
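As far as I know, no wikipedia_solr_indexer ships with Solr, so that half would have to be written; a rough Python sketch of its core (the field names and the localhost update URL are assumptions, and the dump parsing itself is omitted) might be:

```python
import urllib.request
from xml.sax.saxutils import escape

def to_add_xml(docs):
    """Render dicts like {'id': ..., 'title': ..., 'text': ...} as a Solr <add> message."""
    def fields(d):
        return "".join('<field name="%s">%s</field>' % (k, escape(str(v)))
                       for k, v in d.items())
    return "<add>%s</add>" % "".join("<doc>%s</doc>" % fields(d) for d in docs)

def post_update(xml, url="http://localhost:8983/solr/update?commit=true"):
    # POST one batch of documents to Solr's XML update handler.
    req = urllib.request.Request(url, data=xml.encode("utf-8"),
                                 headers={"Content-Type": "text/xml; charset=utf-8"})
    return urllib.request.urlopen(req).read()

payload = to_add_xml([{"id": "1", "title": "Anarchism", "text": "Anarchism is ..."}])
print(payload)
```

A real indexer would stream pages out of the bz2 dump (e.g. with a SAX parser) and call post_update in batches rather than one document at a time.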

My goal is to index wikipedia in order to demonstrate search to a  
class of middle school kids that I've volunteered to teach for a  
couple of hours.  Which brings me to my next question...


 * anyone have ideas on some basic hands-on ways of teaching search  
engine fundamentals?


One idea I have is to bring some actual "documents", say a poster  
board with a sentence written largely on it, have the students  
physically *tokenize* the document by cutting it up and  
lexicographically building the term dictionary.  Thoughts on taking it  
further welcome!


Thanks all.

Erik



Re: Configuring multiple SOLR apps to play nice with MBeans / JMX

2010-03-24 Thread Constantijn Visinescu
> it would probably be pretty trivial to add if you want to take a stab at a
patch for it.
> -Hoss

*stab*
https://issues.apache.org/jira/browse/SOLR-1843
:)
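Assuming the patch extends the existing <jmx/> element in solrconfig.xml, per-webapp configuration might end up looking something like this (the rootName attribute is the proposed addition, so treat it as illustrative until the patch is committed):

```xml
<!-- solrconfig.xml of webapp A -->
<jmx rootName="solrAppA"/>

<!-- solrconfig.xml of webapp B: a different name avoids the MBean collision -->
<jmx rootName="solrAppB"/>
```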


Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
> I didn't know that you are using dismax. In your query fields list there is
> no title field. Probably the match is coming from title_tokenized, and when you
> request highlighting from title (hl.fl=title) it returns empty snippets. If
> that's the case it is pretty expected, because string-typed fields are not
> analyzed. I mean there are no partial matches on string fields. If your title
> contains "Terrain something" q=Terrain won't match this document.
> What are the title fields of returned documents?
>
>
You are right, the match is coming from title_tokenized, but I also
added the title field to the qf clause, and it's still not working.


> We should re-write this url (just to query on title field) according to
> dismax: /?q=Terrain&debugQuery=on&hl=true&hl.fl=title&qf=title
>
>
/?q=Terrain&debugQuery=on&hl=true&hl.fl=title&qf=title is not giving any
result, perhaps because title is not tokenized. I even tried phrases with
"", but it's still not working. On the other hand, I got highlighting
*working* by adding the following to the above URL:
&qf=title_tokenized.

With this configuration, the title field is highlighted only when there's a
perfect match, i.e., the quoted query equals the title content (f.i.,
q="Terrain sehloul" allows highlighting the entire title containing "Terrain
sehloul", but q=Terrain sehloul doesn't highlight this title. Is
there a solution to this problem?

Thanks a lot.


Re: Configuring multiple SOLR apps to play nice with MBeans / JMX

2010-03-24 Thread Constantijn Visinescu
Don't know about other servlet containers, but I can confirm Resin 3 breaks
if you try to load 2 completely independent webapps into it that both use
Solr with JMX enabled.

I also had a similar issue with Blaze DS (library for flash remoting that
I'm using to power the UI for my webapp), but Blaze DS uses the display-name
from the web.xml to register MBeans under. So making sure each webapp had a
different display name fixed that issue.

However Solr always insists on using "solr" regardless of ... well pretty
much everything ;)

here's a part of the stack trace for informational purposes:

13:23:19,572  WARN JmxMonitoredMap:139 - Failed to register info bean:
org.apache.solr.highlight.HtmlFormatter
javax.management.InstanceNotFoundException:
solr:type=org.apache.solr.highlight.HtmlFormatter,id=org.apache.solr.highlight.HtmlFormatter
at
com.caucho.jmx.MBeanContext.unregisterMBean(MBeanContext.java:285)
at
com.caucho.jmx.AbstractMBeanServer.unregisterMBean(AbstractMBeanServer.java:477)
at
org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:135)
at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)
at
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:508)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:605)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at
com.caucho.server.dispatch.FilterManager.createFilter(FilterManager.java:144)
at
com.caucho.server.dispatch.FilterManager.init(FilterManager.java:91)
at com.caucho.server.webapp.WebApp.start(WebApp.java:1871)

note1:
I get a bunch of these, one for every mbean

note2:
I did notice the javax.management.InstanceNotFoundException, which suggests
the mentioned MBean doesn't exist. However, if I enable JMX for any one of
my applications, everything works. If I enable it for more than one webapp,
it starts giving these errors for every webapp after the first that gets
loaded. That, and my experience fixing this same error under Blaze DS,
suggests to me a very high likelihood of some sort of name collision going
on.

Constantijn


On Tue, Mar 23, 2010 at 11:23 PM, Chris Hostetter
wrote:

>
> : I'm having a problem trying to get multiple solr applications to run in
> the
> : same servlet container because they all try to claim "solr" as a
>
> Hmmm... i think you're in new territory here.   I don't know that anyone
> has ever mentioned doing this before.
>
> Honestly: I thought the hierarchical nature of JMX would mean that
> the Servlet Container would start up a JMX server, and present a separate
> "branch" to each webapp in isolation -- based on what you're saying it
> sounds like different webapps can actually break each other by mucking
> with JMX Beans/values.
>
> : If a configuration option like  exists that'd fix my
> : problem but i can't seem to find it in the documentation.
>
> It doesn't, but it would probably be pretty trivial to add if you want to
> take a stab at a patch for it.
>
>
> -Hoss
>
>


Re: Issue w/ highlighting a String field

2010-03-24 Thread Ahmet Arslan

> I don't have defaultSearchField, instead, I have the
> following qf clause,
> where title_tokenized is a tokenized version of title
> <str name="qf">title_tokenized^3 text_description_tokenized
> phonetic_text^0.5</str>


I didn't know that you are using dismax. In your query fields list there is no 
title field. Probably the match is coming from title_tokenized, and when you 
request highlighting from title (hl.fl=title) it returns empty snippets. If 
that's the case it is pretty expected, because string-typed fields are not 
analyzed. I mean there are no partial matches on string fields. If your title 
contains "Terrain something", q=Terrain won't match this document.
What are the title fields of returned documents?


> /?q=title:Terrain&debugQuery=on&hl=true&hl.fl=title
> 
> 
> > if it is zero, then it means that your match comes
> from your
> > defaultSearchField (not from title field).
> >
> > if it is not zero, highlighting should work. can you
> confirm this?
> >
> >
> this URL gives zero answer.  Again, I don't have
> defaultSearchField, the
> result is coming from the "qf" clause.

We should re-write this url (just to query on title field) according to 
dismax: /?q=Terrain&debugQuery=on&hl=true&hl.fl=title&qf=title





Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-24 Thread stocki

okay, thx 

I installed ant and want to build with ant, but Java cannot compile because
all the Lucene files are missing ... !?

package org.apache.lucene.search does not exist
and more... did I check out the wrong trunk ? 
.../lucee/dev/solr/trunk


Lance Norskog-2 wrote:
> 
> You need 'ant' to do builds.  At the top level, do:
> ant clean
> ant example
> 
> These will build everything and set up the example/ directory. After that,
> run:
> ant test-core
> 
> to run all of the unit tests and make sure that the build works. If
> the autosuggest patch has a test, this will check that the patch went
> in correctly.
> 
> Lance
> 
> On Tue, Mar 23, 2010 at 7:42 AM, stocki  wrote:
>>
>> okay,
>> i do this..
>>
>> but one file was not updated correctly 
>> Index: trunk/src/java/org/apache/solr/util/HighFrequencyDictionary.java
>> (from the suggest.patch)
>>
>> I checked it out from Eclipse, applied the patch, and made a new solr.war ... is that the
>> right way ??
>> I thought that by making a war I didn't need to make a build.
>>
>> how do I make a build ?
>>
>>
>>
>>
>> Alexey-34 wrote:
>>>
 Error loading class 'org.apache.solr.spelling.suggest.Suggester'
>>> Are you sure you applied the patch correctly?
>>> See http://wiki.apache.org/solr/HowToContribute#Working_With_Patches
>>>
>>> Checkout Solr trunk source code (
>>> http://svn.apache.org/repos/asf/lucene/solr/trunk ), apply patch,
>>> verify that everything went smoothly, build solr and use built version
>>> for your tests.
>>>
>>> On Mon, Mar 22, 2010 at 9:42 PM, stocki  wrote:

 I patched a nightly build from solr.
 the patch runs, classes are in the correct folder, but when I replace
 spellcheck
 with this spellcheck like in the comments, solr cannot find the classes
 =(

 <searchComponent name="suggest" class="solr.SpellCheckComponent">
   <lst name="spellchecker">
     <str name="name">suggest</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
     <str name="field">text</str>
     <str name="sourceLocation">american-english</str>
   </lst>
 </searchComponent>


 --> SCHWERWIEGEND: org.apache.solr.common.SolrException: Error loading class
 'org.apache.solr.spelling.suggest.Suggester'


 why is it so ??  I think no one has as much trouble running a patch as
 me =( :D


 Andrzej Bialecki wrote:
>
> On 2010-03-19 13:03, stocki wrote:
>>
>> hello..
>>
>> i try to implement autosuggest component from these link:
>> http://issues.apache.org/jira/browse/SOLR-1316
>>
>> but i have no idea how to do this !?? can anyone get me some tipps ?
>
> Please follow the instructions outlined in the JIRA issue, in the
> comment that shows fragments of XML config files.
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>

 --
 View this message in context:
 http://old.nabble.com/SOLR-1316-How-To-Implement-this-autosuggest-component-tp27950949p27990809.html
 Sent from the Solr - User mailing list archive at Nabble.com.


>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/SOLR-1316-How-To-Implement-this-patch-autoComplete-tp27950949p28001938.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/SOLR-1316-How-To-Implement-this-patch-autoComplete-tp27950949p28013791.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
2010/3/24 Ahmet Arslan 

> > There's a match between the query and
> > the content of field I want to
> > highlight on. Solr is giving me the id of the document
> > matching my query,
> > but it's not displaying the field I want to highlight on.
> >
> > Here's the definition of the field I want to highlight
> > on: <field name="title" type="string" indexed="false"
> > stored="true" />
> >
> > And here's part of my URL:
> > /?q=Terrain&debugQuery=on&hl=true&hl.fl=title
>
> With &q=Terrain you are querying your defaultSearchField and requesting
> highlighting from title field.
>

I don't have defaultSearchField, instead, I have the following qf clause,
where title_tokenized is a tokenized version of title:
<str name="qf">title_tokenized^3 text_description_tokenized phonetic_text^0.5</str>


>
> What is numFound when you hit this url? Highlighting comes?
>

the numFound is not zero, I get results, and also, in the highlighting
section, I get the id of the docs that matched my query


> /?q=title:Terrain&debugQuery=on&hl=true&hl.fl=title


> if it is zero, then it means that your match comes from your
> defaultSearchField (not from title field).
>
> if it is not zero, highlighting should work. can you confirm this?
>
>
this URL gives zero answer.  Again, I don't have defaultSearchField, the
result is coming from the "qf" clause.

What do you think?

Thanks.


Re: Issue w/ highlighting a String field

2010-03-24 Thread Ahmet Arslan
> There's a match between the query and
> the content of field I want to
> highlight on. Solr is giving me the id of the document
> matching my query,
> but it's not displaying the field I want to highlight on.
> 
> Here's the definition of the field I want to highlight
> on: <field name="title" type="string" indexed="false"
> stored="true" />
> 
> And here's part of my URL:
> /?q=Terrain&debugQuery=on&hl=true&hl.fl=title

With &q=Terrain you are querying your defaultSearchField and requesting 
highlighting from title field. 

What is numFound when you hit this url? Highlighting comes?

/?q=title:Terrain&debugQuery=on&hl=true&hl.fl=title

if it is zero, then it means that your match comes from your defaultSearchField 
(not from title field). 

if it is not zero, highlighting should work. can you confirm this?







Re: [ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+

2010-03-24 Thread Grant Ingersoll

On Mar 23, 2010, at 7:29 PM, brad anderson wrote:

> I see, so when you do a commit it adds it to Zoie's ramdirectory. So, could
> you just commit after every document without having a performance impact and
> have real time search?
> 

Not likely, maybe on really, really small indexes.  Zoie also does a 
write-through, AIUI, to a file-based index.


> Thanks,
> Brad
> 
> On 20 March 2010 00:34, Janne Majaranta  wrote:
> 
>> To my understanding it adds an in-memory index which holds the recent
>> commits and which is flushed to the main index based on the config options.
>> Not sure if it helps to get solr near real time. I am evaluating it
>> currently, and I am really not sure if it adds anything because of the cache
>> regeneration of solr on every commit ??
>> 
>> -Janne
>> 
>> Sent from my iPod
>> 
>> brad anderson  kirjoitti 19.3.2010 kello 20.53:
>> 
>> 
>> Indeed, which is why I'm wondering what is Zoie adding if you still need
>>> to
>>> commit to search recent documents. Does anyone know?
>>> 
>>> Thanks,
>>> Brad
>>> 
>>> On 18 March 2010 19:41, Erik Hatcher  wrote:
>>> 
>>> "When I don't do the commit, I cannot search the documents I've indexed."
 -
 that's exactly how Solr without Zoie works, and it's how Lucene itself
 works.  Gotta commit to see the documents indexed.
 
 Erik
 
 
 
 On Mar 18, 2010, at 5:41 PM, brad anderson wrote:
 
 Tried following their tutorial for plugging zoie into solr:
 
> http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Server
> 
> It appears it only allows you to search on documents after you do a
> commit?
> Am I missing something here, or does plugin not doing anything.
> 
> Their tutorial tells you to do a commit when you index the docs:
> 
> curl http://localhost:8983/solr/update/csv?commit=true --data-binary
> @books.csv -H 'Content-type:text/plain; charset=utf-8'
> 
> 
> When I don't do the commit, I cannot search the documents I've indexed.
> 
> Thanks,
> Brad
> 
> On 9 March 2010 23:34, Don Werve  wrote:
> 
> 2010/3/9 Shalin Shekhar Mangar 
> 
>> 
>> I think Don is talking about Zoie - it requires a long uniqueKey.
>> 
>>> 
>>> 
>>> Yep; we're using UUIDs.
>> 
>> 
>> 
 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Cannot fetch urls with "target=_blank"

2010-03-24 Thread Stefano Cherchi
Right. Sorry for the OT.

S

 -- 
"Anyone proposing to run Windows on servers should be prepared to explain 
what they know about servers that Google, Yahoo, and Amazon don't."
Paul Graham


"A mathematician is a device for turning coffee into theorems."
Paul Erdos (who obviously never met a sysadmin)



- Original Message -
> From: Otis Gospodnetic 
> To: solr-user@lucene.apache.org
> Sent: Tue, 23 March 2010, 21:40:20
> Subject: Re: Cannot fetch urls with "target=_blank"
> 
> hi Stefano,
> 
> nutch-user@ is a much better place to ask this question really.
> You'll also want to include more info about "Nutch fails".
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
> - Original Message 
> From: Stefano Cherchi <stefanocher...@yahoo.it>
> To: solr-user@lucene.apache.org
> Sent: Tue, March 23, 2010 1:40:46 PM
> Subject: Cannot fetch urls with "target=_blank"
> 
> As in subject: when I try to fetch a page whose link should open in a new
> window, Nutch fails.
> 
> I know it is not a Solr issue, actually, but I beg for a hint.
> 
> S
> 
> -- 
> "Anyone proposing to run Windows on servers should be prepared to explain
> what they know about servers that Google, Yahoo, and Amazon don't."
> Paul Graham
> 
> 
> "A mathematician is a device for turning coffee into theorems."
> Paul Erdos (who obviously never met a sysadmin)






Re: Issue w/ highlighting a String field

2010-03-24 Thread Saïd Radhouani
There's a match between the query and the content of field I want to
highlight on. Solr is giving me the id of the document matching my query,
but it's not displaying the field I want to highlight on.

Here's the definition of the field I want to highlight on:
<field name="title" type="string" indexed="false" stored="true" />
And here's part of my URL: /?q=Terrain&debugQuery=on&hl=true&hl.fl=title

If I change the type to "text" instead of "string", the highlighting works
well!

Thanks for your help.
-S.



2010/3/23 Ahmet Arslan 

> > Thanks Erik. Actually, I restarted
> > and reindexed a number of times, but still
> > not working.
>
Highlighting on string-typed fields works perfectly. See the output of:
>
>
> http://localhost:8983/solr/select/?q=id%3ASOLR1000&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=id
>
> But there must be a match/hit to get highlighting. What is your query and
> candidate field content that you want to highlight?
>
>
>
>