Re: Facets with an IDF concept

2009-06-23 Thread Asif Rahman
Hi again,

I guess nobody has used facets in the way I described below before.  Do any
of the experts have any ideas as to how to do this efficiently and
correctly?  Any thoughts would be greatly appreciated.

Thanks,

Asif

On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman  wrote:

> Hi all,
>
> We have an index of news articles that are tagged with news topics.
> Currently, we use solr facets to see which topics are popular for a given
> query or time period.  I'd like to apply the concept of IDF to the facet
> counts so as to penalize the topics that occur broadly through our index.
> I've begun to write a custom facet component that applies the IDF to the facet
> counts, but I also wanted to check if anyone has experience using facets in
> this way.
>
> Thanks,
>
> Asif
>



-- 
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com


Re: Facets with an IDF concept

2009-06-23 Thread Kent Fitch
Hi Asif,

I was holding back because we have a similar problem, but we're not
sure how best to approach it, or even whether approaching it at all is
the right thing to do.

Background:
- large index (~35m documents)
- about 120k of these include full text book contents plus metadata,
the rest are just metadata
- we plan to increase the number of full text books to around 1m, so the
number of records will greatly increase

We've found that because of the sheer volume of content in full text,
we get lots of results in full text of very low relevance. The Lucene
relevance ranking works wonderfully to "hide" these way down the list,
and when these are the only results at all, the user may be delighted
to find obscure hits.

But when you search for, say : soldier of fortune : one of the 55k+
results is Huck Finn, with 4 "soldier(s)" and 6 "fortunes", but it
probably isn't relevant.  The searcher will find it in the result
sets, but should the author, subject, dates, formats etc (our facets)
of Huck Finn be contributing to the facets shown to the user as
much as, say, the top 500 results?  Maybe, but perhaps they are
"diluting" the value of facets contributed by the more relevant
results.

So, we are considering restricting the contents of the result bit set
used for faceting to exclude results with a very very low score (with
our own QueryComponent).  But there are problems:

- what's a low score?  How will a low score threshold vary across
queries? (Or should we use a rank cutoff instead, which is much more
expensive to compute, or some combo that works with results that only
have very low relevance results?)

- should we do this for all facets, or just some (where the less
relevant results seem particularly annoying, as they can "mask" facets
from the most relevant results - the authors, years and subjects we
have full text for are not representative of the whole corpus)

- if a searcher pages through to the 1000th result page, down to these
less relevant results, should we somehow include these results in the
facets we show?
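In outline, the relative-threshold variant of the first question might look like this (the Hit type and the 25% fraction here are invented for illustration, not our actual QueryComponent):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: keep only hits whose score is at least some fraction of the top
// score before counting facets. A real version would plug into a custom
// QueryComponent and build a DocSet; this just shows the cutoff logic.
public class ScoreCutoff {
    public static class Hit {
        public final int docId;
        public final float score;
        public Hit(int docId, float score) { this.docId = docId; this.score = score; }
    }

    public static List<Integer> docsForFaceting(List<Hit> hits, float fraction) {
        float max = 0f;
        for (Hit h : hits) max = Math.max(max, h.score);
        float threshold = max * fraction;
        List<Integer> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (h.score >= threshold) kept.add(h.docId);
        }
        return kept;
    }

    public static void main(String[] args) {
        // doc 3 is the Huck Finn-style low-relevance hit; it is excluded.
        List<Hit> hits = List.of(new Hit(1, 9.0f), new Hit(2, 8.5f), new Hit(3, 0.2f));
        System.out.println(docsForFaceting(hits, 0.25f)); // [1, 2]
    }
}
```

It still leaves the hard part open: choosing the fraction so it behaves sensibly for queries where every hit is low-relevance.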

sorry, only more questions!

Regards,

Kent Fitch

On Tue, Jun 23, 2009 at 5:58 PM, Asif Rahman wrote:
> Hi again,
>
> I guess nobody has used facets in the way I described below before.  Do any
> of the experts have any ideas as to how to do this efficiently and
> correctly?  Any thoughts would be greatly appreciated.
>
> Thanks,
>
> Asif
>
> On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman  wrote:
>
>> Hi all,
>>
>> We have an index of news articles that are tagged with news topics.
>> Currently, we use solr facets to see which topics are popular for a given
>> query or time period.  I'd like to apply the concept of IDF to the facet
>> counts so as to penalize the topics that occur broadly through our index.
>> I've begun to write a custom facet component that applies the IDF to the facet
>> counts, but I also wanted to check if anyone has experience using facets in
>> this way.
>>
>> Thanks,
>>
>> Asif
>>
>
>
>
> --
> Asif Rahman
> Lead Engineer - NewsCred
> a...@newscred.com
> http://platform.newscred.com
>


No wildcards with solr.ASCIIFoldingFilterFactory?

2009-06-23 Thread vladimirneu

Hi all,

could somebody help me understand why I cannot search with a wildcard if I
use the solr.ASCIIFoldingFilterFactory?

So I get results if I am searching for "münchen", "munchen" or "munchen*",
but I get no results if I do the search for "münchen*". The original records
contain the terms "München" and "Münchener".

The solr.ASCIIFoldingFilterFactory is configured on both sides index and
query. We are using the 1.4-dev version from trunk.
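The schema excerpt below was stripped by the list archiver; the fieldType is essentially of this shape (the type name and tokenizer here are stand-ins, not our exact config — the point is ASCIIFoldingFilterFactory on both analyzers):

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this in place, "munchen" and "münchen" normalize to the same indexed term, which is why the non-wildcard searches work.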


Thank you very much!

Regards,

Vladimir
-- 
View this message in context: 
http://www.nabble.com/No-wildcards-with-solr.ASCIIFoldingFilterFactory--tp24162104p24162104.html
Sent from the Solr - User mailing list archive at Nabble.com.



Search returning 0 results for "U2"

2009-06-23 Thread John G. Moylan
We are moving from a Lucene indexer and readers to a hybrid solution where
we still use the Lucene Indexer but use Solr for querying the index.

I am indexing our content using Lucene 2.3 and have a field called
"contents" which is tokenized and stored. When I search the contents
field for "U2" using Luke the correct document turns up. However,
searching for U2 in Solr returns nothing.

Any ideas?

Lucene field is as follows:

doc.add(new Field("contents", body.replaceAll("  ", ""),
Field.Store.YES, Field.Index.TOKENIZED));

Lucene is using the following analyzer:

result = new ISOLatin1AccentFilter(result);
result = new StandardFilter(result);
result = new LowerCaseFilter(result);
result = new StopFilter(result,StandardAnalyzer.STOP_WORDS);//,
stopTable);
result = new PorterStemFilter(result);


In Solr I have the field mapped as a text field stored and indexed. The
text field uses
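(The schema excerpt here was stripped by the list archiver; an analyzer mirroring the Lucene chain above would look roughly like the following — the exact factories are an assumption, not the original config:)

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Any mismatch between the Solr query analyzer and the Lucene index-time chain (for instance, a WordDelimiterFilterFactory in the example "text" type splitting U2 into separate tokens) can make a term findable in Luke but not through Solr.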


Regards,
John




***
The information in this e-mail is confidential and may be legally privileged.
It is intended solely for the addressee. Access to this e-mail by anyone else
is unauthorised. If you are not the intended recipient, any disclosure,
copying, distribution, or any action taken or omitted to be taken in reliance
on it, is prohibited and may be unlawful.
Please note that emails to, from and within RTÉ may be subject to the Freedom
of Information Act 1997 and may be liable to disclosure.



Bug with QueryParser.

2009-06-23 Thread saurabhs_iitk

Hi,
I have tried both QueryParser and MultiFieldQueryParser. Suppose you want to
search for Douglas Adams with the default operator set to AND, say on the
field author.
Instead of generating a query like +(author:douglas author:dougla)
+(author:adams author:adam), it generates
+author:douglas +author:adams +author:dougla +author:adam.

Can anyone tell me how to fix this?
TIA 
Saurabh
-- 
View this message in context: 
http://www.nabble.com/Bug-with-QueryParser.-tp24163501p24163501.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExtractRequestHandler - not properly indexing office docs?

2009-06-23 Thread Grant Ingersoll
Can you change the text field to be stored and then point the  
LukeRequestHandler at that field (/admin/luke) and report back?  Also,  
can you post your full schema and config?


Finally, can you get the example to work?


On Jun 23, 2009, at 1:41 AM, cloax wrote:



I've tried 'text' ( taken from the example config ) and then tried  
creating a

new field called doc_content and using that. Neither has worked.


Grant Ingersoll-6 wrote:


What's your default search field?

On Jun 22, 2009, at 12:29 PM, cloax wrote:



Yep, I've tried both of those and still no joy. Here's both my curl
statement
and the resulting Solr log output.

curl http://localhost:8983/solr/update/extract?ext.def.fl=text\&ext.literal.id=1\&ext.map.div=text\&ext.capture=div \
  -F "myfi...@dj_character.doc"

Curl's output:

<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">317</int></lst></response>


Solr log:
Jun 22, 2009 12:21:42 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract
params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1}
status=0 QTime=544
Jun 22, 2009 12:22:26 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[1]} 0 317
Jun 22, 2009 12:22:26 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract
params={ext.map.div=text&ext.def.fl=text&ext.capture=div&ext.literal.id=1}
status=0 QTime=317
Jun 22, 2009 12:22:37 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select
params={wt=standard&rows=10&start=0&explainOther=&hl.fl=&indent=on&q=kondel&fl=*,score&qt=standard&version=2.2}
hits=0 status=0 QTime=2

The submitted document has "kondel" in it numerous times, so Solr
should
have a hit. Yet it returns nothing. I also made sure I committed,
but that
didn't seem to help either.


Grant Ingersoll-6 wrote:


Do you have a default field declared?  &ext.default.fl=
Either that, or you need to explicitly capture the fields you are
interested in using &ext.capture=

You could add this to your curl statement to try out.

-Grant




--
View this message in context:
http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24150763.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search





--
View this message in context: 
http://www.nabble.com/ExtractRequestHandler---not-properly-indexing-office-docs--tp24120125p24159267.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Facets with an IDF concept

2009-06-23 Thread Grant Ingersoll


On Jun 23, 2009, at 3:58 AM, Asif Rahman wrote:


Hi again,

I guess nobody has used facets in the way I described below before.   
Do any

of the experts have any ideas as to how to do this efficiently and
correctly?  Any thoughts would be greatly appreciated.

Thanks,

Asif

On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman   
wrote:



Hi all,

We have an index of news articles that are tagged with news topics.
Currently, we use solr facets to see which topics are popular for a  
given
query or time period.  I'd like to apply the concept of IDF to the  
facet
counts so as to penalize the topics that occur broadly through our  
index.
I've begun to write a custom facet component that applies the IDF to  
the facet
counts, but I also wanted to check if anyone has experience using  
facets in

this way.



I'm not sure I'm following.  Would you be faceting on one field, but  
using the DF from some other field?  Faceting is already a count of  
all the documents that contain the term on a given field for that  
search.  If I'm understanding, you would still do the typical  
faceting, but then rerank by the global DF values, right?


Backing up, what is the problem you are seeing that you are trying to  
solve?


I think you could do this, but you'd have to hook it in yourself.  By  
penalize, do you mean remove, or just have them in the sort?   
Generally speaking, looking up the DF value can be expensive,  
especially if you do a lot of skipping around.  I don't know how  
pluggable the sort capabilities are for faceting, but that might be  
the place to start if you are just looking at the sorting options.




--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Facets with an IDF concept

2009-06-23 Thread Asif Rahman
Hi Kent,

Your problem is a close cousin of the problem that we're tackling.  We have
experienced the same problem as you when calculating facets on MoreLikeThis
queries, since those queries tend to match a lot of documents.  We used one
of the solutions that you mentioned, rank cutoff, to solve it.  We first run
the MoreLikeThis query, then use the top N documents' unique ids as a filter
query for a second query.  The performance is still acceptable, however our
index size is smaller than yours by an order of magnitude.
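In outline, the second pass builds its filter from the first pass's top-N ids, something like this (the "id" field name and N are illustrative, not our actual code):

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: run the MoreLikeThis query once, take the top-N unique ids, and
// turn them into a filter query (fq) for the second, faceted request.
public class RankCutoffFilter {
    public static String filterQueryForTopN(List<String> rankedIds, int n) {
        return rankedIds.stream()
                .limit(n)
                .map(id -> "id:" + id)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("a17", "b42", "c99", "d03");
        System.out.println(filterQueryForTopN(ranked, 3)); // (id:a17 OR id:b42 OR id:c99)
    }
}
```

The cost of the second query grows with N, which is why this stays tolerable on a smaller index.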

Regards,

Asif

On Tue, Jun 23, 2009 at 10:34 AM, Kent Fitch  wrote:

> Hi Asif,
>
> I was holding back because we have a similar problem, but we're not
> sure how best to approach it, or even whether approaching it at all is
> the right thing to do.
>
> Background:
> - large index (~35m documents)
> - about 120k of these include full text book contents plus metadata,
> the rest are just metadata
> - we plan to increase the number of full text books to around 1m, so the
> number of records will greatly increase
>
> We've found that because of the sheer volume of content in full text,
> we get lots of results in full text of very low relevance. The Lucene
> relevance ranking works wonderfully to "hide" these way down the list,
> and when these are the only results at all, the user may be delighted
> to find obscure hits.
>
> But when you search for, say : soldier of fortune : one of the 55k+
> results is Huck Finn, with 4 "soldier(s)" and 6 "fortunes", but it
> probably isn't relevant.  The searcher will find it in the result
> sets, but should the author, subject, dates, formats etc (our facets)
> of Huck Finn be contributing to the facets shown to the user as
> much as, say, the top 500 results?  Maybe, but perhaps they are
> "diluting" the value of facets contributed by the more relevant
> results.
>
> So, we are considering restricting the contents of the result bit set
> used for faceting to exclude results with a very very low score (with
> our own QueryComponent).  But there are problems:
>
> - what's a low score?  How will a low score threshold vary across
> queries? (Or should we use a rank cutoff instead, which is much more
> expensive to compute, or some combo that works with results that only
> have very low relevance results?)
>
> - should we do this for all facets, or just some (where the less
> relevant results seem particularly annoying, as they can "mask" facets
> from the most relevant results - the authors, years and subjects we
> have full text for are not representative of the whole corpus)
>
> - if a searcher pages through to the 1000th result page, down to these
> less relevant results, should we somehow include these results in the
> facets we show?
>
> sorry, only more questions!
>
> Regards,
>
> Kent Fitch
>
> On Tue, Jun 23, 2009 at 5:58 PM, Asif Rahman wrote:
> > Hi again,
> >
> > I guess nobody has used facets in the way I described below before.  Do
> any
> > of the experts have any ideas as to how to do this efficiently and
> > correctly?  Any thoughts would be greatly appreciated.
> >
> > Thanks,
> >
> > Asif
> >
> > On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman  wrote:
> >
> >> Hi all,
> >>
> >> We have an index of news articles that are tagged with news topics.
> >> Currently, we use solr facets to see which topics are popular for a
> given
> >> query or time period.  I'd like to apply the concept of IDF to the facet
> >> counts so as to penalize the topics that occur broadly through our
> index.
> >> I've begun to write a custom facet component that applies the IDF to the
> facet
> >> counts, but I also wanted to check if anyone has experience using facets
> in
> >> this way.
> >>
> >> Thanks,
> >>
> >> Asif
> >>
> >
> >
> >
> > --
> > Asif Rahman
> > Lead Engineer - NewsCred
> > a...@newscred.com
> > http://platform.newscred.com
> >
>



-- 
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com


Re: SolrCore, reload, synonyms not reloaded

2009-06-23 Thread ranjitr

Hello,

I was having a similar problem during indexing with my synonyms file. But I
was able to resolve it using the steps you had outlined. Thanks!

I am wondering if there is a way to reload the index with a new synonym file
without Solr multi-core support? I would really appreciate it if you could
post the steps to accomplish this.





hossman wrote:
> 
> 
> : I'm using Solr 1.3 and I've never been able to get the SolrCore
> (formerly
> : MultiCore) reload feature to pick up changes I made to my synonyms file. 
> At
> : index time I expand synonyms.  If I change my synonyms.txt file then do
> a
> : MultiCore RELOAD and then reindex my data and then do a query that
> should
> : work now that I added a synonym, it doesn't work.  If I go to the
> analysis
> : page and try putting in the text I see that it did pick up the changes. 
> I'm
> : forced to bring down the the webapp for the changes to truly be
> reloaded. 
> : Has anyone else seen this?  
> 
> David: I don't really use the Multi Core support, but your problem 
> description intrigued me so i tried it out, and i can *not* reproduce the 
> problem you are having.
> 
>   Steps i took
> 
> 1) applied the patch listed at the end of this email to the Solr trunk.  
> note that it adds a "text" field to the multicore "core1" example configs.  
> this field uses SynonymFilter at index time.  I also added a synonyms file 
> with "chris, hostetter" as the only entry.
> 
> 2) cd example; java -Dsolr.solr.home=multicore -jar start.jar
> 
> 3) java -Ddata=args -Durl=http://localhost:8983/solr/core1/update -jar
> post.jar '<add><doc><field name="id">1</field><field name="text">chris and
> david</field></doc></add>'
> 
> 4) checked luke handler, confirmed that chris, hostetter, and, & david 
> were indexed terms.
> 
> 5) added "david, smiley" to my synonyms file
> 
> 6) http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1
> 
> 7) repeated step #3
> 
> 8) confirmed with luke that "smiley" was now an indexed term.  also 
> confirmed that query for text:smiley found my doc
> 
> 
> Here's the patch...
> 
> 
> 
> Index: example/multicore/core1/conf/schema.xml
> ===
> --- example/multicore/core1/conf/schema.xml   (revision 693303)
> +++ example/multicore/core1/conf/schema.xml   (working copy)
> @@ -19,6 +19,18 @@
>  
>
>     <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> +
> +    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
> +      <analyzer type="index">
> +        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> +        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
> +      </analyzer>
> +      <analyzer type="query">
> +        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> +      </analyzer>
> +    </fieldtype>
> +
>
>  
>  
> @@ -27,6 +39,7 @@
>    <field name="id" type="string" indexed="true" stored="true" multiValued="false" /> 
>    <field name="name" type="string" indexed="true" stored="true" multiValued="false" /> 
>    <field name="core1" type="string" indexed="true" stored="true" multiValued="false" /> 
> +   <field name="text" type="text" indexed="true" stored="true" multiValued="false" /> 
>   
>  
>   
> Index: example/multicore/core1/conf/index_synonyms.txt
> ===
> --- example/multicore/core1/conf/index_synonyms.txt   (revision 0)
> +++ example/multicore/core1/conf/index_synonyms.txt   (revision 0)
> @@ -0,0 +1,2 @@
> +chris, hostetter
> +
> 
> Property changes on: example/multicore/core1/conf/index_synonyms.txt
> ___
> Name: svn:keywords
>+ Date Author Id Revision HeadURL
> Name: svn:eol-style
>+ native
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/SolrCore%2C-reload%2C-synonyms-not-reloaded-tp19339767p24164306.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Numerical range faceting

2009-06-23 Thread gwk

gwk wrote:

Hi,

I'm currently using facet.query to do my numerical range faceting. I 
basically use a fixed price range of €0 to €10,000 in steps of €500, 
which means 20 facet.queries plus an extra facet.query for anything 
above €10,000. I use the inclusive/exclusive query as per my question 
two days ago so the facets add up to the total number of products. 
This is done so that the javascript on my search page can accurately 
show the amount of products returned for a specified range before 
submitting it to the server by adding up the facet counts for the 
selected range.
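For reference, generating those stepped parameters looks roughly like this (the price field name is ours; the [lo TO hi} exclusive-upper syntax stands in for whatever inclusive/exclusive workaround you use — it is illustrative, not a claim about what this Solr version parses):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build the 20 stepped facet.query params plus one open-ended
// query, with an exclusive upper bound so adjacent buckets don't
// double-count and the bucket totals add up to the full product count.
public class PriceFacetQueries {
    public static List<String> buildQueries(int start, int end, int step) {
        List<String> queries = new ArrayList<>();
        for (int lo = start; lo < end; lo += step) {
            queries.add("price:[" + lo + " TO " + (lo + step) + "}");
        }
        queries.add("price:[" + end + " TO *]"); // anything above the last bucket
        return queries;
    }

    public static void main(String[] args) {
        List<String> qs = buildQueries(0, 10000, 500);
        System.out.println(qs.size());  // 21 facet.query parameters
        System.out.println(qs.get(0));  // price:[0 TO 500}
    }
}
```

Each parameter is sent as a separate facet.query, which is exactly why the request grows so quickly when the step size shrinks.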


I'm a bit concerned about the amount and size of my request to the 
server. Especially because there are other numerical values which 
might be interesting to facet on and I've noticed the server won't 
response correctly if I add (many) more facet.queries by decreasing 
the step size. I was really hoping for faceting options for numerical 
ranges similar to the date faceting options. The functionality would 
be practically identical as far as I can tell (which isn't very far as 
I know very little about the internals of Solr) so I was wondering if 
such options are planned or if I'm overlooking something.


Regards,

gwk

Hello,

Well, since I got no response, I flexed my severely atrophied 
Java muscles (the last time I used the language, Swing was new) and dove 
straight into the Solr code. Well, not really: mostly I did some 
copy-pasting, and with some assistance from the API reference I was able 
to add numerical faceting on sortable numerical fields (it seems to work 
for both integers and floating point numbers) with a syntax similar to 
the date faceting.  I also added an extra parameter for whether the 
ranges should be inclusive or exclusive (on either end). And it seems to 
work, although the quality of my code is not of the same grade as the 
rest of the Solr code (I was amazed at how easy it was to add this 
feature).
I was wondering if someone is interested in a patch file and if so, 
where should I post it?


Regards,

gwk



As an example, the following query:

http://localhost:8080/select/?q=*%3A*&echoParams=none&rows=0&indent=on&facet=true&
   facet.number=price&f.price.facet.number.start=0&
   f.price.facet.number.end=100&f.price.facet.number.gap=1&
   f.price.facet.number.other=all&f.price.facet.number.exclusive=end

yields a facet_numbers section in the response with one count per
unit-wide price bucket, plus the start (0), end (100.0), gap (1.0)
and the before/after/between counts. (The response XML itself did not
survive the list archiver.)

Re: Facets with an IDF concept

2009-06-23 Thread Asif Rahman
Hi Grant,

I'll give a real life example of the problem that we are trying to solve.

We index a large number of current news articles on a continuing basis.  We
tag these articles with news topics (e.g. Barack Obama, Iran, etc.).  We
then use these tags to facet our queries.  For example, we might issue a
query for all articles in the last 24 hours.  The facets would then tell us
which news topics have been written about the most in that period.  The
problem is that "Barack Obama", for example, is always written about at high
frequency, as opposed to "Iran", which is currently very hot in the news but
has not always been.  In this case, we'd like to see "Iran"
show up higher than "Barack Obama" in the facet results.

To me, this seems identical to the tf-idf scoring expression that is used in
normal search.  The facet count is analogous to the tf and I can access the
facet term idf's through the Similarity API.
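In outline, the reranking I have in mind looks like this (the 1 + ln(N/(df+1)) formula mirrors Lucene's classic idf; the names and numbers are illustrative, not our data):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: rescale each facet count (the "tf") by an idf-style weight
// computed from the tag's global document frequency. Counts would come
// from the normal facet component, df from the index.
public class IdfFacets {
    public static double idf(int numDocs, int df) {
        return 1.0 + Math.log((double) numDocs / (df + 1));
    }

    public static Map<String, Double> rerank(Map<String, Integer> facetCounts,
                                             Map<String, Integer> globalDf,
                                             int numDocs) {
        Map<String, Double> scored = new LinkedHashMap<>();
        facetCounts.forEach((tag, count) ->
                scored.put(tag, count * idf(numDocs, globalDf.getOrDefault(tag, 0))));
        return scored;
    }

    public static void main(String[] args) {
        // "obama" is written about constantly (high df); "iran" spikes today.
        Map<String, Integer> counts = Map.of("obama", 120, "iran", 100);
        Map<String, Integer> df = Map.of("obama", 50000, "iran", 5000);
        Map<String, Double> scored = rerank(counts, df, 100000);
        // iran now outscores obama despite the lower raw count.
        System.out.println(scored.get("iran") > scored.get("obama")); // true
    }
}
```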

Is my reasoning sound?  Can you provide any guidance as to the best way to
implement this?

Thanks for your help,

Asif


On Tue, Jun 23, 2009 at 1:19 PM, Grant Ingersoll wrote:

>
> On Jun 23, 2009, at 3:58 AM, Asif Rahman wrote:
>
>  Hi again,
>>
>> I guess nobody has used facets in the way I described below before.  Do
>> any
>> of the experts have any ideas as to how to do this efficiently and
>> correctly?  Any thoughts would be greatly appreciated.
>>
>> Thanks,
>>
>> Asif
>>
>> On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman  wrote:
>>
>>  Hi all,
>>>
>>> We have an index of news articles that are tagged with news topics.
>>> Currently, we use solr facets to see which topics are popular for a given
>>> query or time period.  I'd like to apply the concept of IDF to the facet
>>> counts so as to penalize the topics that occur broadly through our index.
>>> I've begun to write a custom facet component that applies the IDF to the
>>> facet
>>> counts, but I also wanted to check if anyone has experience using facets
>>> in
>>> this way.
>>>
>>
>
> I'm not sure I'm following.  Would you be faceting on one field, but using
> the DF from some other field?  Faceting is already a count of all the
> documents that contain the term on a given field for that search.  If I'm
> understanding, you would still do the typical faceting, but then rerank by
> the global DF values, right?
>
> Backing up, what is the problem you are seeing that you are trying to
> solve?
>
> I think you could do this, but you'd have to hook it in yourself.  By
> penalize, do you mean remove, or just have them in the sort?  Generally
> speaking, looking up the DF value can be expensive, especially if you do a
> lot of skipping around.  I don't know how pluggable the sort capabilities
> are for faceting, but that might be the place to start if you are just
> looking at the sorting options.
>
>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


-- 
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com


Re: Numerical range faceting

2009-06-23 Thread Shalin Shekhar Mangar
On Tue, Jun 23, 2009 at 4:55 PM, gwk  wrote:

>
> I was wondering if someone is interested in a patch file and if so, where
> should I post it?
>

This seems useful. Please open an issue and submit a patch. I'm sure there
will be interest.

http://wiki.apache.org/solr/HowToContribute

-- 
Regards,
Shalin Shekhar Mangar.


Re: Facets with an IDF concept

2009-06-23 Thread Ian Holsman

Asif Rahman wrote:

Hi Grant,

I'll give a real life example of the problem that we are trying to solve.

We index a large number of current news articles on a continuing basis.  We
tag these articles with news topics (e.g. Barack Obama, Iran, etc.).  We
then use these tags to facet our queries.  For example, we might issue a
query for all articles in the last 24 hours.  The facets would then tell us
which news topics have been written about the most in that period.  The
problem is that "Barack Obama", for example, is always written about at high
frequency, as opposed to "Iran", which is currently very hot in the news but
has not always been.  In this case, we'd like to see "Iran"
show up higher than "Barack Obama" in the facet results.

  


You're not looking for an IDF-based function.
You need to figure out what a 'normal' amount of news flow for a given 
topic is and then determine when an abnormal amount is happening.

Note that an abnormal amount can be positive or negative.
We use a similar method to this on http://love.com, so we know, for 
example, that something is going on with Ed McMahon as I type.
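In outline (the window and threshold values below are arbitrary illustrations, not what we run):

```java
// Sketch: flag a topic as "hot" when today's mention count deviates from
// its historical mean by more than k standard deviations. This is the
// baseline-then-detect-anomaly idea, independent of Solr.
public class TopicAnomaly {
    public static double zScore(double[] history, double today) {
        double mean = 0;
        for (double v : history) mean += v;
        mean /= history.length;
        double var = 0;
        for (double v : history) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / history.length);
        return sd == 0 ? 0 : (today - mean) / sd;
    }

    public static void main(String[] args) {
        double[] obama = {118, 122, 119, 121, 120}; // steady high flow
        double[] iran  = {9, 11, 10, 12, 10};       // steady low flow
        System.out.println(zScore(obama, 120) > 2); // false: normal for Obama
        System.out.println(zScore(iran, 100) > 2);  // true: Iran is spiking
    }
}
```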


I wouldn't be looking at using Solr to do this kind of thing, btw. Try 
something like Esper. I think it might hold some promise for this kind of 
thing (Esper is an open source stream database).


Regards


To me, this seems identical to the tf-idf scoring expression that is used in
normal search.  The facet count is analogous to the tf and I can access the
facet term idf's through the Similarity API.

Is my reasoning sound?  Can you provide any guidance as to the best way to
implement this?

Thanks for your help,

Asif


On Tue, Jun 23, 2009 at 1:19 PM, Grant Ingersoll wrote:

  

On Jun 23, 2009, at 3:58 AM, Asif Rahman wrote:

 Hi again,


I guess nobody has used facets in the way I described below before.  Do
any
of the experts have any ideas as to how to do this efficiently and
correctly?  Any thoughts would be greatly appreciated.

Thanks,

Asif

On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman  wrote:

 Hi all,
  

We have an index of news articles that are tagged with news topics.
Currently, we use solr facets to see which topics are popular for a given
query or time period.  I'd like to apply the concept of IDF to the facet
counts so as to penalize the topics that occur broadly through our index.
I've begun to write a custom facet component that applies the IDF to the
facet
counts, but I also wanted to check if anyone has experience using facets
in
this way.



I'm not sure I'm following.  Would you be faceting on one field, but using
the DF from some other field?  Faceting is already a count of all the
documents that contain the term on a given field for that search.  If I'm
understanding, you would still do the typical faceting, but then rerank by
the global DF values, right?

Backing up, what is the problem you are seeing that you are trying to
solve?

I think you could do this, but you'd have to hook it in yourself.  By
penalize, do you mean remove, or just have them in the sort?  Generally
speaking, looking up the DF value can be expensive, especially if you do a
lot of skipping around.  I don't know how pluggable the sort capabilities
are for faceting, but that might be the place to start if you are just
looking at the sorting options.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
Solr/Lucene:
http://www.lucidimagination.com/search






  




Re: Solr Authentication Problem

2009-06-23 Thread Allahbaksh Asadullah
HI All,
I am using an SVN build of Solr 1.4 and I am not able to find this
method. Has something changed in the Solr 1.4 Java client API?
Thanks in advance
Regards,
Allahbaksh


2009/6/23 Noble Paul നോബിള്‍ नोब्ळ् 

> I have raised an issue https://issues.apache.org/jira/browse/SOLR-1238
>
> there is patch attached to the issue.
>
>
> On Mon, Jun 22, 2009 at 1:40 PM, Allahbaksh Asadullah
>  wrote:
> >
> > Hi All,
> > I am getting an error when using authentication in Solr. I
> > followed the Wiki. The error does not appear when I am searching. Below is the
> > code snippet and the error.
> >
> > Please note I am using Solr 1.4 Development build from SVN.
> >
> >
> >        HttpClient client = new HttpClient();
> >
> >        AuthScope scope = new AuthScope(AuthScope.ANY_HOST,
> >                AuthScope.ANY_PORT, null, null);
> >
> >        client.getState().setCredentials(
> >                scope,
> >                new UsernamePasswordCredentials("guest", "guest"));
> >
> >        SolrServer server = new CommonsHttpSolrServer(
> >                "http://localhost:8983/solr", client);
> >
> >        SolrInputDocument doc1 = new SolrInputDocument();
> >
> >        // Add fields to the document
> >        doc1.addField("employeeid", "1237");
> >        doc1.addField("employeename", "Ann");
> >        doc1.addField("employeeunit", "etc");
> >        doc1.addField("employeedoj", "1995-11-31T23:59:59Z");
> >
> >        server.add(doc1);
> >
> >
> >
> >
> >
> > Exception in thread "main"
> > org.apache.solr.client.solrj.SolrServerException:
> > org.apache.commons.httpclient.ProtocolException: Unbuffered entity
> > enclosing request can not be repeated.
> >
> >at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:468)
> >
> >at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
> >
> >at
> org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
> >
> >at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
> >
> >at
> test.SolrAuthenticationTest.(SolrAuthenticationTest.java:49)
> >
> >at
> test.SolrAuthenticationTest.main(SolrAuthenticationTest.java:113)
> >
> > Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered
> > entity enclosing request can not be repeated.
> >
> >at
> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
> >
> >at
> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
> >
> >at
> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
> >
> >at
> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
> >
> >at
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
> >
> >at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
> >
> >at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
> >
> >at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:415)
> >
> >... 5 more.
> >
> > Thanks and regards,
> > Allahbaksh
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
Allahbaksh Mohammedali Asadullah,
Software Engineering & Technology Labs,
Infosys Technolgies Limited, Electronic City,
Hosur Road, Bangalore 560 100, India.
(Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
Fax: 91-80-28520362 | Mobile: 91-9845505322.


RE: Data Import Handler

2009-06-23 Thread Mukerjee, Neiloy (Neil)
With the data-config file filled out, I am receiving errors telling me that the 
indexing of my database has failed. I think I have filled out everything I need 
to in the data-config file and that I have everything in the right directory. 
My details are described below, including locations of files, contents of the 
data-config file, and the errors I am seeing. Has anyone else seen problems 
like this? 

As of right now, I have data-config.xml in 
/usr/local/tomcat6.0.20/webapps/solr/, and I have the database bell_labs.sql in 
the solr/home directory /usr/local/tomcat6.0.20/solr/. 

Data-config.xml has the following contents:















When I go to http://localhost:8080/solr/dataimport, I see the following 
displayed to my browser:
This XML file does not appear to have any style information associated with it. 
The document tree is shown below.
  
−

−

0
0

−

−

−

/usr/local/tomcat6.0.20/webapps/solr/data-config.xml



idle

−

0:0:35.614
0
0
0
0
2009-06-23 09:24:15
Indexing failed. Rolled back all changes.
2009-06-23 09:24:15

−

This response format is experimental.  It is likely to change in the future.



When I go to http://localhost:8080/solr/admin/dataimport.jsp, I see two frames, 
the left frame having the DataImportHandler Development Console, and the right 
frame displaying the following:
This XML file does not appear to have any style information associated with it. 
The document tree is shown below.
  
−

−

0
24

−

−

−

/usr/local/tomcat6.0.20/webapps/solr/data-config.xml



full-import
debug

idle
Configuration Re-loaded sucessfully
−

0:0:0.19
0
0
0
0
2009-06-23 09:26:15
Indexing failed. Rolled back all changes.
2009-06-23 09:26:15

−

This response format is experimental.  It is likely to change in the future.




-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Monday, June 22, 2009 1:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Data Import Handler

On Mon, Jun 22, 2009 at 10:51 PM, Mukerjee, Neiloy (Neil) <
neil.muker...@alcatel-lucent.com> wrote:

>
> I suspect that the fact that the data-config file is blank is causing these
> issues, but per the documentation on the website, there is no indication of
> what, if anything, should go there - is there an alternate resource that
> anyone knows of which I could use?
>
>
The data-config.xml is the file which specifies how and from where Solr can
pull data.

For example look at the full-import from a database data-config.xml at
http://wiki.apache.org/solr/DataImportHandler#head-c24dc86472fa50f3e87f744d3c80ebd9c31b791c

Or, look at the Slashdot feed example at
http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
-- 
Regards,
Shalin Shekhar Mangar.


Re: SolrCore, reload, synonyms not reloaded

2009-06-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
A single-core setup has no core admin commands. But it is possible to
set up a multicore configuration with only one core, and then you will be
able to reload the core.
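A minimal sketch of that setup (the core name and instanceDir below are just example values): a multicore solr.xml in the solr home directory containing a single core.

```xml
<!-- solr.xml in the solr home dir; "core0" is an illustrative name -->
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
  </cores>
</solr>
```

The reload is then a plain HTTP call, e.g. http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0, which re-reads the core's config (synonyms included) without restarting the webapp.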

On Tue, Jun 23, 2009 at 5:09 PM, ranjitr  wrote:
>
> Hello,
>
> I was having a similar problem during indexing with my synonyms file. But I
> was able to resolve it using the steps you had outlined. Thanks!
>
> I am wondering if there is a way to reload the index with a new synonym file
> without Solr/Multi core support? I would really appreciate it if you could
> post the steps to accomplish this.
>
>
>
>
>
> hossman wrote:
> >
> >
> > : I'm using Solr 1.3 and I've never been able to get the SolrCore
> > (formerly
> > : MultiCore) reload feature to pick up changes I made to my synonyms file.
> > At
> > : index time I expand synonyms.  If I change my synonyms.txt file then do
> > a
> > : MultiCore RELOAD and then reindex my data and then do a query that
> > should
> > : work now that I added a synonym, it doesn't work.  If I go to the
> > analysis
> > : page and try putting in the text I see that it did pick up the changes.
> > I'm
> > : forced to bring down the the webapp for the changes to truly be
> > reloaded.
> > : Has anyone else seen this?
> >
> > David: I don't really use the Multi Core support, but your problem
> > description intrigued me so i tried it out, and i can *not* reproduce the
> > problem you are having.
> >
> >       Steps i took
> >
> > 1) applied the patch listed at the end of this email to the Solr trunk.
> > note that it adds a "text" field to the multicore "core1" example configs.
> > this field uses SynonymFilter at index time.  I also added a synonyms file
> > with "chris, hostetter" as the only entry.
> >
> > 2) cd example; java -Dsolr.solr.home=multicore -jar start.jar
> >
> > 3) java -Ddata=args -Durl=http://localhost:8983/solr/core1/update -jar
> > post.jar '<add><doc><field name="id">1</field><field name="text">chris and
> > david</field></doc></add>'
> >
> > 4) checked luke handler, confirmed that chris, hostetter, and, & david
> > were indexed terms.
> >
> > 5) added "david, smiley" to my synonyms file
> >
> > 6) http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1
> >
> > 7) repeated step #3
> >
> > 8) confirmed with luke that "smiley" was now an indexed term.  also
> > confirmed that query for text:smiley found my doc
> >
> >
> > Here's the patch...
> >
> >
> >
> > Index: example/multicore/core1/conf/schema.xml
> > ===
> > --- example/multicore/core1/conf/schema.xml   (revision 693303)
> > +++ example/multicore/core1/conf/schema.xml   (working copy)
> > @@ -19,6 +19,18 @@
> >  
> >    
> >      > omitNorms="true"/>
> > +
> > +     > positionIncrementGap="100">
> > +      
> > +        
> > +        
> > +         > synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
> > +      
> > +      
> > +        
> > +        
> > +      
> > +    
> >    
> >
> >   
> > @@ -27,6 +39,7 @@
> >     > multiValued="false" />
> >     > multiValued="false" />
> >     > multiValued="false" />
> > +   > multiValued="false" />
> >   
> >
> >   
> > Index: example/multicore/core1/conf/index_synonyms.txt
> > ===
> > --- example/multicore/core1/conf/index_synonyms.txt   (revision 0)
> > +++ example/multicore/core1/conf/index_synonyms.txt   (revision 0)
> > @@ -0,0 +1,2 @@
> > +chris, hostetter
> > +
> >
> > Property changes on: example/multicore/core1/conf/index_synonyms.txt
> > ___
> > Name: svn:keywords
> >    + Date Author Id Revision HeadURL
> > Name: svn:eol-style
> >    + native
> >
> >
> >
> >
>
> --
> View this message in context: 
> http://www.nabble.com/SolrCore%2C-reload%2C-synonyms-not-reloaded-tp19339767p24164306.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Numerical range faceting

2009-06-23 Thread gwk

Shalin Shekhar Mangar wrote:

On Tue, Jun 23, 2009 at 4:55 PM, gwk  wrote:

  

I was wondering if someone is interested in a patch file and if so, where
should I post it?




This seems useful. Please open an issue and submit a patch. I'm sure there
will be interest.

  

Hi,

I cleaned up the code a bit, added some javadoc (I hope I did it 
correctly) and created a ticket: 
http://issues.apache.org/jira/browse/SOLR-1240


Regards,

gwk


Re: Search returning 0 results for "U2"

2009-06-23 Thread John G. Moylan
To answer my own question, the issue was with the WordDelimiter filter. 

Issue now resolved.

J
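For anyone hitting the same thing: a toy Python sketch (not the actual Lucene code) of the splitting that WordDelimiterFilter does with generateWordParts=1 and generateNumberParts=1, which is why a query analyzed without the same splitting can miss a term like "U2":

```python
import re

def word_delimiter_parts(token):
    # Roughly mimics WordDelimiterFilter splitting on letter/digit
    # boundaries (generateWordParts=1, generateNumberParts=1); the real
    # filter also handles case changes, catenation, etc.
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

# "U2" is indexed as the parts "U" and "2", not as the single term "U2"
print(word_delimiter_parts("U2"))  # -> ['U', '2']
```

So an index-side analyzer without WordDelimiterFilter (as in the Lucene 2.3 chain above) and a query-side chain with it will not agree on the terms.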

On Tue, 2009-06-23 at 11:13 +0100, John G. Moylan wrote:
> We are moving from Lucene indexer and readers to a Hybrid solution where
> we still use the Lucene Indexer but use Solr for querying the index.
> 
> I am indexing our content using Lucene 2.3 and have a field called
> "contents" which is tokenized and stored. When I search the contents
> field for "U2" using Luke the correct document turns up. However,
> searching for U2 in solr returns nothing. 
> 
> Any ideas?
> 
> Lucene field is as follows:
> 
> doc.add(new Field("contents", body.replaceAll("  ", ""),
> Field.Store.YES, Field.Index.TOKENIZED));
> 
> Lucene is using the following analyzer:
> 
> result = new ISOLatin1AccentFilter(result);
> result = new StandardFilter(result);
> result = new LowerCaseFilter(result);
> result = new StopFilter(result,StandardAnalyzer.STOP_WORDS);//,
> stopTable);
> result = new PorterStemFilter(result);
> 
> 
> In Solr I have the field mapped as a text field stored and indexed. The
> text field uses
> 
>  positionIncrementGap="100">
>   
> 
> 
>  ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="true"
> />
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
>   
>   
> 
> 
>  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>  words="stopwords.txt"/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
>   
> 
> 
> 
> Regards,
> John
> 
> 
> 
> 
> ***
> The information in this e-mail is confidential and may be legally privileged.
> It is intended solely for the addressee. Access to this e-mail by anyone else
> is unauthorised. If you are not the intended recipient, any disclosure,
> copying, distribution, or any action taken or omitted to be taken in reliance
> on it, is prohibited and may be unlawful.
> Please note that emails to, from and within RTÉ may be subject to the Freedom
> of Information Act 1997 and may be liable to disclosure.
> 

***
The information in this e-mail is confidential and may be legally privileged.
It is intended solely for the addressee. Access to this e-mail by anyone else
is unauthorised. If you are not the intended recipient, any disclosure,
copying, distribution, or any action taken or omitted to be taken in reliance
on it, is prohibited and may be unlawful.
Please note that emails to, from and within RTÉ may be subject to the Freedom
of Information Act 1997 and may be liable to disclosure.



building custom RequestHandlers

2009-06-23 Thread Julian Davchev
I am using solr and php quite nicely.
Currently the work flow includes some manipulation on php side so I
correctly format the query string and pass to tomcat/solr.
I somehow want to build my own request handler in java so I can skip the whole
apache/php request that is just for formatting.
This will save me tons of requests to apache since I use solr directly
from javascript.

Would like to ask if there is something ready that I can use and adjust.
I am kinda new in Java, but once I get the pointers
I think I should be able to pull it off.
Thanks,
JD




why EnglishPorterFilterFactory transforms germany to germani?

2009-06-23 Thread Julian Davchev
Hi,
Might be normal but I am confused why EnglishPorterFilterFactory 
transforms germany to germani

Cheers


Re: building custom RequestHandlers

2009-06-23 Thread Eric Pugh

Are you using the JavaScript interface to Solr?  
http://wiki.apache.org/solr/SolrJS

It may provide much of what you are looking for!

Eric

On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote:


I am using solr and php quite nicely.
Currently the work flow includes some manipulation on php side so I
correctly format the query string and pass to tomcat/solr.
I somehow want to build own request handler in java so I skip the  
whole

apache/php request that is just for formating.
This will saves me tons of requests to apache since I use solr  
directly

from javascript.

Would like to ask if there is something ready that I can use and  
adjust.

I am kinda new in Java but once I get the pointers
I think should be able to pull out.
Thanks,
JD




-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal






Re: why EnglishPorterFilterFactory transforms germany to germani?

2009-06-23 Thread Walter Underwood
That is how Porter stemmers work. They do not produce dictionary stems.
They produce a common token for different inflections of the same word.

The stem for "germanies" is also "germani". Example sentence: "The two
Germanies merged in 1990."

More info here: http://tartarus.org/~martin/PorterStemmer/
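To make the rule concrete, here is a small Python sketch of just the two Porter rules involved (not the full stemmer): step 1a rewrites a final "ies" to "i", and step 1c rewrites a final "y" to "i" when the stem contains a vowel.

```python
def porter_y_rules(word):
    # Porter step 1a: "ies" -> "i"   (germanies -> germani)
    if word.endswith("ies"):
        return word[:-3] + "i"
    # Porter step 1c: final "y" -> "i" if a vowel precedes (germany -> germani)
    if word.endswith("y") and any(c in "aeiou" for c in word[:-1]):
        return word[:-1] + "i"
    return word

print(porter_y_rules("germany"), porter_y_rules("germanies"))  # -> germani germani
```

Both inflections collapse to the same token, which is exactly what the index needs for matching them against each other.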

wunder

On 6/23/09 7:36 AM, "Julian Davchev"  wrote:

> Hi,
> Might be normal but I am confused why EnglishPorterFilterFactory
> transforms germany to germani
> 
> Cheers



Re: building custom RequestHandlers

2009-06-23 Thread Julian Davchev
Never used it.. I am just looking in the docs for how I can extend solr, but no
luck so far :(
Hoping for some docs or a real extension example.



Eric Pugh wrote:
> Are you using the JavaScript interface to Solr? 
> http://wiki.apache.org/solr/SolrJS
>
> It may provide much of what you are looking for!
>
> Eric
>
> On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote:
>
>> I am using solr and php quite nicely.
>> Currently the work flow includes some manipulation on php side so I
>> correctly format the query string and pass to tomcat/solr.
>> I somehow want to build own request handler in java so I skip the whole
>> apache/php request that is just for formating.
>> This will saves me tons of requests to apache since I use solr directly
>> from javascript.
>>
>> Would like to ask if there is something ready that I can use and adjust.
>> I am kinda new in Java but once I get the pointers
>> I think should be able to pull out.
>> Thanks,
>> JD
>>
>>
>
> -
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com
> Free/Busy: http://tinyurl.com/eric-cal
>
>
>
>



Re: spellcheck. limit the suggested words by some field

2009-06-23 Thread Julian Davchev
I seem to have found the answer to this one by digging 30 mins in the archives.
One approach is to use copyField and store only the stuff that is
interesting, e.g. a spell_city field that holds values only where the
type is city.

The second approach involves extending IndexBasedSpellChecker... alas, I can
find nowhere in the docs how this is done.



Julian Davchev wrote:
> Hi,
> I have build spellcheck dictionary based on name field.
> It works like a charm but I'd like to limit the returned suggestion.
> For example we have the following structure
>
> id  name    type
> 1   Berlin  city
> 2   bergan  phony
>
>
> So when I search for  suggested words of "ber" I would get both Berlin
> and bergan  but I somehow want to limit to only those of type city.
> I tried with fq=type:city but this didn't help either.
>
> Any pointers are more than welcome.  The other approach would be making
> different spellcheck dictionaries based on type and just using the
> specific dictionary, but then again I didn't see an option for how to build
> a dictionary based on type.
>
> Thanks.
>   



Re: Data Import Handler

2009-06-23 Thread Shalin Shekhar Mangar
On Tue, Jun 23, 2009 at 7:12 PM, Mukerjee, Neiloy (Neil) <
neil.muker...@alcatel-lucent.com> wrote:

> With the data-config file filled out, I am receiving errors telling me that
> the indexing of my database has failed. I think I have filled out everything
> I need to in the data-config file and that I have everything in the right
> directory. My details are described below, including locations of files,
> contents of the data-config file, and the errors I am seeing. Has anyone
> else seen problems like this?
>

What error are you seeing? Can you please post the stack trace?


> As of right now, I have data-config.xml in
> /usr/local/tomcat6.0.20/webapps/solr/, and I have the database bell_labs.sql
> in the solr/home directory /usr/local/tomcat6.0.20/solr/.


What is bell_labs.sql? DataImportHandler imports from databases, not from SQL
dumps. Is your database jdbc:mysql://localhost/bell_labs running?


> Data-config.xml has the following contents:
> 
>  url="jdbc:mysql://localhost/bell_labs" user="root" password=""/>
>
>
>
>
>
>
>
>
>
>
>
> 
>

Note that if the column name in your database and the name of the field in
Solr are the same, then you do not need to write the 'name' attribute in the
field tags.
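To illustrate the shape DataImportHandler expects (the entity, table, and column names below are placeholders; only the JDBC URL is taken from the config quoted above), a minimal data-config.xml looks roughly like this:

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/bell_labs"
              user="root" password=""/>
  <document>
    <!-- "person" and its columns are placeholder names -->
    <entity name="person" query="SELECT id, name FROM person">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```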


>
> When I go to http://localhost:8080/solr/dataimport, I see the following
> displayed to my browser:
> This XML file does not appear to have any style information associated with
> it. The document tree is shown below.
>
> Indexing failed. Rolled back all changes.
> 2009-06-23 09:24:15
> 
>

It says that indexing failed. You should be able to see some exceptions in
the solr log. If you can post them here, we might be able to help you more.

-- 
Regards,
Shalin Shekhar Mangar.


Trie vs long string for sorting

2009-06-23 Thread Bill Dueber
I'm having trouble understanding how the Trie type compares (speed- and
memory-wise) with dealing with long *strings* (as opposed to integers).
My data are library call numbers, normalized to be comparable, resulting in
(maximum) 21-character strings of the form "RK 052180H359~999~999"

Now, these are fine -- they work for sorting and ranges and the whole thing,
but right now I can't use them because I've got two or three for each of my
6M documents and on a 32-bit machine I run out of heap.

Another option would be to turn them into longs (using roughly 56 bits of
the 64 bit space) and use a trie type. Is there any sort of a win involved
there?
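One way to picture the long encoding: since the normalized strings are plain ASCII and already sort correctly byte-by-byte, the first 7 bytes (7 × 8 = 56 bits) can be packed into a single long whose numeric order matches the string order of those prefixes. A sketch in Python (note this collapses ties beyond the 7-character prefix, which would need a secondary sort key):

```python
def pack_prefix(s, width=7):
    # Pack the first `width` ASCII bytes into an int, padding short
    # strings with NUL so numeric order matches string prefix order.
    b = s.encode("ascii")[:width].ljust(width, b"\x00")
    n = 0
    for byte in b:
        n = (n << 8) | byte
    return n

s1 = "RK 052180H359~999~999"
s2 = "RK 060000A100~999~999"
assert pack_prefix(s1) < pack_prefix(s2)  # same order as s1 < s2
```

The potential memory win is that a long costs a fixed 8 bytes per document in the field cache, versus a full 21-character string object per value.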

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: building custom RequestHandlers

2009-06-23 Thread Eric Pugh
Like most things JavaScript, I found that I had to just dig through it  
and play with it.  However, the Reuters demo site was very easy to  
customize to interact with my own Solr instance, and I went from there.


On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote:


Never used it.. I am just looking in docs how can I extend solr but no
luck so far :(
Hoping for some docs or real extend example.



Eric Pugh wrote:

Are you using the JavaScript interface to Solr?
http://wiki.apache.org/solr/SolrJS

It may provide much of what you are looking for!

Eric

On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote:


I am using solr and php quite nicely.
Currently the work flow includes some manipulation on php side so I
correctly format the query string and pass to tomcat/solr.
I somehow want to build own request handler in java so I skip the  
whole

apache/php request that is just for formating.
This will saves me tons of requests to apache since I use solr  
directly

from javascript.

Would like to ask if there is something ready that I can use and  
adjust.

I am kinda new in Java but once I get the pointers
I think should be able to pull out.
Thanks,
JD




-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal








-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal






Initialize SOLR DataImportHandler

2009-06-23 Thread ice


We use the DataImportHandler to build indexes from an RDBMS. Is there any way to
make sure that the import is run when the SOLR webapp/core starts up? Do we
need to send a command to SOLR to make this happen?
-- 
View this message in context: 
http://www.nabble.com/Initialize-SOLR-DataImportHandler-tp24167359p24167359.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: building custom RequestHandlers

2009-06-23 Thread Bill Dueber
Is it possible to change the javascript  output? I find some of the
information choices (e.g., that facet information is returned in a flat
list, with facet names in the even-numbered indexes and number-of-items
following them in the odd-numbered indexes) kind of annoying.
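A small client-side fold turns the flat list into a map; sketched here in Python for brevity (the JavaScript is analogous, and Solr's json.nl parameter, where available, can change the layout at the source):

```python
def pairs_to_dict(flat):
    # Solr's flat facet layout: [name0, count0, name1, count1, ...]
    names, counts = flat[0::2], flat[1::2]
    return dict(zip(names, counts))

facets = ["obama", 120, "iran", 45]
print(pairs_to_dict(facets))  # -> {'obama': 120, 'iran': 45}
```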

On Tue, Jun 23, 2009 at 12:16 PM, Eric Pugh  wrote:

> Like most things JavaScript, I found that I had to just dig through it and
> play with it.  However, the Reuters demo site was very easy to customize to
> interact with my own Solr instance, and I went from there.
>
>
> On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote:
>
>  Never used it.. I am just looking in docs how can I extend solr but no
>> luck so far :(
>> Hoping for some docs or real extend example.
>>
>>
>>
>> Eric Pugh wrote:
>>
>>> Are you using the JavaScript interface to Solr?
>>> http://wiki.apache.org/solr/SolrJS
>>>
>>> It may provide much of what you are looking for!
>>>
>>> Eric
>>>
>>> On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote:
>>>
>>>  I am using solr and php quite nicely.
 Currently the work flow includes some manipulation on php side so I
 correctly format the query string and pass to tomcat/solr.
 I somehow want to build own request handler in java so I skip the whole
 apache/php request that is just for formating.
 This will saves me tons of requests to apache since I use solr directly
 from javascript.

 Would like to ask if there is something ready that I can use and adjust.
 I am kinda new in Java but once I get the pointers
 I think should be able to pull out.
 Thanks,
 JD



>>> -
>>> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
>>> http://www.opensourceconnections.com
>>> Free/Busy: http://tinyurl.com/eric-cal
>>>
>>>
>>>
>>>
>>>
>>
> -
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com
> Free/Busy: http://tinyurl.com/eric-cal
>
>
>
>
>


-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: Facets with an IDF concept

2009-06-23 Thread Grant Ingersoll


On Jun 23, 2009, at 8:05 AM, Asif Rahman wrote:


Hi Grant,

I'll give a real life example of the problem that we are trying to  
solve.


We index a large number of current news articles on a continuing  
basis.  We
tag these articles with news topics (e.g. Barack Obama, Iran,  
etc.).  We
then use these tags to facet our queries.  For example, we might  
issue a
query for all articles in the last 24 hours.  The facets would then  
tell us
which news topics have been written about the most in that period.   
The
problem is that "Barack Obama", for example, is always written about  
in high
frequency, as opposed to "Iran" which is currently very hot in the  
news, but
which has not always been the case.  In this case, we'd like to see  
"Iran"

show up higher than "Barack Obama" in the facet results.

To me, this seems identical to the tf-idf scoring expression that is  
used in
normal search.  The facet count is analogous to the tf and I can  
access the

facet term idf's through the Similarity API.


I'd say faceting is akin to the DF (doc freq) part of search, not TF.   
TF is per document, DF is across all the docs.  Faceting is just  
counting all of docs that contain the various terms in that field  
across the results set.


Regardless of the semantics, it doesn't sound like DF would give you  
what you want.  It could be entirely possible that in some short  
timespan the number of docs on Iran could match up w/ the number on  
Obama (maybe not for that particular example) in which case your "hot"  
item would no longer appear hot.


One idea is that you could take baselines of all the facets nightly  
for that field (via *:* or something) and then you could track the  
trends that way by calculating the diffs.  Of course, you could then  
do this hour to hour and get into all kinds of trend detection stuff.   
In other words, it does seem like it's something you could do with Solr.
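The baseline-and-diff idea can be sketched quickly. Assuming two facet-count maps taken at different times (all the tag names and the scoring formula below are illustrative, not anything Solr provides):

```python
def trend_scores(baseline, current):
    # Score each facet by growth relative to its baseline count;
    # the +1 smooths facets that are brand new in `current`.
    return {
        tag: (count - baseline.get(tag, 0)) / float(baseline.get(tag, 0) + 1)
        for tag, count in current.items()
    }

baseline = {"barack obama": 900, "iran": 40}
current = {"barack obama": 960, "iran": 400}
scores = trend_scores(baseline, current)
# "iran" scores far higher than "barack obama" despite a smaller raw count
print(sorted(scores, key=scores.get, reverse=True))  # -> ['iran', 'barack obama']
```

Running this hour to hour over the nightly baselines gives exactly the "hot topic" ordering Asif described.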


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Question about index sizes.

2009-06-23 Thread Jim Adams
Can anyone give me a rule of thumb for knowing when you need to go to
multicore or shards?  How many records can be in an index before it breaks
down?  Does it break down?  Is it 10 million? 20 million?  50 million?

Thanks, Jim


Re: Upgrading 1.2.0 to 1.3.0 solr

2009-06-23 Thread Ryan Grange
Actually, it was a very straightforward installation.  I just tweaked 
the configurations afterward to better support the new 1.3.0 
features I wanted to use (spelling suggestions and faceting).


Ryan T. Grange, IT Manager
DollarDays International, Inc.
rgra...@dollardays.com (480)922-8155 x106



Francis Yakin wrote:

Do you have experience upgrading from 1.2.0 to 1.3.0?
In other words, do you have any suggestions, or better yet, any docs or
instructions for doing this?

I appreciate if you can help me.

Thanks

Francis


-Original Message-
From: Ryan Grange [mailto:rgra...@dollardays.com]
Sent: Thursday, June 11, 2009 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: Upgrading 1.2.0 to 1.3.0 solr

I disagree with waiting that month.  At this point, most of the kinks in the 
upgrade from 1.2 to 1.3 have been worked out.  Waiting for 1.4 to come out 
risks you becoming a guinea pig for the upgrade procedure.
Plus, if any show-stoppers come along delaying 1.4, you delay implementation of 
your auto-complete function.  When 1.4 comes out, if it has any features you 
feel compel an upgrade, you can begin another round of testing and migration, 
but don't upgrade a production system just for the sake of being bleeding edge.

Ryan T. Grange, IT Manager
DollarDays International, Inc.
rgra...@dollardays.com (480)922-8155 x106



Otis Gospodnetic wrote:
  

Francis,

If you can wait another month or so, you could skip 1.3.0, and jump to 1.4 
which will be released soon.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Francis Yakin 
To: "solr-user@lucene.apache.org" 
Sent: Wednesday, June 10, 2009 1:17:25 AM
Subject: Upgrading 1.2.0 to 1.3.0 solr


I am in process to upgrade our solr 1.2.0 to solr 1.3.0

Our solr 1.2.0 now is working fine, we just want to upgrade it cause we have an 
application that requires some function from 1.3.0( we call it autocomplete).

Currently our config files on 1.2.0 are as follow:

Solrconfig.xml
Schema.xml ( we wrote this in house)
Index_synonyms.txt ( we also modified and wrote this in house)
Scripts.conf Protwords.txt Stopwords.txt Synonyms.txt

I understand on 1.3.0 , it has new solrconfig.xml .

My questions are:

1) What config files can I reuse from 1.2.0 for 1.3.0?
   Can I use the same schema.xml?
2) Solrconfig.xml: can I use the 1.2.0 version, or do I have to stick with 1.3.0?
   If I need to stick with 1.3.0, what do I need to change?

As of right now I am testing it in my sandbox, and so far it doesn't work.

Please advice, if you have any docs for upgrading 1.2.0 to 1.3.0 let me know.

Thanks in advance

Francis

Note: I attached my solrconfigand schema.xml in this email



-Inline Attachment Follows-
{edited out by Ryan for brevity}

  



  


RE: Question about index sizes.

2009-06-23 Thread Ensdorf Ken
That's a great question.  And the answer is, of course, it depends.  Mostly on 
the size of the documents you are indexing.  50 million rows from a database 
table with a handful of columns is very different from 50 million web pages,  
pdf documents, books, etc.

We currently have about 50 million documents split across 2 servers with 
reasonable performance - sub-second response time in most cases.  The total 
size of the 2 indices is about 300G.  I'd say most of the size is from stored 
fields, though we index just about everything.  This is on 64-bit ubuntu boxes 
with 32G of memory.  We haven't pushed this into production yet, but initial 
load-testing results look promising.

Hope this helps!

> -Original Message-
> From: Jim Adams [mailto:jasolru...@gmail.com]
> Sent: Tuesday, June 23, 2009 1:24 PM
> To: solr-user@lucene.apache.org
> Subject: Question about index sizes.
>
> Can anyone give me a rule of thumb for knowing when you need to go to
> multicore or shards?  How many records can be in an index before it
> breaks
> down?  Does it break down?  Is it 10 million? 20 million?  50 million?
>
> Thanks, Jim


Function query using Map

2009-06-23 Thread David Baker

Hi,

I'm trying to use the map function with a function query.  I want to map 
a particular value to 1 and all other values to 0.  We currently use the 
map function that has 4 parameters with no problem.  However, for the 
map function with 5 parameters, I get a parse error.  The following are 
the query and error returned:


_query_
id:[* TO *] _val_:"map(ethnicity,3,3,1,0)"

_error message_

*type* Status report
*message* _org.apache.lucene.queryParser.ParseException: Cannot parse 
'id:[* TO *] _val_:"map(ethnicity,3,3,1,0)"': Expected ')' at position 
20 in 'map(ethnicity,3,3,1,0)'_
*description* _The request sent by the client was syntactically 
incorrect (org.apache.lucene.queryParser.ParseException: Cannot parse 
'id:[* TO *] _val_:"map(ethnicity,3,3,1,0)"': Expected ')' at position 
20 in 'map(ethnicity,3,3,1,0)').

_

It appears that the parser never evaluates the map string for anything 
other than the 4 parameters version.  Could anyone give me some insight 
into this?  Thanks in advance.
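For context, the intended semantics of the five-argument form is a map with a default for out-of-range values; a parse error like the one above may simply mean the running Solr version only knows the four-argument variant. In Python terms (a sketch of the semantics, not Solr code):

```python
def solr_map(x, min_v, max_v, target, default=None):
    # map(x,min,max,target): values inside [min, max] become `target`,
    # values outside pass through unchanged.
    # map(x,min,max,target,default): values outside become `default` instead.
    if min_v <= x <= max_v:
        return target
    return x if default is None else default

assert solr_map(3, 3, 3, 1, 0) == 1   # ethnicity == 3 -> 1
assert solr_map(7, 3, 3, 1, 0) == 0   # everything else -> 0
```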




Re: THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-23 Thread Bradford Stephens
Greetings,

I've gotten a few replies on this, but I'd really like to know who
else is coming. Just send me a quick note :)

Cheers,
Bradford

On Mon, Jun 22, 2009 at 5:40 PM, Bradford
Stephens wrote:
> Hey all, just a friendly reminder that this is Wednesday! I hope to see
> everyone there again. Please let me know if there's something interesting
> you'd like to talk about -- I'll help however I can. You don't even need a
> Powerpoint presentation -- there's many whiteboards. I'll try to have a
> video cam, but no promises.
> Feel free to call at 904-415-3009 if you need directions or any questions :)
> ~~`
> Greetings,
>
> On the heels of our smashing success last month, we're going to be
> convening the Pacific Northwest (Oregon and Washington)
> Hadoop/HBase/Lucene/etc. meetup on the last Wednesday of June, the
> 24th.  The meeting should start at 6:45, organized chats will end
> around  8:00, and then there shall be discussion and socializing :)
>
> The meeting will be at the University of Washington in
> Seattle again. It's in the Computer Science building (not electrical
> engineering!), room 303, located
> here: http://www.washington.edu/home/maps/southcentral.html?80,70,792,660
>
> If you've ever wanted to learn more about distributed computing, or
> just see how other people are innovating with Hadoop, you can't miss
> this opportunity. Our focus is on learning and education, so every
> presentation must end with a few questions for the group to research
> and discuss. (But if you're an introvert, we won't mind).
>
> The format is two or three 15-minute "deep dive" talks, followed by
> several 5 minute "lightning chats". We had a few interesting topics
> last month:
>
> -Building a Social Media Analysis company on the Apache Cloud Stack
> -Cancer detection in images using Hadoop
> -Real-time OLAP on HBase -- is it possible?
> -Video and Network Flow Analysis in Hadoop vs. Distributed RDBMS
> -Custom Ranking in Lucene
>
> We already have one "deep dive" scheduled this month, on truly
> scalable Lucene with Katta. If you've been looking for a way to handle
> those large Lucene indices, this is a must-attend!
>
> Looking forward to seeing everyone there again.
>
> Cheers,
> Bradford
>
> http://www.roadtofailure.com -- The Fringes of Distributed Computing,
> Computer Science, and Social Media.


Re: building custom RequestHandlers

2009-06-23 Thread Julian Davchev
I am not sure we are talking about the same thing at all. I want to extend solr
(java) so that I have another request handler in java
and can do, for example,  /select?qt=myhandler&q=querystring

Then in this myhandler class in java I will parse the querystring
and build the final correct query to pass to the engine.

So the question is how to extend the class, where to place the file, how to
recompile, what to set in solrconfig.xml, etc., so that it's all glued together
and I can make use of it.

Bill Dueber wrote:
> Is it possible to change the javascript  output? I find some of the
> information choices (e.g., that facet information is returned in a flat
> list, with facet names in the even-numbered indexes and number-of-items
> following them in the odd-numbered indexes) kind of annoying.
>
> On Tue, Jun 23, 2009 at 12:16 PM, Eric Pugh
>> wrote:
>> 
>
>   
>> Like most things JavaScript, I found that I had to just dig through it and
>> play with it.  However, the Reuters demo site was very easy to customize to
>> interact with my own Solr instance, and I went from there.
>>
>>
>> On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote:
>>
>>  Never used it.. I am just looking in docs how can I extend solr but no
>> 
>>> luck so far :(
>>> Hoping for some docs or real extend example.
>>>
>>>
>>>
>>> Eric Pugh wrote:
>>>
>>>   
 Are you using the JavaScript interface to Solr?
 http://wiki.apache.org/solr/SolrJS

 It may provide much of what you are looking for!

 Eric

 On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote:

  I am using solr and php quite nicely.
 
> Currently the work flow includes some manipulation on php side so I
> correctly format the query string and pass to tomcat/solr.
> I somehow want to build own request handler in java so I skip the whole
> apache/php request that is just for formating.
> This will saves me tons of requests to apache since I use solr directly
> from javascript.
>
> Would like to ask if there is something ready that I can use and adjust.
> I am kinda new in Java but once I get the pointers
> I think should be able to pull out.
> Thanks,
> JD
>
>
>
>   
 -
 Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
 http://www.opensourceconnections.com
 Free/Busy: http://tinyurl.com/eric-cal





 
>> -
>> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
>> http://www.opensourceconnections.com
>> Free/Busy: http://tinyurl.com/eric-cal
>>
>>
>>
>>
>>
>> 
>
>
>   



Re: Auto suggest.. how to do mixed case

2009-06-23 Thread Mani Kumar
hi shalin,
can you please share code or tutorial documents for these? (it would be a great help)

  1. Prefix search on shingles
  2. Exact (phrase) search on n-grams

The regular prefix search also works. The good thing with these is that you
can filter, and a different stored value is also possible.

??


thanks!
mani

On Mon, Jun 22, 2009 at 4:41 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Jun 22, 2009 at 2:55 PM, Ingo Renner  wrote:
>
> >
> > Hi Shalin,
> >
> >  I think
> >> that by naming it as /autoSuggest, a lot of users have been misled since
> >> there are other techniques available.
> >>
> >
> > what would you suggest?
> >
> >
> There are many techniques. Personally, I've used
>
>   1. Prefix search on shingles
>   2. Exact (phrase) search on n-grams
>
> The regular prefix search also works. The good thing with these is that you
> can filter, and a different stored value is also possible.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
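For what it's worth, here is a rough Python sketch of what the two techniques Shalin lists produce. The function names are illustrative, not Solr APIs; in Solr itself the ShingleFilterFactory and EdgeNGramFilterFactory do this work at index time.

```python
def shingles(tokens, size=2):
    # Adjacent word groups, roughly what a shingle filter emits.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

def edge_ngrams(term, min_len=1, max_len=20):
    # All leading substrings of a term, roughly what an edge n-gram filter emits.
    return [term[:i] for i in range(min_len, min(len(term), max_len) + 1)]

# Index time: store shingles so a typed prefix like "new y" can
# prefix-match "new york", or store edge n-grams so the prefix is an
# exact term match.
tokens = "new york city".split()
assert shingles(tokens) == ["new york", "york city"]
assert edge_ngrams("new") == ["n", "ne", "new"]
```

Prefix search then runs against the shingled field, and exact (phrase) search against the n-grammed field; because these are ordinary fields, you can also filter on them and return a different stored value.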


Solr Logging in Weblogic

2009-06-23 Thread Ryan Heinen
Has anyone been able to successfully configure logging from Solr in Weblogic? I 
am trying to increase the verbosity of the logs as I am seeing some strange 
behavior during indexing, but have not been able to make this work. Changes to 
Server -> Configuration -> Logging section in the Weblogic admin console don't 
seem to have any effect. Is there even a way to configure the log level using 
Weblogic, or does it need to be done using logging.properties in the JVM?

Ryan
--
Ryan Heinen, Sr. Software Engineer
Phone 604.408.8078 ext. 243
Email: ryan.hei...@elasticpath.com

Elastic Path Software, Inc.
800 - 1045 Howe Street, Vancouver, BC V6Z 2A9
Fax: 604.408.8079
Web: www.elasticpath.com
Blog: www.getelastic.com
Community: http://grep.elasticpath.com






Re: building custom RequestHandlers

2009-06-23 Thread Chris Hostetter

: Is it possible to change the javascript  output? I find some of the
: information choices (e.g., that facet information is returned in a flat
: list, with facet names in the even-numbered indexes and number-of-items
: following them in the odd-numbered indexes) kind of annoying.

Did you look at the optional params for the JSON output format? (ie: 
json.nl)...

http://wiki.apache.org/solr/SolJSON
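If reshaping on the client is still needed, the flat name/count list is at least trivial to convert — a sketch (json.nl=map should make Solr return a map directly, per the wiki page above):

```python
def flat_facets_to_dict(flat):
    # Solr's default json.nl=flat facet output alternates names and
    # counts: ["a", 5, "b", 2]. Pair even and odd positions into a dict.
    return dict(zip(flat[0::2], flat[1::2]))

assert flat_facets_to_dict(["news", 1100, "obama", 1000]) == {"news": 1100, "obama": 1000}
```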




-Hoss



Re: building custom RequestHandlers

2009-06-23 Thread Chris Hostetter

: So question is howto extend the class where to place the file, howto
: recomplie, set in solrconfig.xml etc... so that it's all glued together
: and can make use of it.

I would start here...
http://wiki.apache.org/solr/SolrPlugins

...and then ask specific questions as you encounter them.




-Hoss



Re: building custom RequestHandlers

2009-06-23 Thread Julian Davchev
Is it just me, or is this a thread hijack? It has nothing to do with what the
thread is originally about.
Cheers

Bill Dueber wrote:
> Is it possible to change the javascript  output? I find some of the
> information choices (e.g., that facet information is returned in a flat
> list, with facet names in the even-numbered indexes and number-of-items
> following them in the odd-numbered indexes) kind of annoying.
>
> On Tue, Jun 23, 2009 at 12:16 PM, Eric Pugh
>> wrote:
>> 
>
>   
>> Like most things JavaScript, I found that I had to just dig through it and
>> play with it.  However, the Reuters demo site was very easy to customize to
>> interact with my own Solr instance, and I went from there.
>>
>>
>> On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote:
>>
>>  Never used it.. I am just looking in docs how can I extend solr but no
>> 
>>> luck so far :(
>>> Hoping for some docs or real extend example.
>>>
>>>
>>>
>>> Eric Pugh wrote:
>>>
>>>   
 Are you using the JavaScript interface to Solr?
 http://wiki.apache.org/solr/SolrJS

 It may provide much of what you are looking for!

 Eric

 On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote:

  I am using solr and php quite nicely.
 
> Currently the work flow includes some manipulation on php side so I
> correctly format the query string and pass to tomcat/solr.
> I somehow want to build own request handler in java so I skip the whole
> apache/php request that is just for formating.
> This will saves me tons of requests to apache since I use solr directly
> from javascript.
>
> Would like to ask if there is something ready that I can use and adjust.
> I am kinda new in Java but once I get the pointers
> I think should be able to pull out.
> Thanks,
> JD
>
>
>
>   
 -
 Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
 http://www.opensourceconnections.com
 Free/Busy: http://tinyurl.com/eric-cal





 
>> -
>> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
>> http://www.opensourceconnections.com
>> Free/Busy: http://tinyurl.com/eric-cal
>>
>>
>>
>>
>>
>> 
>
>
>   



Re: No wildcards with solr.ASCIIFoldingFilterFactory?

2009-06-23 Thread Mark Miller
Wildcard queries are not analyzed, so you are getting what you type - 
which doesn't match what went through an analyzer and into the index. I 
don't think Solr has a solution for this at the moment. I think Lucene 
has a special analyzer which deals with this to some degree, but I have 
never used it.
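A common client-side workaround is to fold the term yourself before appending the wildcard, so the query matches the already-folded terms in the index. A sketch — note that Python's unicodedata only approximates ASCIIFoldingFilter (it strips combining diacritics but won't map characters like ß or ø):

```python
import unicodedata

def ascii_fold(term):
    # Decompose accented characters, then drop the combining marks.
    normalized = unicodedata.normalize("NFKD", term)
    return "".join(c for c in normalized if not unicodedata.combining(c))

# Fold the prefix before adding the wildcard so it matches indexed terms.
assert ascii_fold("münchen") + "*" == "munchen*"
```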



--
- Mark

http://www.lucidimagination.com



vladimirneu wrote:

Hi all,

could somebody help me to understand why I cannot search with a wildcard if I
use the solr.ASCIIFoldingFilterFactory?

So I get results if I am searching for "münchen", "munchen" or "munchen*",
but I get no results if I do the search for "münchen*". The original records
contain the terms "München" and "Münchener".

The solr.ASCIIFoldingFilterFactory is configured on both sides, index and
query. We are using the 1.4-dev version from trunk.

[field type definition stripped by the list archive]

Thank you very much!

Regards,

Vladimir
  






Re: Facets with an IDF concept

2009-06-23 Thread Chris Hostetter

: Regardless of the semantics, it doesn't sound like DF would give you what you
: want.  It could be entirely possible that in some short timespan the number of
: docs on Iran could match up w/ the number on Obama (maybe not for that
: particular example) in which case your "hot" item would no longer appear hot.

but if the numbers match up in that timespan then the "hot" item isn't as 
"hot" anymore.

Maybe I'm misunderstanding: but it sounds like Asif's question essentially 
boils down to getting facet constraints sorted after using some 
normalizing fraction ... the simplest case being the inverse ratio (this 
is where I think Asif is comparing it to IDF) of the number of matches for 
that facet in some larger docset to the size of the docset -- typically 
that docset could be the entire index, but it could also be the same 
search over a large window of time.

So if I was doing a news search for all docs in the last 24 hours, I could 
multiply each of those facet counts by the ratio of the number of articles 
from the past month to the corresponding facet counts from the past month 
to see how much "hotter" they are in my smaller result set...

current result set facet counts (X)...
  News:1100
  Obama:1000
  Iran:800
  Miley Cyrus:700
  iPod:500

facet counts from the past month (Y), during which time 9000 (Z)
documents were published...
  News:9000
  Obama:7000
  Iran:1000
  Miley Cyrus:4000
  iPod:5000

X*(Z/Y)...
  Iran:7200
  Miley Cyrus:1575
  Obama:1285.7
  News:1100
  iPod:900
  

Doing this in a Solr plugin would be the best way to do this -- because 
otherwise your "hot" terms might not even show up in the facet lists.  
Any attempt to do it on the client would just be an approximation, and 
could easily miss the "hottest" item if it was just below the cutoff for 
the number of constraints to be returned.
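The X*(Z/Y) arithmetic above, as a sketch of what such a plugin (or an approximate client) would compute:

```python
def normalize_counts(current, baseline, baseline_total):
    # Scale each facet count X by Z/Y: the inverse of the term's
    # frequency in the larger baseline docset.
    return {term: count * (baseline_total / baseline[term])
            for term, count in current.items()}

current = {"News": 1100, "Obama": 1000, "Iran": 800,
           "Miley Cyrus": 700, "iPod": 500}
baseline = {"News": 9000, "Obama": 7000, "Iran": 1000,
            "Miley Cyrus": 4000, "iPod": 5000}

hot = normalize_counts(current, baseline, baseline_total=9000)
assert max(hot, key=hot.get) == "Iran"        # 7200, the "hottest" topic
assert round(hot["Obama"], 1) == 1285.7
```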


-Hoss



Re: Solr relevancy score - conversion

2009-06-23 Thread Chris Hostetter

On Mon, 8 Jun 2009, Vijay_here wrote:

: Would need an more proportionate score like rounded to 100% (95% relevant,
: 80 % relevant and so on). Is there a way to make solr returns such scores of
: such relevance. Any other approach to arrive at this scores also be
: appreciated

There is a reason Solr doesn't return scores like that -- they are 
meaningless. More info is available on the Lucene-Java wiki...

http://wiki.apache.org/lucene-java/ScoresAsPercentages
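The wiki's point in miniature: any percentage you derive is relative to the result set, so even a terrible match set still yields a "100%" top hit. A toy sketch (this is not Lucene's scoring, just the naive normalization):

```python
def as_percent(scores):
    # Naive normalization against the top score: the best hit in any
    # result set becomes "100%", regardless of how good it really is.
    top = max(scores)
    return [round(100 * s / top) for s in scores]

assert as_percent([8.2, 7.9, 0.4]) == [100, 96, 5]   # strong result set
assert as_percent([0.03, 0.01]) == [100, 33]         # weak set, same "100%"
```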



-Hoss



Re: qf boost Versus field boost for Dismax queries

2009-06-23 Thread Chris Hostetter

On Tue, 9 Jun 2009, ashokc wrote:

: When 'dismax' queries are use, where is the best place to apply boost
: values/factors? While indexing by supplying the 'boost' attribute to the
: field, or in solrconfig.xml by specifying the 'qf' parameter with the same
: boosts? What are the advantages/disadvantages to each? What happens if both

This is discussed in the Lucene-Java FAQ...

http://wiki.apache.org/lucene-java/LuceneFAQ#head-246300129b9d3bf73f597facec54ac2ee54e15d7

What is the difference between field (or document) boosting and query 
boosting?

Index time field boosts (field.setBoost(boost)) are a way to express 
things like "this document's title is worth twice as much as the title of 
most documents". Query time boosts (query.setBoost(boost)) are a way to 
express "I care about matches on this clause of my query twice as much as 
I do about matches on other clauses of my query".

Index time field boosts are worthless if you set them the same on every 
document.

Index time document boosts (doc.setBoost(float)) are equivalent to setting 
a field boost on every field in that document.
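A toy illustration of the query-time case (not Lucene's actual scoring formula): a clause boost simply multiplies that clause's contribution to the final score.

```python
def combined_score(clause_scores, boosts):
    # Each clause's raw score is multiplied by its boost (default 1.0).
    return sum(score * boosts.get(clause, 1.0)
               for clause, score in clause_scores.items())

clauses = {"title": 3.0, "body": 2.0}
# "matches on title matter twice as much as matches on body"
assert combined_score(clauses, {"title": 2.0}) == 8.0
```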




-Hoss



Re: Search returning 0 results for "U2"

2009-06-23 Thread Otis Gospodnetic

John,

This is simply the case of mismatching index-time and query-time analyzers.  
When you use Luke you get the match, but Luke doesn't use the tokenizer+filters 
you specified in Solr for your field.  In your Solr installation, go to Solr 
Admin page, then to Analysis page, enter U2 and select all other relevant 
checkboxes to see how U2 is getting analyzed by Solr.  You should be able to 
spot the incompatibility then.
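A sketch of the likely mismatch, simplified (it ignores stemming and case-change splitting, but shows the letter/digit splitting that usually bites on terms like "U2"; whether the parts get recombined is controlled by the catenate* attributes in the schema):

```python
import re

def index_side(text):
    # Roughly what the Lucene chain (standard tokenizer + lowercase)
    # put in the index: "U2" stays one token.
    return [t.lower() for t in re.findall(r"\w+", text)]

def query_side(text, catenate_all=False):
    # Roughly WordDelimiterFilter with generateWordParts=1,
    # generateNumberParts=1: split at letter/digit boundaries.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", text)
    tokens = [p.lower() for p in parts]
    if catenate_all and len(parts) > 1:
        tokens.append("".join(parts).lower())   # catenateAll=1 restores "u2"
    return tokens

assert index_side("U2") == ["u2"]               # indexed as one term
assert query_side("U2") == ["u", "2"]           # searched as two terms: no hit
assert "u2" in query_side("U2", catenate_all=True)
```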

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: John G. Moylan 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 23, 2009 6:13:32 AM
> Subject: Search returning 0 results for "U2"
> 
> We are moving from Lucene indexer and readers to a Hybrid solution where
> we still use the Lucene Indexer but use Solr for querying the index.
> 
> I am indexing our content using Lucene 2.3 and have a field called
> "contents" which is tokenized and stored. When I search the contents
> field for "U2" using Luke the correct document turns up. However,
> searching for U2 in solr returns nothing. 
> 
> Any ideas?
> 
> Lucene field is as follows:
> 
> doc.add(new Field("contents", body.replaceAll("  ", ""),
> Field.Store.YES, Field.Index.TOKENIZED));
> 
> Lucene is using the following analyzer:
> 
> result = new ISOLatin1AccentFilter(result);
> result = new StandardFilter(result);
> result = new LowerCaseFilter(result);
> result = new StopFilter(result,StandardAnalyzer.STOP_WORDS);//,
> stopTable);
> result = new PorterStemFilter(result);
> 
> 
> In Solr I have the field mapped as a text field stored and indexed. The
> text field uses
> 
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"
>             />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory"
>             protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> 
> 
> Regards,
> John



Re: Facets with an IDF concept

2009-06-23 Thread Otis Gospodnetic

Hi,

Hm, I don't think facets (nor pure search/Solr) are the right tool for this 
job.  I think you have to do what Ian said, which is to compute the baseline 
for various concepts of interest (Barack Obama and Iran in your example), and 
then compare.

Look at point #2 on http://www.sematext.com/product-key-phrase-extractor.html . 
 I think this is what you are after, and you will even see an example that 
matches yours very closely.  My guess is that's how 
http://www.google.com/trends/hottrends works, too.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Asif Rahman 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, June 23, 2009 8:05:48 AM
> Subject: Re: Facets with an IDF concept
> 
> Hi Grant,
> 
> I'll give a real life example of the problem that we are trying to solve.
> 
> We index a large number of current news articles on a continuing basis.  We
> tag these articles with news topics (e.g. Barack Obama, Iran, etc.).  We
> then use these tags to facet our queries.  For example, we might issue a
> query for all articles in the last 24 hours.  The facets would then tell us
> which news topics have been written about the most in that period.  The
> problem is that "Barack Obama", for example, is always written about in high
> frequency, as opposed to "Iran" which is currently very hot in the news, but
> which has not always been the case.  In this case, we'd like to see "Iran"
> show up higher than "Barack Obama" in the facet results.
> 
> To me, this seems identical to the tf-idf scoring expression that is used in
> normal search.  The facet count is analogous to the tf and I can access the
> facet term idf's through the Similarity API.
> 
> Is my reasoning sound?  Can you provide any guidance as to the best way to
> implement this?
> 
> Thanks for your help,
> 
> Asif
> 
> 
> On Tue, Jun 23, 2009 at 1:19 PM, Grant Ingersoll wrote:
> 
> >
> > On Jun 23, 2009, at 3:58 AM, Asif Rahman wrote:
> >
> >  Hi again,
> >>
> >> I guess nobody has used facets in the way I described below before.  Do
> >> any
> >> of the experts have any ideas as to how to do this efficiently and
> >> correctly?  Any thoughts would be greatly appreciated.
> >>
> >> Thanks,
> >>
> >> Asif
> >>
> >> On Wed, Jun 17, 2009 at 12:42 PM, Asif Rahman wrote:
> >>
> >>  Hi all,
> >>>
> >>> We have an index of news articles that are tagged with news topics.
> >>> Currently, we use solr facets to see which topics are popular for a given
> >>> query or time period.  I'd like to apply the concept of IDF to the facet
> >>> counts so as to penalize the topics that occur broadly through our index.
> >>> I've begun to write custom facet component that applies the IDF to the
> >>> facet
> >>> counts, but I also wanted to check if anyone has experience using facets
> >>> in
> >>> this way.
> >>>
> >>
> >
> > I'm not sure I'm following.  Would you be faceting on one field, but using
> > the DF from some other field?  Faceting is already a count of all the
> > documents that contain the term on a given field for that search.  If I'm
> > understanding, you would still do the typical faceting, but then rerank by
> > the global DF values, right?
> >
> > Backing up, what is the problem you are seeing that you are trying to
> > solve?
> >
> > I think you could do this, but you'd have to hook it in yourself.  By
> > penalize, do you mean remove, or just have them in the sort?  Generally
> > speaking, looking up the DF value can be expensive, especially if you do a
> > lot of skipping around.  I don't know how pluggable the sort capabilities
> > are for faceting, but that might be the place to start if you are just
> > looking at the sorting options.
> >
> >
> >
> > --
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> > Solr/Lucene:
> > http://www.lucidimagination.com/search
> >
> >
> 
> 
> -- 
> Asif Rahman
> Lead Engineer - NewsCred
> a...@newscred.com
> http://platform.newscred.com



Query regarding Solr search options..

2009-06-23 Thread Silent Surfer

Hi,

Can a Solr search be customized to provide N lines before and after the line 
that matches the keyword?

For example, suppose I have a document with 10 lines, and the 5th line contains 
the keyword 'X' I am interested in. Now I fire a Solr search for the keyword 
'X'. Is there any preference/option available in Solr which can be set so the 
search results contain only the 3 lines above and the 3 lines below the line 
where the keyword matched?

Thanks,
Silent Surfer


  


Re: Query regarding Solr search options..

2009-06-23 Thread Otis Gospodnetic

Hello,

Not quite "lines", but look at the various Highlighter options on the Wiki and 
in the example solrconfig.xml.
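Since the highlighter fragments by characters rather than lines, a line-based window (grep -C style) is easiest done client-side on the stored field. A sketch (function name is illustrative):

```python
def context_snippets(text, keyword, n=3):
    # Return each matching line together with n lines of context
    # before and after it, like grep -C n.
    lines = text.splitlines()
    return [lines[max(0, i - n): i + n + 1]
            for i, line in enumerate(lines) if keyword in line]

doc = "\n".join("line %d" % i for i in range(1, 11))
assert context_snippets(doc, "line 5") == [
    ["line 2", "line 3", "line 4", "line 5", "line 6", "line 7", "line 8"]
]
```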


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Silent Surfer 
> To: Solr User 
> Sent: Tuesday, June 23, 2009 11:04:53 PM
> Subject: Query regarding Solr search options..
> 
> 
> Hi,
> 
> Can Solr search be customized to provide N number of lines before and after 
> the 
> line that contains matches the keyword.
> 
> For eg: Suppose i have a document with 10 lines, and 5th line contains the 
> key 
> word 'X' I am interested in. Now if I am fire a Solr search for the keyword 
> 'X'. 
> Is there any preference/option available in Solr, which can be set so the 
> search 
> results contains only the 3 lines above and 3 lines after the line where the 
> Keyword match successfully.
> 
> Thanks,
> Silent Surfer



Re: Facets with an IDF concept

2009-06-23 Thread Grant Ingersoll


On Jun 23, 2009, at 6:23 PM, Chris Hostetter wrote:



: Regardless of the semantics, it doesn't sound like DF would give  
you what you
: want.  It could be entirely possible that in some short timespan  
the number of
: docs on Iran could match up w/ the number on Obama (maybe not for  
that
: particular example) in which case your "hot" item would no longer  
appear hot.


but if the numbers match up in that timespan then the "hot" item isn't as
"hot" anymore.


Not necessarily true.  Consider the case where over the year there are  
50 stories about Obama.  Then, in the span of 5 days, there are 50  
stories about Iran.  Iran, in my view, is still hotter than Obama.  In  
Asif's case, he was suggesting comparing against the global DF.


Not to worry, though, your proposal is much the same as mine, namely  
take a baseline based on some set of docs (I chose *:*, you chose past  
month) and then compare.




Maybe I'm misunderstanding: but it sounds like Asif's question essentially
boils down to getting facet constraints sorted after using some
normalizing fraction ... the simplest case being the inverse ratio (this
is where I think Asif is comparing it to IDF) of the number of matches for
that facet in some larger docset to the size of the docset -- typically
that docset could be the entire index, but it could also be the same
search over a large window of time.

So if I was doing a news search for all docs in the last 24 hours, I could
multiply each of those facet counts by the ratio of the number of articles
from the past month to the corresponding facet counts from the past month
to see how much "hotter" they are in my smaller result set...

current result set facet counts (X)...
 News:1100
 Obama:1000
 Iran:800
 Miley Cyrus:700
 iPod:500

facet counts from the past month (Y), during which time 9000 (Z)
documents were published...
 News:9000
 Obama:7000
 Iran:1000
 Miley Cyrus:4000
 iPod:5000

X*(Z/Y)...
 Iran:7200
 Miley Cyrus:1575
 Obama:1285.7
 News:1100
 iPod:900


Doing this in a Solr plugin would be the best way to do this -- because
otherwise your "hot" terms might not even show up in the facet lists.
Any attempt to do it on the client would just be an approximation, and
could easily miss the "hottest" item if it was just below the cutoff for
the number of constraints to be returned.


-Hoss



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Function query using Map

2009-06-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
The five-parameter form of map() was added in Solr 1.4. Which version of Solr
are you using?
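For reference, the semantics of Solr's map() in both arities, sketched in Python (the five-argument default was added in 1.4, so being on 1.3 would explain the parse error):

```python
def solr_map(x, lo, hi, target, default=None):
    # map(x,min,max,target): values inside [min,max] become target,
    # anything else passes through (4-arg) or becomes default (5-arg).
    if lo <= x <= hi:
        return target
    return x if default is None else default

assert solr_map(3, 3, 3, 1, 0) == 1   # ethnicity == 3  -> 1
assert solr_map(7, 3, 3, 1, 0) == 0   # everything else -> 0
assert solr_map(7, 3, 3, 1) == 7      # 4-arg form passes x through
```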

On Wed, Jun 24, 2009 at 12:57 AM, David Baker  wrote:
>
> Hi,
>
> I'm trying to use the map function with a function query.  I want to map a 
> particular value to 1 and all other values to 0.  We currently use the map 
> function that has 4 parameters with no problem.  However, for the map 
> function with 5 parameters, I get a parse error.  The following are the query 
> and error returned:
>
> _query_
> id:[* TO *] _val_:"map(ethnicity,3,3,1,0)"
>
> _error message_
>
> *type* Status report
> *message* _org.apache.lucene.queryParser.ParseException: Cannot parse 'id:[* 
> TO *] _val_:"map(ethnicity,3,3,1,0)"': Expected ')' at position 20 in 
> 'map(ethnicity,3,3,1,0)'_
> *description* _The request sent by the client was syntactically incorrect 
> (org.apache.lucene.queryParser.ParseException: Cannot parse 'id:[* TO *] 
> _val_:"map(ethnicity,3,3,1,0)"': Expected ')' at position 20 in 
> 'map(ethnicity,3,3,1,0)').
> _
>
> It appears that the parser never evaluates the map string for anything other 
> than the 4 parameters version.  Could anyone give me some insight into this?  
> Thanks in advance.
>



--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Initialize SOLR DataImportHandler

2009-06-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
Yes, you will need to fire a command.

On Tue, Jun 23, 2009 at 9:51 PM, ice  wrote:
>
>
> We use the DataImportHandler for indexes from a RDBMS. Is there any way to
> make sure that the import is run when the SOLR webapp/core starts up? Do we
> need to send a command to SOLR to make this happen?
> --
> View this message in context: 
> http://www.nabble.com/Initialize-SOLR-DataImportHandler-tp24167359p24167359.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: building custom RequestHandlers

2009-06-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
this part of the doc explains what you should do to write a custom request handler

http://wiki.apache.org/solr/SolrPlugins#head-7c0d03515c496017f6c0116ebb096e34a872cb61

On Wed, Jun 24, 2009 at 3:35 AM, Julian Davchev wrote:
> Is it just me or this is thread steal? nothing todo with what thread is
> originally about.
> Cheers
>
> Bill Dueber wrote:
>> Is it possible to change the javascript  output? I find some of the
>> information choices (e.g., that facet information is returned in a flat
>> list, with facet names in the even-numbered indexes and number-of-items
>> following them in the odd-numbered indexes) kind of annoying.
>>
>> On Tue, Jun 23, 2009 at 12:16 PM, Eric Pugh >
>>> wrote:
>>>
>>
>>
>>> Like most things JavaScript, I found that I had to just dig through it and
>>> play with it.  However, the Reuters demo site was very easy to customize to
>>> interact with my own Solr instance, and I went from there.
>>>
>>>
>>> On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote:
>>>
>>>  Never used it.. I am just looking in docs how can I extend solr but no
>>>
 luck so far :(
 Hoping for some docs or real extend example.



 Eric Pugh wrote:


> Are you using the JavaScript interface to Solr?
> http://wiki.apache.org/solr/SolrJS
>
> It may provide much of what you are looking for!
>
> Eric
>
> On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote:
>
>  I am using solr and php quite nicely.
>
>> Currently the work flow includes some manipulation on php side so I
>> correctly format the query string and pass to tomcat/solr.
>> I somehow want to build own request handler in java so I skip the whole
>> apache/php request that is just for formating.
>> This will saves me tons of requests to apache since I use solr directly
>> from javascript.
>>
>> Would like to ask if there is something ready that I can use and adjust.
>> I am kinda new in Java but once I get the pointers
>> I think should be able to pull out.
>> Thanks,
>> JD
>>
>>
>>
>>
> -
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com
> Free/Busy: http://tinyurl.com/eric-cal
>
>
>
>
>
>
>>> -
>>> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
>>> http://www.opensourceconnections.com
>>> Free/Busy: http://tinyurl.com/eric-cal
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Changing the score of a document based on the value of a field

2009-06-23 Thread Martin Davidsson
The SolrRelevancyFAQ has a heading that's the same as my message's  
subject:


http://wiki.apache.org/solr/SolrRelevancyFAQ#head-f013f5f2811e3ed28b200f326dd686afa491be5e

There's a TODO on the wiki to provide an actual example. Does anybody  
happen to have an example handy that I could model my query after?  
Thank you


-- Martin


Solrj no search results

2009-06-23 Thread pof

Hi, I'm using an EmbeddedSolrServer. Adding documents to the example Jetty
server using this method worked fine:
doc1.addField( "id", "id1");
doc1.addField( "name", "doc1");
doc1.addField( "price", 10);
server.add(doc1)

However, now I have changed the schema.xml so I can use my own fields. The
documents are added to the index (no compilation errors), but I do not get
any search results back whatsoever. Any ideas?

Thanks.
-- 
View this message in context: 
http://www.nabble.com/Solrj-no-search-results-tp24179484p24179484.html
Sent from the Solr - User mailing list archive at Nabble.com.