date:20090820


: > The request I would like to write is
: >
: > "Delete from my solr Index the id that are no longer present in my
: > table_document"
: >
: > With Lucene I had a way to do that :
: > open IndexReader,
: > for each lucene document : check in table_document and remove in lucene
: > index if document is no longer present in the table

you can still do that with Solr, you can even do it as a Solr plugin (I 
would suggest a RequestHandler that you hit after each DIH call) so 
you can reuse your existing code that deals directly with an IndexReader 
-- but you'd probably want to tweak it a bit to use the UpdateHandler from 
the SolrCore instead of deleting the doc direclty, that way it's logged 
properly and you can trigger a commit to make Solr aware you've modified 
the index.



-Hoss

Re: how do i - include the items without a facet

2009-08-20 Thread Noble Paul നോബിള്‍ नोब्ळ्


: "location_name" is a text field , copyto puts it in "facet.location_name"
: 
: i'm thinking this could be because the field was not entered as NULL but an
: empty string ?

assuming "facet.location_name" is a StrField then that would certainly be 
your problem -- because the empty string is a legitimate string value.

i would suggest properly indexing without the empty string value, but i 
suppose fq=facet.location_name:"" would probably work as well (untested)



-Hoss

Re: Snapshot backups with new replication

cleaning of snapshots is not a feature in the current version.
probably a feature can be added

On Fri, Aug 21, 2009 at 3:11 AM, KaktuChakarabati wrote:
>
> Hey,
> I was wondering if there is any equivalent in new in-process replication to
> what
> could previously be achieved by running snapcleaner -N which would
> essentially
> allow me to keep backups of N latest indices pulls on a search node.
> This is of course very important for failover operation in production
> environments,
> And I assume this is a functionality important to many potential users
>
> Thanks,
> -Chak
> --
> View this message in context: 
> http://www.nabble.com/Snapshot-backups-with-new-replication-tp25070608p25070608.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Solr Quoted search confusions

2009-08-20 Thread Vannia Rajan

Hi,

On Thu, Aug 20, 2009 at 9:13 PM, Chris Male  wrote:

> Hi,
>
> What analyzers/filters have you configured for the field that you are
> searching? One could be causing the various versions of "ilike" to be
> indexed the same way.
>

  I'm using "text" field with the following analyzers / filters for the
field "description" (which has various forms of word "ilike":






















Is there anything that i could tune here to get the intended results?


>
> Thanks
> Chris
>
> On Thu, Aug 20, 2009 at 5:29 PM, Vannia Rajan  >wrote:
>
> > Hi,*
> >
> >   *I need some help to clarify how solr indexes documents. I have 6
> > documents with various forms of the word "ilike" (complete word and not
> "i
> > like") - one having "ilike" as such and others having a special character
> > in
> > between "i" and "like".
> >
> >   What i expected from solr is that, when i do a Quoted search "ilike",
> it
> > should return only the document that had "ilike" exactly. But, what i get
> > from solr is that various forms of the word "ilike" are also included in
> > the
> > results. Is there an option/configuration that i can do to solr so that i
> > will get only the result with exact word "ilike"?
> > *
> >
> >  The result i obtained from solr is shown below,
> >
> > http://localhost:8080/solr/select/?q=%22ilike%22&fl=description,score
> > 
> > -
> > 
> > 0
> > 20
> > -
> > 
> > description,score
> > "ilike"
> > 
> > 
> > -
> > 
> > -
> > 
> > 0.5
> > Ilike company is doing great!
> > 
> > -
> > 
> > 0.375
> > I:like company is doing great!
> > 
> > -
> > 
> > 0.3125
> > I-like it very much. Really, this can come
> > up!.
> > 
> > -
> > 
> > 0.3125
> > I;like it very much. Really, i say.
> > 
> > -
> > 
> > 0.25
> > -
> > 
> > i.like it very much. full stop can come? i don't know.
> > 
> > 
> > 
> >  >
> > --
> > Thanks,
> > Vanniarajan
> >
>



-- 
Thanks,
Vanniarajan

Re: Is wildcard search not correctly analyzed at query? [solved]

2009-08-20 Thread Alexander Herzog

Hi

Thanks for the info!

best,
Alexander

Avlesh Singh schrieb:
> Wildcard queries are not analyzed by Lucene and hence the behavior. A
> similar thread earlier -
> http://www.lucidimagination.com/search/document/a6b9144ecab9d0ff/search_phrase_wildcard
> 
> Cheers
> Avlesh
> 
> On Thu, Aug 20, 2009 at 7:03 PM, Alexander Herzog  wrote:
> 
>> It seems like the analyzer/filter isn't affected at all, since the query
>>
>> http://localhost:8983/solr/select/?q=PhysicalDescription:nü*&debugQuery=true
>>
>> does not return a
>> PhysicalDescription:nu*
>> as I would expect.
>>
>> So can I just have a "you're right, wildcard search is passed to lucene
>> directly without any analyzing".
>>
>> If it is like this, I'm happy with that as well.
>>
>> best,
>> Alexander
>>
>>
>> Alexander Herzog schrieb:
>>> Hi all
>>>
>>> sorry for the long post
>>>
>>> We are switching from indexdata's zebra to solr for a new book
>>> archival/preservation project with multiple languages, so expect more
>>> questions soon (sorry for that)
>>> The features of solr are pretty cool and more or less overwhelming!
>>>
>>> But there is one thing I found after a little test with wildcards.
>>>
>>> I'm using the latest svn build and didn't change anything except the
>>> schema.xml
>>> Solr Specification Version: 1.3.0.2009.08.20.07.53.52
>>> Solr Implementation Version: 1.4-dev 806060 - ait015 - 2009-08-20
>> 07:53:52
>>> Lucene Specification Version: 2.9-dev
>>> Lucene Implementation Version: 2.9-dev 804692 - 2009-08-16 09:33:41
>>>
>>> I have a text_ws field with this schema config:
>>>
>>> > positionIncrementGap="100">
>>>
>>>   >> mapping="mapping-ISOLatin1Accent.txt"/>
>>>   
>>>   
>>>
>>> 
>>> ...
>>> and I added a dynamic field for everything since I'm not sure what field
>>> we will use...
>>>
>>> >> multiValued="true"/>
>>> ...
>>>
>>>
>>> So I ed this content:
>>> ...
>>> 
>>>X, 143, XIV S.:
>>>124 feine Farbendrucktafeln mit über 600 Abbildungen;
>>>24,5 cm.
>>> 
>>> ...
>>>
>>> since it's German, and I couldn't find a tokenizer for German compound
>>> words (any help appreciated) I wanted to search for 'Farb*'
>>>
>>> The final row of the query analyzer in the admin section told me:
>>> farb*
>>> for the content:
>>> x,143,xiv s.: 124 feine   farbendrucktafeln   mit
>> uber600 abbildungen;
>>> 24,5  cm.
>>>
>>> so everything seems to be ok, everything in lower case
>>>
>>> Now, for the rest service:
>>>
>> http://localhost:8983/solr/select/?q=PhysicalDescription:Farb*&debugQuery=true
>>> PhysicalDescription:Farb*
>>> PhysicalDescription:Farb*
>>> PhysicalDescription:Farb*
>>> PhysicalDescription:Farb*
>>>
>>> Since Farb* has a capital letter, nothing is found.
>>> When using farb* as query, I get the result.
>>>
>>> Where can I add/change a query anaylizer that "lower cases" wildcard
>>> searches?
>>>
>>> thanks, best wishes,
>>> Alexander
>>>
>

Implementing a logout

2009-08-20 Thread Rahul R

Hello,
Can somebody give me some pointers on the Solr objects I need to clean
up/release while doing a logout on a Solr Application. I find that only the
SolrCore object has a close() method. I typically do a lot of faceting
queries on a large dataset with my application. I am using Solr 1.3.0.

Regards
Rahul

Re: Remove data from index

2009-08-20 Thread Noble Paul നോബിള്‍ नोब्ळ्

On Thu, Aug 20, 2009 at 8:39 PM, clico wrote:
>
> I hope it could be a solution.
>
> But I think I understood that u can use deletePkQuery like this
>
> "select document_id from table_document where statusDeleted= 'Y'"
>
> In my case I have no status like "statusDeleted".

I don't think there is a straight solution w/o doing a full-import
>
> The request I would like to write is
>
> "Delete from my solr Index the id that are no longer present in my
> table_document"
>
> With Lucene I had a way to do that :
> open IndexReader,
> for each lucene document : check in table_document and remove in lucene
> index if document is no longer present in the table
>
>
>
>
> --
> View this message in context: 
> http://www.nabble.com/Remove-data-from-index-tp25063736p25063965.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Results from Solr

2009-08-20 Thread Avlesh Singh

Or maybe tweak the "splitOnCaseChange" property in the
WordDelimiterFilterFactory for the text field.

Cheers
Avlesh

On Fri, Aug 21, 2009 at 8:46 AM, Stephen Weiss wrote:

> If I'm not mistaken, you should index name as field type "string" - right
> now you are probably using "text" so it is tokenizing on the Uppercase
> characters.  If you use "string" type this shouldn't happen.  You could use
> a copyField to make a separate "name_string" field so that you can do both
> #1 and #2, depending on the circumstance.
>
> --
> Steve
>
>
> On Aug 20, 2009, at 10:26 PM, bhaskar chandrasekar wrote:
>
>  Hi,
>>
>> Can some one help me with the below situation?
>>
>> To elaborate more on this.
>> Assuming i give "BHASKAR" as input string.
>>
>> Scenario 1: It should give me search results pertaining to BHASKAR only.
>> Select * from MASTER where name ="Bhaskar";
>> Example:It should not display search results as "ChandarBhaskar" or
>> "BhaskarC".
>> Should display Bhaskar only.
>>
>> Scenario 2:
>>
>> Select * from MASTER where name like "%BHASKAR%";
>> It should display records containing the word BHASKAR
>>
>> Ex:
>> Bhaskar
>> ChandarBhaskar
>> BhaskarC
>> Bhaskarabc
>>
>> How to achieve Scenario 1 in Solr ?.
>>
>> Thanks
>> Bhaskar
>>
>>
>>
>>
>

Re: Results from Solr

2009-08-20 Thread Stephen Weiss

If I'm not mistaken, you should index name as field type "string" -  
right now you are probably using "text" so it is tokenizing on the  
Uppercase characters.  If you use "string" type this shouldn't  
happen.  You could use a copyField to make a separate "name_string"  
field so that you can do both #1 and #2, depending on the circumstance.


--
Steve

On Aug 20, 2009, at 10:26 PM, bhaskar chandrasekar wrote:


Hi,

Can some one help me with the below situation?

To elaborate more on this.
Assuming i give "BHASKAR" as input string.

Scenario 1: It should give me search results pertaining to BHASKAR  
only.

Select * from MASTER where name ="Bhaskar";
Example:It should not display search results as "ChandarBhaskar" or  
"BhaskarC".

Should display Bhaskar only.

Scenario 2:

Select * from MASTER where name like "%BHASKAR%";
It should display records containing the word BHASKAR

Ex:
Bhaskar
ChandarBhaskar
BhaskarC
Bhaskarabc

How to achieve Scenario 1 in Solr ?.

Thanks
Bhaskar

Re: Solr Range Query Anomalities


: Subject: Solr Range Query Anomalities
: In-Reply-To: <42aac72d6e244561cb364739bf3c7517.squir...@webmail01.one.com>
: References: <42aac72d6e244561cb364739bf3c7517.squir...@webmail01.one.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking




-Hoss

Re: Passing a Cookie in SolrJ


: > but I can't see an easy way to be able to pass a cookie with the request.
: > The cookie is needed to be able to get through the SSO layer but will just

Unless i'm remembering wrong, and HttpClient instance will manage cookies 
for you, so why not just document how your users can use an HttpClient 
instance to talk to a server that will set this cookie, and then reuse 
that HttpClient instance in your CommonsHttpSolrServer instance.

: > be ignored by Solr. I see that you are using Apache Commons Http Client and
: > with that I would be able to write the cookie if I had access to the
: > HttpMethod being used (GetMethod or PostMethod). However, I can not find an
: > easy way to get access to this with SolrJ and thought I would ask before

: There's no easy way I think. You can extend CommonsHttpSolrServer and
: override the request method. Copy/paste the code from
: CommonsHttpSolrServer#request and make the changes. It is not an elegant way
: but it will work.

If he really needs to hardcode the cookie value into code, wouldn't it be 
easier to extend HttpClient and modify the methods which take in an 
HttpMethod object to first set the cookie on those obejcts before 
delegating to super?



-Hoss

response issues with ruby and json

2009-08-20 Thread Matt Mitchell

Hi,

I was using the spellcheck component a while ago and noticed that parts of
the response are hashes, that use duplicate keys. This is the issue here:
http://issues.apache.org/jira/browse/SOLR-1071

Also, the facet/facet_fields response is a hash, where the keys are field
names. This is mostly fine BUT, when eval'd in Ruby, the resulting key order
is not consistent; I think this is pretty normal for most languages. It
seems to me that an array of hashes would be more useful to preserve the
ordering? For example, we have an application that uses a custom handler
that specifies the facet fields. It'd be nice if the response ordering could
also be controlled in the solrconfig.xml

I guess I have 2 questions:

1. Does anyone if the spellcheck component going to get updated so there are
not duplicate keys

2. How could we get the facet fields into arrays instead of hashes for the
ruby response writer? Should I submit a patch? Is this important to anyone
else? I guess the alternative is to use the xml response.

Thanks,
Matt

Re: How to boost some documents at query-time ?


: - The CustomScoreQuery of Lucene :
: 
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/function/package-summary.html

this is a variation on the FunctionQuery class in Solr, which uses the 
ValueSource API Shalin refered to -- at present the only way to implement 
a ValueSource is either using an index field or an ExternalFile ... but if 
you someone already have the popularity values in memory, you can 
certainly implement your own ValueSource as Shalin suggested...

: > Something like this should be possible by creating your own ValueSource.
: > Look at the ExternalFileField and FileFloatSource in Solr as an example.
: > Instead of loading from a file, you can load it from some other source and
: > refresh it periodically.
: > 
: > You won't be able to sort on it using the sort parameter until [1] is
: > complete. However, you can achieve a similar effect by boosting documents by
: > a function of popularity using function queries in the "q" parameter.
: > 
: > However, if you can re-index your documents periodically, just add the
: > popularity value into the document itself and avoid a complicated system. We
: > do this for a multi-million document index (storing daily popularity as a
: > field). We also have another smaller index which we re-index every 15
: > minutes with the latest popularity values.
: > 
: > [1] - https://issues.apache.org/jira/browse/SOLR-1297
: > 
: >   
: 
: 
: -- 
: Fabrice Esti�venart, Ing�nieur R&D, CETIC
: T�l : +32 (0)71/49.07.28
: Web : http://www.cetic.be
: 



-Hoss

Re: Using Lucene's payload in Solr


: of the field are correct but the delimiter and payload are stored so they
: appear in the response also.  Here is an example:
...
: I am thinking maybe I can do this instead when indexing:
: 
: XML for indexing:
: Solr In Action
: 
: This will simplify indexing as I don't have to repeat the payload for each

but now you're into a custom request handler for the updates to deal with 
the custom XML attribute so you can't use DIH, or CSV loading.

It seems like it might be simpler have two new (generic) UpdateProcessors: 
one that can clone fieldA into fieldB, and one that can do regex mutations 
on fieldB ... neither needs to know about payloads at all, but the first 
can made a copy of "2.0|Solr In Action" and the second can strip off the 
"2.0|" from the copy.

then you can write a new NumericPayloadRegexTokenizer that takes in two 
regex expressions -- one that knows how to extract the payload from a 
piece of input, and one that specifies the tokenization.

those three classes seem easier to implemnt, easier to maintain, and more 
generally reusable then a custom xml request handler for your updates.


-Hoss

Re: Overview of Query Parsing API Stack? / Dismax parsing, new 1.4 parsing, etc.

2009-08-20 Thread Yonik Seeley

On Thu, Aug 20, 2009 at 10:16 PM, Chris
Hostetter wrote:
> coming in Lucene 2.9 (which is what Solr 1.4 will use) is a completley new
> QueryParser framework, which (i'm told) is suppose to make it much easier
> to create custom query parser syntaxs,

I've quickly looked, but haven't seen this to be the case.
The new query parser framework uses the same JavaCC grammar and
creates intermediate objects that eventually create Lucene Query
objects.

>From an IBM perspective (where this parser came from), it makes it
easier to add a new syntax because they have multiple back-ends
(Lucene being one, probably OmniFind or other proprietary search
engines being others).  But from the Lucene perspective, there is only
Lucene as a back-end.

So if you want to try and extend the syntax of the lucene query
parser, it still seems to come down to hacking on the JavaCC grammar
(the hard part).

-Yonik
http://www.lucidimagination.com

Results from Solr

2009-08-20 Thread bhaskar chandrasekar

Hi,
 
Can some one help me with the below situation?

To elaborate more on this.
Assuming i give "BHASKAR" as input string.

Scenario 1: It should give me search results pertaining to BHASKAR only.
Select * from MASTER where name ="Bhaskar";
Example:It should not display search results as "ChandarBhaskar" or "BhaskarC". 
Should display Bhaskar only.

Scenario 2:
 
Select * from MASTER where name like "%BHASKAR%";
It should display records containing the word BHASKAR

Ex:
Bhaskar
ChandarBhaskar
BhaskarC
Bhaskarabc

How to achieve Scenario 1 in Solr ?.
 
Thanks
Bhaskar

Re: Common Solr Question

2009-08-20 Thread Uri Boness

Hi,

1. that change you made should work. Just remember that request
parameters (query string parameters) override the configured defaults.

2. That is correct
3. not quite sure what you mean by that.
4. I guess you're asking in your statement is correct... it is.

I think you should have a look at
http://wiki.apache.org/solr/SolrRequestHandler and
http://wiki.apache.org/solr/UpdateCSV

Cheers,
Uri

darniz wrote:
Hi
i have some basic yet important question about solr and that too with

terminology.
I want to be crystal clear about it.
Please answer the following questions.

http://dl1rap711-epe.media.edmunds.com:8983/solr/select/?q=make%3AHonda&version=2.2&start=0&rows=10&indent=on

the question is where is it defined in solrconfig.xml. if i have to change
the default size for my result set from 10 to for example say 50 where
should i change it.
i tried to do this

explicit
55
-

But did not work.

Question 2
2)
When we issue an update command something like this
http://localhost:8983/solr/update?stream.body=2007HyundaiSonata

The following request handler will be used which is mentioned in the
solrconfig.xml file

Is this correct.

Question 3
3) To upload CSV data we need to use /update/csv handler.
I would appreciate how to specify this in the url if i have to upload a csv
file.

Question 4
3)If this is the case, every url request is mapped to a request handler.
For load csv file use /update/csv which is implemented by
solr.CSVRequestHandler
For analysis file use /analysis which is implement by
solr.AnalysisRequestHandler

For now this is it.
More to follow

Thanks

Re: Question mark glyphs in indexed content


: Hello, I am using the latest Solr4j to index content. When I look at
: that content in the Solr Admin web utility I see weird characters like
: this:
: 
: http://brockwine.com/images/solrglyphs.png
: 
: When I look at the text in the MySQL DB those chars appear to just be
: plain hyphens. The MySQL table character set is utf8 and the collation
: is utf8.

What do you mean by "Solr4j" ?

more then likely, there is a character encoding problem somewhere between 
your database and Solr ... solr expects utf8 when you index content, but 
justbecause it's utf8 in your database doesn't mean the code reading from 
your database and sending it to Solr is using utf8 along the way ... 
knowing exactly waht that code looks like is neccessary to understand what 
might be happening here.


-Hoss

Re: Overview of Query Parsing API Stack? / Dismax parsing, new 1.4 parsing, etc.


: Subject: Overview of Query Parsing API Stack? / Dismax parsing,
: new 1.4  parsing, etc.

Oh, what i would give for time to sit and document in depth how some of 
this stuff works (assuming i first had time to verify that it really does 
work the way i think)

The nutshell answer is that as far as solr (1.4) is concerned, the main 
unit of "query parsing" is a QParser ... lots of places in the code base 
may care about parsing differnet strngs for the purposes of producting a 
Query object, but ultimately they all use a QParser.

QParsers are plugins that you can configure instances of in your 
solrcinfog.xml and assign names to.  by default, all of various pieces of 
code in solr that do any sort of query related parsing use some basic 
convention to pick a QParser by name -- so StandardRequestHandler uses the 
QParser named "lucene" for parsing the "q" param, while 
DisMaxRequestHandler uses a QParser named "dismax" for "q", and "func" for 
the "bf" param.  so if you wanted to make some change so that *any* code 
path anywhere attempting to use the lucene syntax got your custom query 
parsing logic, you could configure a QParser with the name "lucene" and 
override the default.

The brilliantly confusing magic comes into play when strings to be parsed 
start with the "local params" syntax (ie: "{!foo a=f,b=z}blah blah" ... 
that tells the parsing code to override whatever QParser it would have 
used for that string, and to pass everything after the "}" charcter to the 
parser named "foo", with a=f and b=z added to the list of SolrParams it's 
already got (from the query string, or default params in solrconfig, 
etc...)

For most types of queries, the QParser ultimately uses Lucenes 
"QueryParser" class, or some subclass of it (DisMaxQueryParser used by the 
DisMaxQPlugin is a subclass of QueryParser") and 9 times out of 10 if 
people want to customize query parsing without inventing a 100% new 
syntax, they also write a subclass.

coming in Lucene 2.9 (which is what Solr 1.4 will use) is a completley new 
QueryParser framework, which (i'm told) is suppose to make it much easier 
to create custom query parser syntaxs, but i haven't had time to look at 
it to see what all hte fuss is about.  so in theory you could use it to 
implement a new QPlugin in SOlr 1.4.

no matter how you ultimately implement code that goes from "String" to 
"Query" you have to be concerned about the type of data in the field that 
Query objects refers to (if it was lowercased at index time, you want to 
lowercase at query time, etc...).  Solr does it's best to help query 
parsers out by supporting an  in the schema.xml so 
that the schema creator that specify how to "analyze" a piece of 
input when building queries, but depending on the query syntax it's not 
always easy to get the behavior you expect from a particular query parser 
/ analyzer pair (This part of query parsing typically trips people up when 
dealing with multiword synonyms, or analyzers that don't tokenize on 
whitespace, because the normal Lucene QueryParser uses whitespace as part 
of it's markup, and breaks up the input on the whitespace boundaries 
before it ever passes those chunks of input to the analyzers)

: But trying traipse through the code to get "the big picture" is a bit
: involved.

like i said: the world of query parsing in solr all revolves arround the 
QParser API ... if you want to make sense of it, start there, and work out 
in both directions.

PS: please, please, please ... as you make progress on understanding these 
internals, feel free to plagerize this email as the starting point of a 
new wiki page documenting your understanding for others who come along 
with teh same question.


-Hoss

Re: Facet filtering

2009-08-20 Thread Avlesh Singh

You can use a dynamic field called "tag_*". If a patch for SOLR
247gets committed, you
can perform a facet query like facet.field=tag_*.

Cheers
Avlesh

On Fri, Aug 21, 2009 at 3:21 AM, Asif Rahman  wrote:

> Is there any way to assign metadata to terms in a field and then filter on
> that metadata when using that field as a facet?
>
> For example, I have a collection of news articles in my index.  Each
> article
> has a field that contains tags based on the topics discussed in the
> article.  An article might have the tags "Barack Obama" and "Chicago".  I
> want to assign metadata describing what type of entity each tag is.  For
> these tags the metadata would be "person" for "Barack Obama" and "place"
> for
> "Chicago".  Then I want to issue a facet query that returns only "person"
> facets.
>
> I can think of two possible solutions for this, both with shortcomings.
>
> 1) I could create a field for each metadata category.  So the schema would
> have the fields "tag_person" and "tag_place".  The problem with this method
> is that I am limited to filtering by a single criterion for each of my
> queries.
>
> 2) I could leave the Solr schema unmodified and post-process the query.
> This solution is less elegant than one that could be completely contained
> within Solr.  I also imagine that it would be less performant.
>
> Any thoughts?
>
> Thanks in advance,
>
> Asif
>
> --
> Asif Rahman
> Lead Engineer - NewsCred
> a...@newscred.com
> http://platform.newscred.com
>

Re: Facet filtering

2009-08-20 Thread Uri Boness

Another solution is to use hierachical values. So for example, instead 
of having a "Barack Obama" value you'll have "person/Barak Obama". To 
filter on a person you can just use wildcards (e.g. "person/*").


Asif Rahman wrote:

Is there any way to assign metadata to terms in a field and then filter on
that metadata when using that field as a facet?

For example, I have a collection of news articles in my index.  Each article
has a field that contains tags based on the topics discussed in the
article.  An article might have the tags "Barack Obama" and "Chicago".  I
want to assign metadata describing what type of entity each tag is.  For
these tags the metadata would be "person" for "Barack Obama" and "place" for
"Chicago".  Then I want to issue a facet query that returns only "person"
facets.

I can think of two possible solutions for this, both with shortcomings.

1) I could create a field for each metadata category.  So the schema would
have the fields "tag_person" and "tag_place".  The problem with this method
is that I am limited to filtering by a single criterion for each of my
queries.

2) I could leave the Solr schema unmodified and post-process the query.
This solution is less elegant than one that could be completely contained
within Solr.  I also imagine that it would be less performant.

Any thoughts?

Thanks in advance,

Asif

Re: Retrieving the boost factor using Solrj CommonsHttpSolrServer


: Subject: Retrieving the boost factor using Solrj CommonsHttpSolrServer
: References:
: <957081.80086
: 
@web50309.mail.re2.yahoo.com> 

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking



-Hoss

Facet filtering

2009-08-20 Thread Asif Rahman

Is there any way to assign metadata to terms in a field and then filter on
that metadata when using that field as a facet?

For example, I have a collection of news articles in my index.  Each article
has a field that contains tags based on the topics discussed in the
article.  An article might have the tags "Barack Obama" and "Chicago".  I
want to assign metadata describing what type of entity each tag is.  For
these tags the metadata would be "person" for "Barack Obama" and "place" for
"Chicago".  Then I want to issue a facet query that returns only "person"
facets.

I can think of two possible solutions for this, both with shortcomings.

1) I could create a field for each metadata category.  So the schema would
have the fields "tag_person" and "tag_place".  The problem with this method
is that I am limited to filtering by a single criterion for each of my
queries.

2) I could leave the Solr schema unmodified and post-process the query.
This solution is less elegant than one that could be completely contained
within Solr.  I also imagine that it would be less performant.

Any thoughts?

Thanks in advance,

Asif

-- 
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com

Snapshot backups with new replication

2009-08-20 Thread KaktuChakarabati


Hey,
I was wondering if there is any equivalent in new in-process replication to
what
could previously be achieved by running snapcleaner -N which would
essentially
allow me to keep backups of N latest indices pulls on a search node.
This is of course very important for failover operation in production
environments,
And I assume this is a functionality important to many potential users

Thanks,
-Chak
-- 
View this message in context: 
http://www.nabble.com/Snapshot-backups-with-new-replication-tp25070608p25070608.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Embedded Solr Clustering


: Yes we are using Solr for a non-traditional search purpose and the
: performance is critical. However it sounds like that sharing the same index
: could slow down reading / writing to the index. And access synchronization
: is tricky as well.

no matter how you use Solr (HTTP or Embedded) only one SOlrCore can be 
writting to an index at a time.

: Therefore, we might have to use a single web based Solr instance or use
: multiple embedded Solr instances and setup the script based replication.

not clear if this is what you mena: but you can always have a single solr 
"master" (which might be embedded or it might be the war) and then use 
script replication or shared disk to make that index available for 
numerous additional solr instance (again: either embedded or war based) 
for queries.



-Hoss

Re: WordDelimiterFilter misunderstanding

2009-08-20 Thread Yonik Seeley

This is unfortunately outside the scope of what filters can currently
do at query time.
This is why the example schema has WordDelimiterFilter only producing
subwords at query time (not catenating them).

-Yonik
http://www.lucidimagination.com



On Thu, Aug 20, 2009 at 5:29 PM, jOhn wrote:
> I've misunderstood WordDelimiterFilter.  You might think that
> catenateAll="1" would append the full phrase (sans delimiters) as an OR
> against the query.
>
> So "jOkersWild" would produce:
>
> "j (okers wild)" OR "jokerswild"
>
> But you thought wrong.  Its actually:
>
> "j (okers wild jokerswild)"
>
> Which is confusing and won't match... Anyone know of a filter mod to do the
> former ?
>
> -nc
>

WordDelimiterFilter misunderstanding

2009-08-20 Thread jOhn

I've misunderstood WordDelimiterFilter.  You might think that
catenateAll="1" would append the full phrase (sans delimiters) as an OR
against the query.

So "jOkersWild" would produce:

"j (okers wild)" OR "jokerswild"

But you thought wrong.  Its actually:

"j (okers wild jokerswild)"

Which is confusing and won't match... Anyone know of a filter mod to do the
former ?

-nc

Multi-shard query with error on one shard

2009-08-20 Thread Phillip Farber



What will the client receive from the primary solr instance if that 
instance doesn't get HTTP 200 from all the shards in a multi-shard query?


Thanks,

Phil

RE: [ANNOUNCEMENT] Newly released book: Solr 1.4 Enterprise Search Server

2009-08-20 Thread Chenini, Mohamed

Hi,

Is there any promotional code I may use to get a discount?

Thanks,
Mohamed

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org] 
Sent: Wednesday, August 19, 2009 12:38 AM
To: solr-user@lucene.apache.org
Subject: RE: [ANNOUNCEMENT] Newly released book: Solr 1.4 Enterprise
Search Server

Hi Faud.

It's true I didn't publicize its release beforehand; I have no idea if
it is normal to do so or not.  I guess I'm a bit shy.

I honestly have no clue what you're referring to as the successor to the
"faceting" term.

~ David Smiley

From: Fuad Efendi [f...@efendi.ca]
Sent: Tuesday, August 18, 2009 10:39 PM
To: solr-user@lucene.apache.org
Subject: RE: [ANNOUNCEMENT] Newly released book: Solr 1.4 Enterprise
Search Server

Some very smart guys at Hadoop even posted some discount codes at WIKI,
and
it's even possible to buy in-advance not published yet chapters :) -
everything changes extremely quick...


Why did you keeep it in secret? Waiting for SOLR-4.1 :))) - do you still
use
outdated pre-1.4 "faceting" term in your book?

Congratulations!



-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: August-18-09 10:10 AM
To: solr
Subject: [ANNOUNCEMENT] Newly released book: Solr 1.4 Enterprise Search
Server

Fellow Solr users,

I've finally finished the book "Solr 1.4 Enterprise Search Server" with
my
co-author Eric.  We are proud to present the first book on Solr and hope
you
find it a valuable resource.   You can find full details about the book
and
purchase it here:
http://www.packtpub.com/solr-1-4-enterprise-search-server/book
It can be pre-ordered at a discount now and should be shipping within a
week
or two.  The book is also available through Amazon.  You can feel good
about
the purchase knowing that 5% of each sale goes to support the Apache
Software Foundation.  For a free sample, there is a portion of chapter 5
covering faceting available as an article online here:
http://www.packtpub.com/article/faceting-in-solr-1.4-enterprise-search-s
erve
r

By the way, we realize Solr 1.4 isn't out [quite] yet.  It is
feature-frozen
however, and there's little in the forthcoming release that isn't
covered in
our book.  About the only notable thing that comes to mind is the
contrib
module on search result clustering.  However Eric plans to write a free
online article available from Packt Publishing on that very subject.

"Solr 1.4 Enterprise Search Server" In Detail:

If you are a developer building a high-traffic web site, you need to
have a
terrific search engine. Sites like Netflix.com and Zappos.com employ
Solr,
an open source enterprise search server, which uses and extends the
Lucene
search library. This is the first book in the market on Solr and it will
show you how to optimize your web site for high volume web traffic with
full-text search capabilities along with loads of customization options.
So,
let your users gain a terrific search experience

This book is a comprehensive reference guide for every feature Solr has
to
offer. It serves the reader right from initiation to development to
deployment. It also comes with complete running examples to demonstrate
its
use and show how to integrate it with other languages and frameworks

This book first gives you a quick overview of Solr, and then gradually
takes
you from basic to advanced features that enhance your search. It starts
off
by discussing Solr and helping you understand how it fits into your
architecture-where all databases and document/web crawlers fall short,
and
Solr shines. The main part of the book is a thorough exploration of
nearly
every feature that Solr offers. To keep this interesting and realistic,
we
use a large open source set of metadata about artists, releases, and
tracks
courtesy of the MusicBrainz.org project. Using this data as a testing
ground
for Solr, you will learn how to import this data in various ways from
CSV to
XML to database access. You will then learn how to search this data in a
myriad of ways, including Solr's rich query syntax, "boosting" match
scores
based on record data and other means, about searching across multiple
fields
with different boosts, getting facets on the results, auto-complete user
queries, spell-correcting searches, highlighting queried text in search
results, and so on.

After this thorough tour, we'll demonstrate working examples of
integrating
a variety of technologies with Solr such as Java, JavaScript, Drupal,
Ruby,
XSLT, PHP, and Python.

Finally, we'll cover various deployment considerations to include
indexing
strategies and performance-oriented configuration that will enable you
to
scale Solr to meet the needs of a high-volume site


Sincerely,

David Smiley (primary-author)
dsmi...@mitre.org
Eric Pugh (co-author)
ep...@opensourceconnections.com

This email/fax message is for the sole use of the intended
recipient(s) and may contain confidential and privile

Common Solr Question

2009-08-20 Thread darniz

Hi
i have some basic yet important question about solr and that too with
terminology.
I want to be crystal clear about it.
Please answer the following questions.

Question 1
1) "Incoming queries are processed by the appropriate SolrRequestHandler.
For the purposes of this discussion, you will use the
StandardRequestHandler"
So i assume all request which we make like
--For select
http://dl1rap711-epe.media.edmunds.com:8983/solr/select/?q=make%3AHonda&version=2.2&start=0&rows=10&indent=on

the question is where is it defined in solrconfig.xml. if i have to change
the default size for my result set from 10 to for example say 50 where
should i change it.
i tried to do this

explicit
55
-

But did not work.

Question 2
2)
When we issue an update command something like this
http://localhost:8983/solr/update?stream.body=2007HyundaiSonata

The following request handler will be used which is mentioned in the
solrconfig.xml file

Is this correct.

Question 3
3) To upload CSV data we need to use /update/csv handler.
I would appreciate how to specify this in the url if i have to upload a csv
file.

For now this is it.
More to follow

Thanks

--
View this message in context:
http://www.nabble.com/Common-Solr-Question-tp25068160p25068160.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Implementing customized Scorer with solr API 1.4

Jérôme Etévé wrote:
> Hi all,
>
>  I'm kind of struggling with a customized lucene.Scorer of mine, since
> I use solr 1.4.
>
>  Here's the problem:
>
>  I wrote a DocSetQuery which inherit from a lucene.Query. This query
> is a decorator for a lucene.Query that filters out the documents which
> are not in a given set of  predefined documents (a solr.DocSet which I
> call docset ).
>
> So In my Weight / Scorer, I implemented the method  nextDoc like that:
>
> public int nextDoc() throws IOException {
> do {
>  if (decoScorer.nextDoc() == NO_MORE_DOCS) {
>   return NO_MORE_DOCS;
>  }
> // DO THIS UNTIL the doc is in the docset
>  } while (!docset.exists(decoScorer.docID()));
>  return decoScorer.docID();
> }
>
> The decoScorer here is the decorated scorer.
>
> My problem here is that in docset, there are 'absolute' documents IDs,
> but now solr uses a number of sub readers each with a kind of offset,
> so decoScorer.docID() gives 'relative' document ID . Because of this,
> I happen to test relative document IDs against a set of absolute
> docIDs.
>
> So my DocSetQuery does not work anymore. The solution would be I think
> to have a way of getting the offset of the SolrReader being used in
> the context to be able to do docset.exists(decoScorer.docID() +
> offset) .
>
> But how can I get this offset?
> The scorer is built with a lucene.IndexReader in parameter:
> public Scorer scorer(IndexReader reader) .
>
> Within solr, this IndexReader happens to be an instance of
> SolrIndexReader so I though maybe I could downcast reader to a
> SolrIndexReader to be able to call the offset related methods on it
> (getBase() etc...).
>   
It may not feel super clean, but it should be fine - Solr always uses a
SolrIndexSearcher which always wraps all of the IndexReaders in
SolrIndexReader. I'm fairly sure anyway ;)

By getting the base of the subreader wihtin the top reader, you can add
it to the doc id to get the top reader doc id.
> I feel quite unconfortable with this solution since my DocSetQuery
> inherits from a lucene thing, so it would be quite odd to downcast
> something to a solr class inside it, plus I didn't really figured out
> how to use those offset related methods.
>
> Thanks for your help!
>
> All the best!
>
> Jerome Eteve.
>
>   


-- 
- Mark

http://www.lucidimagination.com

Re: Implementing customized Scorer with solr API 1.4

2009-08-20 Thread Jason Rutherglen

We should probably move to using Lucene's Filters/DocIdSets
instead of DocSets and merge the two. Then we will not need to
maintain two separate but similar and confusing functionality
classes. This will make seamlessly integrating searching with
Solr's Filters/DocSets into Lucene's new per segment reader
searching easier, especially for new filter writers such as
yourself. Right now we have what appears to be duplicated code.

We probably need several different issues to accomplish what
this requires. One start is SOLR-1308, though I suspect given
the restructuring required, we'll need to break things up into
several separate issues. I'm not really sure what SOLR-1179 was
for.

On Thu, Aug 20, 2009 at 11:17 AM, Jérôme Etévé wrote:
> Hi all,
>
>  I'm kind of struggling with a customized lucene.Scorer of mine, since
> I use solr 1.4.
>
>  Here's the problem:
>
>  I wrote a DocSetQuery which inherit from a lucene.Query. This query
> is a decorator for a lucene.Query that filters out the documents which
> are not in a given set of  predefined documents (a solr.DocSet which I
> call docset ).
>
> So In my Weight / Scorer, I implemented the method  nextDoc like that:
>
> public int nextDoc() throws IOException {
> do {
>         if (decoScorer.nextDoc() == NO_MORE_DOCS) {
>              return NO_MORE_DOCS;
>         }
>        // DO THIS UNTIL the doc is in the docset
>  } while (!docset.exists(decoScorer.docID()));
>  return decoScorer.docID();
> }
>
> The decoScorer here is the decorated scorer.
>
> My problem here is that in docset, there are 'absolute' documents IDs,
> but now solr uses a number of sub readers each with a kind of offset,
> so decoScorer.docID() gives 'relative' document ID . Because of this,
> I happen to test relative document IDs against a set of absolute
> docIDs.
>
> So my DocSetQuery does not work anymore. The solution would be I think
> to have a way of getting the offset of the SolrReader being used in
> the context to be able to do docset.exists(decoScorer.docID() +
> offset) .
>
> But how can I get this offset?
> The scorer is built with a lucene.IndexReader in parameter:
> public Scorer scorer(IndexReader reader) .
>
> Within solr, this IndexReader happens to be an instance of
> SolrIndexReader so I though maybe I could downcast reader to a
> SolrIndexReader to be able to call the offset related methods on it
> (getBase() etc...).
>
> I feel quite unconfortable with this solution since my DocSetQuery
> inherits from a lucene thing, so it would be quite odd to downcast
> something to a solr class inside it, plus I didn't really figured out
> how to use those offset related methods.
>
> Thanks for your help!
>
> All the best!
>
> Jerome Eteve.
>
> --
> Jerome Eteve.
>
> Chat with me live at http://www.eteve.net
>
> jer...@eteve.net
>

Re: Implementing customized Scorer with solr API 1.4

You might be interested in this issue:
http://issues.apache.org/jira/browse/LUCENE-1821

-- 
- Mark

http://www.lucidimagination.com



Jérôme Etévé wrote:
> Hi all,
>
>  I'm kind of struggling with a customized lucene.Scorer of mine, since
> I use solr 1.4.
>
>  Here's the problem:
>
>  I wrote a DocSetQuery which inherit from a lucene.Query. This query
> is a decorator for a lucene.Query that filters out the documents which
> are not in a given set of  predefined documents (a solr.DocSet which I
> call docset ).
>
> So In my Weight / Scorer, I implemented the method  nextDoc like that:
>
> public int nextDoc() throws IOException {
> do {
>  if (decoScorer.nextDoc() == NO_MORE_DOCS) {
>   return NO_MORE_DOCS;
>  }
> // DO THIS UNTIL the doc is in the docset
>  } while (!docset.exists(decoScorer.docID()));
>  return decoScorer.docID();
> }
>
> The decoScorer here is the decorated scorer.
>
> My problem here is that in docset, there are 'absolute' documents IDs,
> but now solr uses a number of sub readers each with a kind of offset,
> so decoScorer.docID() gives 'relative' document ID . Because of this,
> I happen to test relative document IDs against a set of absolute
> docIDs.
>
> So my DocSetQuery does not work anymore. The solution would be I think
> to have a way of getting the offset of the SolrReader being used in
> the context to be able to do docset.exists(decoScorer.docID() +
> offset) .
>
> But how can I get this offset?
> The scorer is built with a lucene.IndexReader in parameter:
> public Scorer scorer(IndexReader reader) .
>
> Within solr, this IndexReader happens to be an instance of
> SolrIndexReader so I though maybe I could downcast reader to a
> SolrIndexReader to be able to call the offset related methods on it
> (getBase() etc...).
>
> I feel quite unconfortable with this solution since my DocSetQuery
> inherits from a lucene thing, so it would be quite odd to downcast
> something to a solr class inside it, plus I didn't really figured out
> how to use those offset related methods.
>
> Thanks for your help!
>
> All the best!
>
> Jerome Eteve.
>
>

solr and approximate string matching

2009-08-20 Thread Ryszard Szopa

Hi,

I've been using Solr for some time in the simplest possible way (as a
backend to a search engine for English documents) and I've been really
happy about it. However, now I need to do something which is a bit
non-standard, and unfortunately I am desperately stuck. To make things
more complicated, I am using solr in a Django application through
Haystack [http://haystacksearch.org], but I am pretty sure that
there's no funny business going on between haystack and solr.

So, we have a database of movies and series, and as the data comes
from many sources of varying reliability, we'd like to be able to do
fuzzy string matching on the titles of episodes (the default matching
mechanisms operate on word levels, which is not good enough for short
strings, like titles). I had used n-grams approximate matching in the
past, and I was very happy to find that Lucene (and Solr) supports
something like this out of the box.

I assumed that I need a special field type for this, so I added the
following field-type to my schema.xml:

   
 
   
   
 
   

and changed the appropriate field in the schema to:



However, this is not working as I expected. The query analysis looks
correctly, but I don't get any results, which makes me believe that
something happens at index time (ie. the title is indexed like a
default string field instead of trigram field).

Moreover, I would like to be able to do something more. I'd like to
lowercace the string, remove all punctuation marks and spaces, remove
English stopwords and THEN change the string into trigrams. However,
the filters are applied only after the string has been tokenized...

Could you please suggest me any solution to this problem?

Thanks in advance for your answers.

 -- Ryszard Szopa

-- 
http://gryziemy.net
http://robimy.net

WordDelimiterFilter to QueryParser to MultiPhraseQuery?

2009-08-20 Thread jOhn

If you have several tokens, for example after a WordDelimiterFilter, there
is almost no way NOT to trigger a MultiPhraseQuery when you have
catenateWords="1" or catenateAll="1".

For example the title: Jokers Wild

In the index it is: jokers wild, jokers, wild, jokerswild.

When you query "jOkerswild" it becomes these tokens after the
WordDelimiterFilter/LowercaseFilter:

j(0,1,positionInc=1), okerswild(1,10,positionInc=1),
jokerswild(0,10,positionInc=0)

In the QueryParser, its j=positionCount(1), okerswild=positionCount(2),
jokerswild=positionCount(2)

Thus there is no way for jokerswild to match b/c the positionCount > 1 and
QueryParser will turn that into a MultiPhraseQuery instead of a
BooleanQuery.  Even though severalTokensAtSamePosition=true (b/c
j=startOffset(0) and jokerswild=startOffset(0)).

Isn't this a bug?  How could 2 tokens at the same position be treated as a
MultiPhraseQuery?

-nc

Implementing customized Scorer with solr API 1.4

2009-08-20 Thread Jérôme Etévé

Hi all,

 I'm kind of struggling with a customized lucene.Scorer of mine, since
I use solr 1.4.

 Here's the problem:

 I wrote a DocSetQuery which inherit from a lucene.Query. This query
is a decorator for a lucene.Query that filters out the documents which
are not in a given set of  predefined documents (a solr.DocSet which I
call docset ).

So In my Weight / Scorer, I implemented the method  nextDoc like that:

public int nextDoc() throws IOException {
do {
 if (decoScorer.nextDoc() == NO_MORE_DOCS) {
  return NO_MORE_DOCS;
 }
// DO THIS UNTIL the doc is in the docset
 } while (!docset.exists(decoScorer.docID()));
 return decoScorer.docID();
}

The decoScorer here is the decorated scorer.

My problem here is that in docset, there are 'absolute' documents IDs,
but now solr uses a number of sub readers each with a kind of offset,
so decoScorer.docID() gives 'relative' document ID . Because of this,
I happen to test relative document IDs against a set of absolute
docIDs.

So my DocSetQuery does not work anymore. The solution would be I think
to have a way of getting the offset of the SolrReader being used in
the context to be able to do docset.exists(decoScorer.docID() +
offset) .

But how can I get this offset?
The scorer is built with a lucene.IndexReader in parameter:
public Scorer scorer(IndexReader reader) .

Within solr, this IndexReader happens to be an instance of
SolrIndexReader so I though maybe I could downcast reader to a
SolrIndexReader to be able to call the offset related methods on it
(getBase() etc...).

I feel quite unconfortable with this solution since my DocSetQuery
inherits from a lucene thing, so it would be quite odd to downcast
something to a solr class inside it, plus I didn't really figured out
how to use those offset related methods.

Thanks for your help!

All the best!

Jerome Eteve.

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net

Re: where to get solr 1.4 nightly

2009-08-20 Thread Shalin Shekhar Mangar

On Thu, Aug 20, 2009 at 11:31 PM, Joe Calderon wrote:

> i want to try out the improvements in 1.4 but the nightly site is down
>
> http://people.apache.org/builds/lucene/solr/nightly/
>
>
Yeah, its going to be down for 24 hours.

>
> is there a mirror for nightlies?
>

No, but you can always check out the code and build yourself. The svn
repository is still up.

-- 
Regards,
Shalin Shekhar Mangar.

where to get solr 1.4 nightly

2009-08-20 Thread Joe Calderon

i want to try out the improvements in 1.4 but the nightly site is down

http://people.apache.org/builds/lucene/solr/nightly/


is there a mirror for nightlies?


--joe

Re: can solr accept other tag other than field?



You can use the Data Import Handler to pull data out of any XML or SQL data
source:

http://wiki.apache.org/solr/DataImportHandler

Andrew.


Elaine Li wrote:
> 
> Hi,
> 
> I am new solr user. I want to use solr search to run query against
> many xml files I have.
> I have set up the solr server to run query against the example files.
> 
> One problem is my xml does not have  tag and "name" attribute.
> My format is rather easy:
> 
> 
> 
> 
> 
> 
> I looked at the schema.xml file and realized I can only customize(add)
> attribute name.
> 
> Is there a way to let Solr accept my xml w/o me changing my xml into
> the ?
> 
> Thanks.
> 
> Elaine
> 
> 

-- 
View this message in context: 
http://www.nabble.com/can-solr-accept-other-tag-other-than-field--tp25066496p25066638.html
Sent from the Solr - User mailing list archive at Nabble.com.

can solr accept other tag other than field?

2009-08-20 Thread Elaine Li

Hi,

I am new solr user. I want to use solr search to run query against
many xml files I have.
I have set up the solr server to run query against the example files.

One problem is my xml does not have  tag and "name" attribute.
My format is rather easy:






I looked at the schema.xml file and realized I can only customize(add)
attribute name.

Is there a way to let Solr accept my xml w/o me changing my xml into
the ?

Thanks.

Elaine

Re: Is wildcard search not correctly analyzed at query?

2009-08-20 Thread Avlesh Singh

Wildcard queries are not analyzed by Lucene and hence the behavior. A
similar thread earlier -
http://www.lucidimagination.com/search/document/a6b9144ecab9d0ff/search_phrase_wildcard

Cheers
Avlesh

On Thu, Aug 20, 2009 at 7:03 PM, Alexander Herzog  wrote:

>
> It seems like the analyzer/filter isn't affected at all, since the query
>
> http://localhost:8983/solr/select/?q=PhysicalDescription:nü*&debugQuery=true
>
> does not return a
> PhysicalDescription:nu*
> as I would expect.
>
> So can I just have a "you're right, wildcard search is passed to lucene
> directly without any analyzing".
>
> If it is like this, I'm happy with that as well.
>
> best,
> Alexander
>
>
> Alexander Herzog schrieb:
> > Hi all
> >
> > sorry for the long post
> >
> > We are switching from indexdata's zebra to solr for a new book
> > archival/preservation project with multiple languages, so expect more
> > questions soon (sorry for that)
> > The features of solr are pretty cool and more or less overwhelming!
> >
> > But there is one thing I found after a little test with wildcards.
> >
> > I'm using the latest svn build and didn't change anything except the
> > schema.xml
> > Solr Specification Version: 1.3.0.2009.08.20.07.53.52
> > Solr Implementation Version: 1.4-dev 806060 - ait015 - 2009-08-20
> 07:53:52
> > Lucene Specification Version: 2.9-dev
> > Lucene Implementation Version: 2.9-dev 804692 - 2009-08-16 09:33:41
> >
> > I have a text_ws field with this schema config:
> >
> >  positionIncrementGap="100">
> >
> >> mapping="mapping-ISOLatin1Accent.txt"/>
> >   
> >   
> >
> > 
> > ...
> > and I added a dynamic field for everything since I'm not sure what field
> > we will use...
> >
> >  > multiValued="true"/>
> > ...
> >
> >
> > So I ed this content:
> > ...
> > 
> >X, 143, XIV S.:
> >124 feine Farbendrucktafeln mit über 600 Abbildungen;
> >24,5 cm.
> > 
> > ...
> >
> > since it's German, and I couldn't find a tokenizer for German compound
> > words (any help appreciated) I wanted to search for 'Farb*'
> >
> > The final row of the query analyzer in the admin section told me:
> > farb*
> > for the content:
> > x,143,xiv s.: 124 feine   farbendrucktafeln   mit
> uber600 abbildungen;
> > 24,5  cm.
> >
> > so everything seems to be ok, everything in lower case
> >
> > Now, for the rest service:
> >
> http://localhost:8983/solr/select/?q=PhysicalDescription:Farb*&debugQuery=true
> > PhysicalDescription:Farb*
> > PhysicalDescription:Farb*
> > PhysicalDescription:Farb*
> > PhysicalDescription:Farb*
> >
> > Since Farb* has a capital letter, nothing is found.
> > When using farb* as query, I get the result.
> >
> > Where can I add/change a query anaylizer that "lower cases" wildcard
> > searches?
> >
> > thanks, best wishes,
> > Alexander
> >
>

Re: EmbeddedSolrServer restart

Yes and yes.

-- 
- Mark

http://www.lucidimagination.com



Ron Chan wrote:
> would that be the reload method in CoreContainer?
>
> will this pick up changes in schema.xml?
>
> Thanks
>
>
> markrmiller wrote:
>   
>> Ron Chan wrote:
>> 
>>> Is it possible to restart an EmbeddedSolrServer using code without having
>>> to
>>> stop and start the holding application?
>>>
>>>
>>>   
>>>   
>> Reload the core?
>>
>> -- 
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>>
>> 
>
>

Re: EmbeddedSolrServer restart

2009-08-20 Thread Ron Chan


would that be the reload method in CoreContainer?

will this pick up changes in schema.xml?

Thanks


markrmiller wrote:
> 
> Ron Chan wrote:
>> Is it possible to restart an EmbeddedSolrServer using code without having
>> to
>> stop and start the holding application?
>>
>>
>>   
> Reload the core?
> 
> -- 
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/EmbeddedSolrServer-restart-tp25065189p25065347.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: EmbeddedSolrServer restart

2009-08-20 Thread Noble Paul നോബിള്‍ नोब्ळ्

Ron Chan wrote:
> Is it possible to restart an EmbeddedSolrServer using code without having to
> stop and start the holding application?
>
>
>   
Reload the core?

-- 
- Mark

http://www.lucidimagination.com

EmbeddedSolrServer restart

2009-08-20 Thread Ron Chan


Is it possible to restart an EmbeddedSolrServer using code without having to
stop and start the holding application?


-- 
View this message in context: 
http://www.nabble.com/EmbeddedSolrServer-restart-tp25065189p25065189.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Quoted search confusions

2009-08-20 Thread Chris Male

Hi,

What analyzers/filters have you configured for the field that you are
searching? One could be causing the various versions of "ilike" to be
indexed the same way.

Thanks
Chris

On Thu, Aug 20, 2009 at 5:29 PM, Vannia Rajan wrote:

> Hi,*
>
>   *I need some help to clarify how solr indexes documents. I have 6
> documents with various forms of the word "ilike" (complete word and not "i
> like") - one having "ilike" as such and others having a special character
> in
> between "i" and "like".
>
>   What i expected from solr is that, when i do a Quoted search "ilike", it
> should return only the document that had "ilike" exactly. But, what i get
> from solr is that various forms of the word "ilike" are also included in
> the
> results. Is there an option/configuration that i can do to solr so that i
> will get only the result with exact word "ilike"?
> *
>
>  The result i obtained from solr is shown below,
>
> http://localhost:8080/solr/select/?q=%22ilike%22&fl=description,score
> 
> -
> 
> 0
> 20
> -
> 
> description,score
> "ilike"
> 
> 
> -
> 
> -
> 
> 0.5
> Ilike company is doing great!
> 
> -
> 
> 0.375
> I:like company is doing great!
> 
> -
> 
> 0.3125
> I-like it very much. Really, this can come
> up!.
> 
> -
> 
> 0.3125
> I;like it very much. Really, i say.
> 
> -
> 
> 0.25
> -
> 
> i.like it very much. full stop can come? i don't know.
> 
> 
> 
> 
> --
> Thanks,
> Vanniarajan
>

Solr Quoted search confusions

2009-08-20 Thread Vannia Rajan

Hi,*

   *I need some help to clarify how solr indexes documents. I have 6
documents with various forms of the word "ilike" (complete word and not "i
like") - one having "ilike" as such and others having a special character in
between "i" and "like".

   What i expected from solr is that, when i do a Quoted search "ilike", it
should return only the document that had "ilike" exactly. But, what i get
from solr is that various forms of the word "ilike" are also included in the
results. Is there an option/configuration that i can do to solr so that i
will get only the result with exact word "ilike"?
*

  The result i obtained from solr is shown below,

http://localhost:8080/solr/select/?q=%22ilike%22&fl=description,score

-

0
20
-

description,score
"ilike"


-

-

0.5
Ilike company is doing great!

-

0.375
I:like company is doing great!

-

0.3125
I-like it very much. Really, this can come
up!.

-

0.3125
I;like it very much. Really, i say.

-

0.25
-

i.like it very much. full stop can come? i don't know.

Re: Wildcard seaches?

2009-08-20 Thread Paul Tomblin

On Thu, Aug 20, 2009 at 10:51 AM, Andrew Clegg wrote:
> Paul Tomblin wrote:
>>
>> Is there such a thing as a wildcard search?  If I have a simple
>> solr.StrField with no analyzer defined, can I query for "foo*" or
>> "foo.*" and get everything that starts with "foo" such as 'foobar" and
>> "foobaz"?
>>
>
> Yes. foo* is fine even on a simple string field.

Ah, I discovered what was going wrong - I was passing the url to
ClientUtils.escapeQueryChars, and that was escapign the *.  I have to
pass the URL without the * to escapeQueryChars, then tag the * on the
end.

Thanks.

-- 
http://www.linkedin.com/in/paultomblin

Re: How to reduce the Solr index size..

2009-08-20 Thread Grant Ingersoll



On Aug 20, 2009, at 11:00 AM, Silent Surfer wrote:


Hi,

I am newbie to Solr. We recently started using Solr.

We are using Solr to process the server logs. We are creating the  
indexes for each line of the logs, so that users would be able to do  
a fine grain search upto second/ms.


Now what we are observing is , the index size that is being created  
is almost double the size of the actual log size. i.e if the logs  
size is say 1 MB, the actual index size is around 2 MB.


Could anyone let us know what can be done to reduce the index size.  
Do we need to change any configurations/delete any files which are  
created during the indexing processes, but not required for  
searching..


Our schema is as follows:

  required="false" />
  omitNorms="true"/>

  
  
  
  
  
  
  
  stored="true"/>

  

message field holds the actual logtext.


There are a couple of things you can do:
1. stored = true only needs to be on if you are going to use that  
value later in your application (i.e. for display).  Storage is not  
needed for search.
2. You can omitNorms and termFreqsAndPositions for any fields that you  
aren't searching (but just displaying).


A doubling in size seems a bit much.  However, 1 MB is likely not  
enough to show whether this holds true for a larger index.  Often  
times, the growth of the index is sublinear, since the same terms  
appear over and over again and Lucene can obtain pretty high levels of  
compression.


Also, are you adding any other content to what comes in (synonyms,  
etc.)?


I would open up the index in Luke, too and make sure everything looks  
right.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search

Re: Remove data from index

2009-08-20 Thread Marc Sturlese


As far as I know you can not do that with DIH. What size is your index?
Probably the best you can do is index from scratch again with full-import.

clico wrote:
> 
> I hope it could be a solution.
> 
> But I think I understood that u can use deletePkQuery like this
> 
> "select document_id from table_document where statusDeleted= 'Y'"
> 
> In my case I have no status like "statusDeleted".
> 
> The request I would like to write is
> 
> "Delete from my solr Index the id that are no longer present in my
> table_document"
> 
> With Lucene I had a way to do that : 
> open IndexReader,
> for each lucene document : check in table_document and remove in lucene
> index if document is no longer present in the table
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Remove-data-from-index-tp25063736p25063986.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Remove data from index

2009-08-20 Thread Constantijn Visinescu

You could write a solr query that queries *:* and only returns the id field
and then throw out all the IDs from "select id from databaseTable"
and then run a delete query for all the IDs that are left after wards.

However you'd have to write a seperate program/script to do this i think as
the DIH won't be able to do this my itself.

On Thu, Aug 20, 2009 at 5:09 PM, clico  wrote:

>
> I hope it could be a solution.
>
> But I think I understood that u can use deletePkQuery like this
>
> "select document_id from table_document where statusDeleted= 'Y'"
>
> In my case I have no status like "statusDeleted".
>
> The request I would like to write is
>
> "Delete from my solr Index the id that are no longer present in my
> table_document"
>
> With Lucene I had a way to do that :
> open IndexReader,
> for each lucene document : check in table_document and remove in lucene
> index if document is no longer present in the table
>
>
>
>
> --
> View this message in context:
> http://www.nabble.com/Remove-data-from-index-tp25063736p25063965.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

Re: Remove data from index

2009-08-20 Thread clico


I hope it could be a solution.

But I think I understood that u can use deletePkQuery like this

"select document_id from table_document where statusDeleted= 'Y'"

In my case I have no status like "statusDeleted".

The request I would like to write is

"Delete from my solr Index the id that are no longer present in my
table_document"

With Lucene I had a way to do that : 
open IndexReader,
for each lucene document : check in table_document and remove in lucene
index if document is no longer present in the table




-- 
View this message in context: 
http://www.nabble.com/Remove-data-from-index-tp25063736p25063965.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Remove data from index

did you see the deletedPkQuery?

On Thu, Aug 20, 2009 at 8:27 PM, clico wrote:
>
> Hello
>
> I'm trying a way to do that :
>
> I index a db query like
>  "select id from table_documents"
>
> Some documents are updated or deleted from the data table.
>
>
> Using DIH, I can indexe the updated document
>
> But I want to remove from the index the documents that were removed in the
> database.
>
> How could I do this?
> A way would be to delete in the index each time a doc is deleted in the
> database but this is not possible
> because I canot modify the code of my documents management tool
>
> Thanks a lot
>
>
>
>
> --
> View this message in context: 
> http://www.nabble.com/Remove-data-from-index-tp25063736p25063736.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com

How to reduce the Solr index size..

2009-08-20 Thread Silent Surfer

Hi,

I am newbie to Solr. We recently started using Solr.

We are using Solr to process the server logs. We are creating the indexes for 
each line of the logs, so that users would be able to do a fine grain search 
upto second/ms.

Now what we are observing is , the index size that is being created is almost 
double the size of the actual log size. i.e if the logs size is say 1 MB, the 
actual index size is around 2 MB.

Could anyone let us know what can be done to reduce the index size. Do we need 
to change any configurations/delete any files which are created during the 
indexing processes, but not required for searching..

Our schema is as follows:


   
   
   
   
   
   
   
   
   
   

message field holds the actual logtext.

Thanks,
sS

Remove data from index

2009-08-20 Thread clico


Hello

I'm trying a way to do that :

I index a db query like 
 "select id from table_documents"

Some documents are updated or deleted from the data table.


Using DIH, I can indexe the updated document

But I want to remove from the index the documents that were removed in the
database.

How could I do this?
A way would be to delete in the index each time a doc is deleted in the
database but this is not possible
because I canot modify the code of my documents management tool

Thanks a lot




-- 
View this message in context: 
http://www.nabble.com/Remove-data-from-index-tp25063736p25063736.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Wildcard seaches?




Paul Tomblin wrote:
> 
> Is there such a thing as a wildcard search?  If I have a simple
> solr.StrField with no analyzer defined, can I query for "foo*" or
> "foo.*" and get everything that starts with "foo" such as 'foobar" and
> "foobaz"?
> 

Yes. foo* is fine even on a simple string field.

Andrew.

-- 
View this message in context: 
http://www.nabble.com/Wildcard-seaches--tp25063582p25063623.html
Sent from the Solr - User mailing list archive at Nabble.com.

Wildcard seaches?

2009-08-20 Thread Paul Tomblin

Is there such a thing as a wildcard search?  If I have a simple
solr.StrField with no analyzer defined, can I query for "foo*" or
"foo.*" and get everything that starts with "foo" such as 'foobar" and
"foobaz"?

-- 
http://www.linkedin.com/in/paultomblin

Re: Solr Range Query Anomalities [Solved]

2009-08-20 Thread johan . sjoberg

SortableDoubleField works excellent, haven't tried TrieField though.
Thanks for the super fast support everyone.


Regards,
Johan

> On Thursday 20 August 2009 16:07, johan.sjob...@findwise.se wrote:
>> we're performing range queries of a field which is of type double. Some
>> queries which should generate results does not, and I think it's best
>> explained by the following examples; it's also expected to exist data in
>> all ranges:
>>
>>
>> ?q=field:[10.0 TO 20.0] // OK
>> ?q=field:[9.0 TO 20.0] // NOT OK
>> ?q=field:[09.0 TO 20.0] // OK
>>
>> Interesting here is that the range query only works if both ends of the
>> interval is of equal length (hence 09-to-20 works, but not 9-20).
>> Unfortunately, this logic does not work for ranges in the 100s.
>>
>>
>>
>> ?q=field:[* TO 500]  // OK
>> ?q=field:[100.0 TO 500.0] // OK
>> ?q=field:[90.0 TO 500.0] // NOT OK
>> ?q=field:[090.0 TO 500.0] // NOT OK
>>
>>
>>
>> Any ideas to this very strange behaviour?
>
> You'll want to use SortableDoubleField for range queries to work as
> expected.
>
> Also, do have a look at TrieField if you're on a recent enough version of
> Solr.
>
> .øs
>
> --
> Øystein Steimler, Produktans, EasyConnect AS  -  http://opplysning1890.no
> oystein.steim...@easyconnect.no  - GPG: 0x784a7dea - Mob: 90010882
>

Re: Solr Range Query Anomalities

2009-08-20 Thread Øystein F. Steimler

On Thursday 20 August 2009 16:07, johan.sjob...@findwise.se wrote:
> we're performing range queries of a field which is of type double. Some
> queries which should generate results does not, and I think it's best
> explained by the following examples; it's also expected to exist data in
> all ranges:
>
>
> ?q=field:[10.0 TO 20.0] // OK
> ?q=field:[9.0 TO 20.0] // NOT OK
> ?q=field:[09.0 TO 20.0] // OK
>
> Interesting here is that the range query only works if both ends of the
> interval is of equal length (hence 09-to-20 works, but not 9-20).
> Unfortunately, this logic does not work for ranges in the 100s.
>
>
>
> ?q=field:[* TO 500]  // OK
> ?q=field:[100.0 TO 500.0] // OK
> ?q=field:[90.0 TO 500.0] // NOT OK
> ?q=field:[090.0 TO 500.0] // NOT OK
>
>
>
> Any ideas to this very strange behaviour?

You'll want to use SortableDoubleField for range queries to work as expected.

Also, do have a look at TrieField if you're on a recent enough version of 
Solr.

.øs

-- 
Øystein Steimler, Produktans, EasyConnect AS  -  http://opplysning1890.no
oystein.steim...@easyconnect.no  - GPG: 0x784a7dea - Mob: 90010882


pgp6bdTiKfUhi.pgp
Description: PGP signature

Re: Solr Range Query Anomalities


Try a sdouble or sfloat field type?

Andrew.


johan.sjoberg wrote:
> 
> Hi,
> 
> we're performing range queries of a field which is of type double. Some
> queries which should generate results does not, and I think it's best
> explained by the following examples; it's also expected to exist data in
> all ranges:
> 
> 
> ?q=field:[10.0 TO 20.0] // OK
> ?q=field:[9.0 TO 20.0] // NOT OK
> ?q=field:[09.0 TO 20.0] // OK
> 
> Interesting here is that the range query only works if both ends of the
> interval is of equal length (hence 09-to-20 works, but not 9-20).
> Unfortunately, this logic does not work for ranges in the 100s.
> 
> 
> 
> ?q=field:[* TO 500]  // OK
> ?q=field:[100.0 TO 500.0] // OK
> ?q=field:[90.0 TO 500.0] // NOT OK
> ?q=field:[090.0 TO 500.0] // NOT OK
> 
> 
> 
> Any ideas to this very strange behaviour?
> 
> 
> Regards,
> Johan
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-773-%28GEO-Module%29-question-tp25041799p25062912.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Range Query Anomalities

2009-08-20 Thread Shalin Shekhar Mangar

On Thu, Aug 20, 2009 at 7:37 PM,  wrote:

> Hi,
>
> we're performing range queries of a field which is of type double. Some
> queries which should generate results does not, and I think it's best
> explained by the following examples; it's also expected to exist data in
> all ranges:
>
>
> ?q=field:[10.0 TO 20.0] // OK
> ?q=field:[9.0 TO 20.0] // NOT OK
> ?q=field:[09.0 TO 20.0] // OK
>
> Interesting here is that the range query only works if both ends of the
> interval is of equal length (hence 09-to-20 works, but not 9-20).
> Unfortunately, this logic does not work for ranges in the 100s.
>
>
Use a "sdouble" field type if you want to do range searches on that field.

-- 
Regards,
Shalin Shekhar Mangar.

Solr Range Query Anomalities

2009-08-20 Thread johan . sjoberg

Hi,

we're performing range queries of a field which is of type double. Some
queries which should generate results does not, and I think it's best
explained by the following examples; it's also expected to exist data in
all ranges:


?q=field:[10.0 TO 20.0] // OK
?q=field:[9.0 TO 20.0] // NOT OK
?q=field:[09.0 TO 20.0] // OK

Interesting here is that the range query only works if both ends of the
interval is of equal length (hence 09-to-20 works, but not 9-20).
Unfortunately, this logic does not work for ranges in the 100s.



?q=field:[* TO 500]  // OK
?q=field:[100.0 TO 500.0] // OK
?q=field:[90.0 TO 500.0] // NOT OK
?q=field:[090.0 TO 500.0] // NOT OK



Any ideas to this very strange behaviour?


Regards,
Johan

Re: Adding a prefix to fields