!= unequal in fq !?

2011-01-11 Thread stockii

hello.

i need to filter on a field. i want all documents where the field is not equal to the given string.

eg.: ...&fq=status!=refunded

how can i realize this in solr!? i don't want to use
...string+OR+string+OR+...


Re: != unequal in fq !?

2011-01-11 Thread Markus Jelsma
Hi,

It works just like boolean operators in the main query:
fq=-status:refunded

http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Boolean operators
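
A complete request would then look something like this (host, port and the catch-all q are placeholders for your own setup):

http://localhost:8983/solr/select?q=*:*&fq=-status:refunded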

Cheers


> hello.
> 
> i need to filter on a field. i want all documents where the field is not equal to the given string.
> 
> eg.: ...&fq=status!=refunded
> 
> how can i realize this in solr!? i don't want to use
> ...string+OR+string+OR+...


Re: != unequal in fq !?

2011-01-11 Thread stockii

ah, cool thx =) 


FunctionQuery plugin properties

2011-01-11 Thread dante stroe
Hi,

   Is there any way one can define properties for a function plugin
extending the ValueSourceParser inside solrconfig.xml (as one can do with
the "defaults" attribute for a query parser plugin inside the request
handler)?
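
(For reference, the request-handler pattern I mean is roughly this, from the stock solrconfig.xml:

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>

and I am looking for the equivalent on a <valueSourceParser ... /> declaration.)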

Thanks,
Dante


spell suggest response

2011-01-11 Thread satya swaroop
Hi All,
 can we get just the suggestions, without the file results??
Here I state an example
when i query
http://localhost:8080/solr/spellcheckCompRH?q=java daka
usar&spellcheck=true&spellcheck.count=5&spellcheck.collate=true

i get some results of java files and then the suggestions for the words
daka->data, usar->user. But actually i need only the spell suggestions.
Here time is being consumed by displaying the files and then giving the
spell suggestions. Can't we post a query to solr where we can get
the response as only spell suggestions???

Regards,
satya


Re: spell suggest response

2011-01-11 Thread Gora Mohanty
On Tue, Jan 11, 2011 at 3:07 PM, satya swaroop  wrote:
> Hi All,
>         can we get just suggestions only without the files response??
> Here I state an example
> when i query
>    http://localhost:8080/solr/spellcheckCompRH?q=java daka
> usar&spellcheck=true&spellcheck.count=5&spellcheck.collate=true
>
> i get some result of java files and then the suggestions for the words
> daka-data , usar-user. But actually i need only the spell suggestions.
> But here time is getting consumed for displaying of files and then giving
> spell suggestions. Cant we post a query to solr where we can get
> the response as only spell suggestions???
[...]

Not sure why you would want to use Solr for this. I would suggest
using the LGPL aspell library. As aspell exposes a C interface, Java
bindings can be created with SWIG, if needed.

Regards,
Gora


Re: first steps in nlp

2011-01-11 Thread lee carroll
Just to be more explicit in terms of using synonyms, our thinking was
something like:

1 analyse texts for patterns such as "not x" and list these out
2 in a synonyms txt file list in effect antonyms eg
  not pretty -> ugly
  not ugly -> pretty
  not lively -> quiet
  not very nice -> ugly
  etc
3 use a synonym filter referencing the antonyms at index time only (see the sketch below).
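
A sketch of step 3, assuming the mappings from step 2 live in a file called antonyms.txt (multi-word inputs use the explicit mapping syntax):

  not pretty => ugly
  not ugly => pretty

with, in the index-time analyzer chain only:

  <filter class="solr.SynonymFilterFactory" synonyms="antonyms.txt" ignoreCase="true" expand="false"/>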

however the language in the text is probably more complex than the above
simple phrases and nlp seems to promise a lot :-) should we venture down
that route instead?

cheers lee c


On 10 January 2011 22:04, lee carroll  wrote:

> Hi Grant,
>
> Its a search relevancy problem. For example:
>
> a document about london reads like
>
> London is not very good for a peaceful break.
>
> we analyse this at the (i can't remember the technical term) is it lexical
> level? (bloody hell i think you may have wrote the book !) anyway which
> produces tokens in our index of say
>
> "London good peaceful holiday"
>
> users search for cities which would be nice for them to take a holiday in
> say the search is
> "good for a peaceful break"
>
> and bang london is top. talk about a relevancy problem :-)
>
> now i was thinking of using phrase matches in the synonyms file but is that
> the best approach or could nlp help here?
>
> cheers lee
>
>
>
>
>
> On 10 January 2011 18:21, Grant Ingersoll  wrote:
>
>>
>> On Jan 10, 2011, at 12:42 PM, lee carroll wrote:
>>
>> > Hi
>> >
>> > I'm indexing a set of documents which have a conversational writing
>> style.
>> > In particular the authors are very fond
>> > of listing facts in a variety of ways (this is to keep a human reader
>> > interested) but its causing my index trouble.
>> >
>> > For example instead of listing facts like: the house is white, the
>> castle is
>> > pretty.
>> >
>> > We get the house is the complete opposite of black and the castle is not
>> > ugly.
>> >
>> > What are the best approaches to resolve these sorts of issues. Even if
>> its
>> > just handling "not" correctly would be a good start
>> >
>>
>> Hmm, good problem.  I guess I'd start by stepping back and ask what is the
>> problem you are trying to solve?  You've stated, I think, one half of the
>> problem, namely that your authors have a conversational style, but you
>> haven't stated what your users are expecting to do with this information?
>>  Is this a pure search app?  Is it something else that is just backed by
>> Solr but the user would never do a search?
>>
>> Do you have a relevance problem?  Also, what is your notion of handling
>> "not" correctly?  In other words, more details are welcome!
>>
>> -Grant
>>
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>>
>


RE: first steps in nlp

2011-01-11 Thread Hong-Thai Nguyen
Hi,

Absolutely, this problem is in the main scope of NLP. Handling "not" (negation),
passive voice, and tense (past, future, ...) needs more advanced linguistic analysis
(morpho-syntax) at the phrase level than simple tokenization with stemming or
lemmatization provides. The output of this kind of analysis is normally a tree-like structure.
Beware that this work is quite expensive.

Best,
---
Hong-Thai

-----Original Message-----
From: lee carroll [mailto:lee.a.carr...@googlemail.com] 
Sent: Monday, January 10, 2011 23:04
To: solr-user@lucene.apache.org
Subject: Re: first steps in nlp

Hi Grant,

Its a search relevancy problem. For example:

a document about london reads like

London is not very good for a peaceful break.

we analyse this at the (i can't remember the technical term) is it lexical
level? (bloody hell i think you may have wrote the book !) anyway which
produces tokens in our index of say

"London good peaceful holiday"

users search for cities which would be nice for them to take a holiday in
say the search is
"good for a peaceful break"

and bang london is top. talk about a relevancy problem :-)

now i was thinking of using phrase matches in the synonyms file but is that
the best approach or could nlp help here?

cheers lee




On 10 January 2011 18:21, Grant Ingersoll  wrote:

>
> On Jan 10, 2011, at 12:42 PM, lee carroll wrote:
>
> > Hi
> >
> > I'm indexing a set of documents which have a conversational writing
> style.
> > In particular the authors are very fond
> > of listing facts in a variety of ways (this is to keep a human reader
> > interested) but its causing my index trouble.
> >
> > For example instead of listing facts like: the house is white, the castle
> is
> > pretty.
> >
> > We get the house is the complete opposite of black and the castle is not
> > ugly.
> >
> > What are the best approaches to resolve these sorts of issues. Even if
> its
> > just handling "not" correctly would be a good start
> >
>
> Hmm, good problem.  I guess I'd start by stepping back and ask what is the
> problem you are trying to solve?  You've stated, I think, one half of the
> problem, namely that your authors have a conversational style, but you
> haven't stated what your users are expecting to do with this information?
>  Is this a pure search app?  Is it something else that is just backed by
> Solr but the user would never do a search?
>
> Do you have a relevance problem?  Also, what is your notion of handling
> "not" correctly?  In other words, more details are welcome!
>
> -Grant
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com
>
>


Re: Input raw log file

2011-01-11 Thread Gora Mohanty
On Tue, Jan 11, 2011 at 10:06 AM, Dinesh  wrote:
>
> can you give an example.. like something that is currently being used..

Sorry, I do not have anything like this at hand at the moment.

>   
>  i'm an
> engineering student and my project is to index all the real time log files
> from different devices and use some artificial intelligence and produce
> useful data out of it.. i'm doing this for my college.. i've been struggling for more
> than a month even for a start..
[...]

It should not really be that hard. Did you go through the Solr tutorial,
get the example working, and grasp the basics of Solr indexing, and
search? If so, then it is just a matter of setting up what should be a
simple Solr schema, extracting the relevant data from the log files,
and posting it to Solr for indexing. Where exactly in this process are
you having trouble? Can you post a small (say, 10-15 lines) excerpt
of your log files, indicating which of the data you want to keep? No
promises, but maybe someone will have the time to take a crack at
it. The other way might be to look for help from a local expert in
Solr.
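
As an illustrative sketch only (the field names here are invented, since we
have not seen your logs), the schema.xml for syslog-style lines might need
no more than:

  <field name="timestamp" type="date" indexed="true" stored="true"/>
  <field name="device" type="string" indexed="true" stored="true"/>
  <field name="message" type="text" indexed="true" stored="true"/>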

Finally, I am afraid that data mining / artificial intelligence is beyond
the scope of Solr, but you could look at something like Apache Mahout.

Regards,
Gora


Re: DIH dataimport.properties file

2011-01-11 Thread Grijesh.singh

You have to explicitly call
http://<host>:<port>/solr/dataimport?command=delta-import to start
delta-import.
You can set it up as a cron job.
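
For example, a crontab entry along these lines (host, port and path are
placeholders) would run the delta-import hourly:

  0 * * * * curl -s "http://localhost:8983/solr/dataimport?command=delta-import"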

As per my knowledge, the dataimport.properties file contains parameters to be used
in the DataImport XML configuration file. For example, for a database-related import
the file contains text such as:


#Wed Dec 15 14:56:57 IST 2010
last_index_time=2010-12-15 14\:56\:53
feed.last_index_time=2010-12-15 14\:56\:53

I don't know how it contains text as
/select?qt\=/dataimport&command\=delta-import&clean\=false&commit\=true as
you are saying.

-
Grijesh


pruning search result with search score gradient

2011-01-11 Thread Julien Piquot

Hi everyone,

I would like to be able to prune my search results by removing the less 
relevant documents. I'm thinking about using the search score: I take 
the search scores of the document set (I assume they are sorted in 
descending order), normalise them (0 would be the lowest value and 1 
the greatest value) and then calculate the gradient of the normalised 
scores. The documents with a gradient below a threshold value would be 
rejected.
If the scores are linearly decreasing, then no document is rejected. 
However, if there is a brutal score drop, then the documents below the 
drop are rejected.
The threshold value would still have to be tuned, but I believe it would 
make a much stronger metric than an absolute search score.
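
One client-side reading of this idea, as a sketch (the exact threshold
semantics, "cut everything after a normalised drop larger than threshold",
are my interpretation):

  public class ScoreGradientPruner {
      /**
       * How many leading documents to keep, given scores sorted in
       * descending order. A drop between consecutive normalised scores
       * larger than threshold is treated as the "brutal drop".
       */
      public static int cutoff(float[] scores, double threshold) {
          int n = scores.length;
          if (n < 2) return n;
          double min = scores[n - 1], max = scores[0];
          if (max == min) return n;                  // flat scores: keep all
          double prev = 1.0;                         // first score normalises to 1
          for (int i = 1; i < n; i++) {
              double curr = (scores[i] - min) / (max - min);
              if (prev - curr > threshold) return i; // cut at the drop
              prev = curr;
          }
          return n;                                  // linear decrease: keep all
      }
  }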


What do you think about this approach? Do you see any problem with it? 
Are there any Solr tools that could help me deal with that?


Thanks for your answer.

Julien


Re: pruning search result with search score gradient

2011-01-11 Thread Grijesh.singh

Look at Solr Function Queries; they might help you.

-
Grijesh


Re: Problem with DIH delta-import delete.

2011-01-11 Thread Matti Oinas
Problem was an incorrect pk definition in data-config.xml:

   <entity name="..." pk="uuid" ...>

The pk attribute needs to be the same as the Solr uniqueKey field, so in my case
changing the pk value from id to uuid solved the problem.


2010/12/7 Matti Oinas :
> Thanks Koji.
>
> Problem seems to be that template transformer is not used when delete
> is performed.
>
> ...
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed parentDeltaQuery for Entity: entry
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
> INFO: Deleting stale documents
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
> INFO: Deleting document: 787
> Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
> INFO: Deleting document: 786
> ...
>
> There are entries with id 787 and 786 in database and those are marked
> as deleted. Query returns right number of deleted documents and right
> rows from database but delete fails because solr is using plain
> numeric id when deleting document. The same happens with blogs also.
>
> Matti
>
>
> 2010/12/4 Koji Sekiguchi :
>> (10/11/17 20:18), Matti Oinas wrote:
>>>
>>> Solr does not delete documents from index although delta-import says
>>> it has deleted n documents from index. I'm using version 1.4.1.
>>>
>>> The schema looks like
>>>
>>>  <fields>
>>>    <field name="uuid" type="..." indexed="true" stored="true" required="true" />
>>>    <field name="..." type="..." indexed="true" stored="true" required="true" />
>>>    <field name="..." ... />
>>>    <field name="..." ... />
>>>    <field name="..." ... />
>>>  </fields>
>>>  <uniqueKey>uuid</uniqueKey>
>>>
>>>
>>> Relevant fields from database tables:
>>>
>>> TABLE: blogs and entries both have
>>>
>>>   Field: id
>>>    Type: int(11)
>>>    Null: NO
>>>     Key: PRI
>>> Default: NULL
>>>   Extra: auto_increment
>>> 
>>>   Field: modified
>>>    Type: datetime
>>>    Null: YES
>>>     Key:
>>> Default: NULL
>>>   Extra:
>>> 
>>>   Field: status
>>>    Type: tinyint(1) unsigned
>>>    Null: YES
>>>     Key:
>>> Default: NULL
>>>   Extra:
>>>
>>>
>>> <dataConfig>
>>>   <dataSource driver="com.mysql.jdbc.Driver".../>
>>>   <document>
>>>     <entity name="blog" pk="id"
>>>             query="SELECT id,description,1 as type FROM
>>> blogs WHERE status=2"
>>>             deltaImportQuery="SELECT id,description,1
>>> as type FROM blogs WHERE
>>> status=2 AND id='${dataimporter.delta.id}'"
>>>             deltaQuery="SELECT id FROM blogs WHERE
>>> '${dataimporter.last_index_time}' < modified AND status=2"
>>>             deletedPkQuery="SELECT id FROM blogs WHERE
>>> '${dataimporter.last_index_time}' <= modified AND status=3"
>>>             transformer="TemplateTransformer">
>>>       <field column="uuid" template="blog-${blog.id}" />
>>>       <field ... />
>>>       <field ... />
>>>       <field ... />
>>>     </entity>
>>>     <entity name="entry" pk="id"
>>>             query="SELECT f.id as
>>> id,f.content,f.blog_id,2 as type FROM
>>> entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
>>>             deltaImportQuery="SELECT f.id as
>>> id,f.content,f.blog_id,2 as type
>>> FROM entries f,blogs b WHERE f.blog_id=b.id AND
>>> f.id='${dataimporter.delta.id}'"
>>>             deltaQuery="SELECT f.id as id FROM entries
>>> f JOIN blogs b ON
>>> b.id=f.blog_id WHERE '${dataimporter.last_index_time}' < b.modified
>>> AND b.status=2"
>>>             deletedPkQuery="SELECT f.id as id FROM
>>> entries f JOIN blogs b ON
>>> b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}'
>>> < b.modified"
>>>             transformer="HTMLStripTransformer,TemplateTransformer">
>>>       <field column="uuid" template="entry-${entry.id}" />
>>>       <field ... />
>>>       <field ... />
>>>       <field ... stripHTML="true" />
>>>       <field ... />
>>>     </entity>
>>>   </document>
>>> </dataConfig>
>>>
>>> Full import and delta import works without problems when it comes to
>>> adding new documents to the index but when blog is deleted (status is
>>> set to 3 in database), solr report after delta import is something
>>> like "Indexing completed. Added/Updated: 0 documents. Deleted 81
>>> documents.". The problem is that documents are still found from solr
>>> index.
>>>
>>> 1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;
>>>
>>> 2. delta-import =>
>>>
>>> 
>>> Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
>>> 
>>> 2010-11-17 13:00:50
>>> 2010-11-17 13:00:50
>>>
>>> So solr says it has deleted documents and that index is also optimized

Strange query behaviour using splitOnCaseChange=1

2011-01-11 Thread Frederico Azeiteiro
Hi all,

 

I had indexed a text with the word "InterContinental" with fieldType
text (the default filter chain, just with the
solr.SnowballPorterFilterFactory removed).

 

As far as I understand, using the filter solr.WordDelimiterFilterFactory
with splitOnCaseChange="1", this word is indexed as:

 

term text

inter

continental

intercontinental

 

When I search for "continental" the article is returned. 

When searching for "intercontinental" the article is returned.

When searching for "Inter Continental" the article is returned.

When searching for "Inter AND Continental" the article is returned.

When searching for "InterContinental" the article is NOT returned.

 

Can anyone explain why the last search didn't return the article?

 

Thank you,

 



Frederico Azeiteiro

 



solr wildcard queries and analyzers

2011-01-11 Thread Kári Hreinsson
Hi,

I am having a problem with the fact that no text analysis is performed on 
wildcard queries.  I have the following field type (a bit simplified):

  <fieldType name="text" class="solr.TextField" ...>
    <analyzer>
      <tokenizer class="..." />
      <filter class="..." />
      <filter class="solr.ASCIIFoldingFilterFactory" />
    </analyzer>
  </fieldType>


My problem has to do with Icelandic characters, when I index a document with a 
text field including the word "sjálfsögðu" it gets indexed as "sjalfsogdu" 
(because of the ASCIIFoldingFilterFactory which replaces the Icelandic 
characters with their English equivalents).  Then, when I search (without a 
wildcard) for "sjálfsögðu" or "sjalfsogdu" I get that document as a result.  
This is convenient since it enables people to search without using accented 
characters and yet get the results they want (e.g. if they are working on 
computers with English keyboards).

However this all falls apart when using wildcard searches, then the search 
string isn't passed through the filters, and even if I search for "sjálf*" I 
don't get any results because the index doesn't contain the original words (I 
get result if I search for "sjalf*").  I know people have been having a similar 
problem with the case sensitivity of wildcard queries and most often the 
solution seems to be to lowercase the string before passing it on to solr, 
which is not exactly an optimal solution (yet a simple one in that case).  The 
Icelandic characters complicate things a bit and applying the same solution 
(doing the lowercasing and character mapping) in my application seems like 
unnecessary duplication of code already part of solr, not to mention 
complication of my application and possible maintenance down the road.

Is there any way around this?  How are people solving this?  Is there a way to 
apply the filters to wildcard queries?  I guess removing the 
ASCIIFoldingFilterFactory is the simplest "solution" but this "normalization" 
(of the text done by the filter) is often very useful.

I hope I'm not overlooking some obvious explanation. :/

Thanks in advance,
Kári Hreinsson


Re: spell suggest response

2011-01-11 Thread satya swaroop
Hi Gora,
   I am using solr for file indexing and searching, but i have a
module where i dont need any file results, only the spell suggestions, so
i asked is there any way in solr where i would get the spell suggestion
responses only.. I think it is clear for you now.. If not, tell me and I will try
to explain still further...

Regards,
satya


Re: solr wildcard queries and analyzers

2011-01-11 Thread Matti Oinas
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers

On wildcard and fuzzy searches, no text analysis is performed on the
search word.

2011/1/11 Kári Hreinsson :
> Hi,
>
> I am having a problem with the fact that no text analysis are performed on 
> wildcard queries.  I have the following field type (a bit simplified):
>    
>      
>        
>        
>        
>        
>      
>    
>
> My problem has to do with Icelandic characters, when I index a document with 
> a text field including the word "sjálfsögðu" it gets indexed as "sjalfsogdu" 
> (because of the ASCIIFoldingFilterFactory which replaces the Icelandic 
> characters with their English equivalents).  Then, when I search (without a 
> wildcard) for "sjálfsögðu" or "sjalfsogdu" I get that document as a result.  
> This is convenient since it enables people to search without using accented 
> characters and yet get the results they want (e.g. if they are working on 
> computers with English keyboards).
>
> However this all falls apart when using wildcard searches, then the search 
> string isn't passed through the filters, and even if I search for "sjálf*" I 
> don't get any results because the index doesn't contain the original words (I 
> get result if I search for "sjalf*").  I know people have been having a 
> similar problem with the case sensitivity of wildcard queries and most often 
> the solution seems to be to lowercase the string before passing it on to 
> solr, which is not exactly an optimal solution (yet a simple one in that 
> case).  The Icelandic characters complicate things a bit and applying the 
> same solution (doing the lowercasing and character mapping) in my application 
> seems like unnecessary duplication of code already part of solr, not to 
> mention complication of my application and possible maintenance down the road.
>
> Is there any way around this?  How are people solving this?  Is there a way 
> to apply the filters to wildcard queries?  I guess removing the 
> ASCIIFoldingFilterFactory is the simplest "solution" but this "normalization" 
> (of the text done by the filter) is often very useful.
>
> I hope I'm not overlooking some obvious explanation. :/
>
> Thanks in advance,
> Kári Hreinsson
>


Re: solr wildcard queries and analyzers

2011-01-11 Thread Matti Oinas
Sorry, the message was not meant to be sent here. We are struggling
with the same problem here.

2011/1/11 Matti Oinas :
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
>
> On wildcard and fuzzy searches, no text analysis is performed on the
> search word.
>
> 2011/1/11 Kári Hreinsson :
>> Hi,
>>
>> I am having a problem with the fact that no text analysis are performed on 
>> wildcard queries.  I have the following field type (a bit simplified):
>>    
>>      
>>        
>>        
>>        
>>        
>>      
>>    
>>
>> My problem has to do with Icelandic characters, when I index a document with 
>> a text field including the word "sjálfsögðu" it gets indexed as "sjalfsogdu" 
>> (because of the ASCIIFoldingFilterFactory which replaces the Icelandic 
>> characters with their English equivalents).  Then, when I search (without a 
>> wildcard) for "sjálfsögðu" or "sjalfsogdu" I get that document as a result.  
>> This is convenient since it enables people to search without using accented 
>> characters and yet get the results they want (e.g. if they are working on 
>> computers with English keyboards).
>>
>> However this all falls apart when using wildcard searches, then the search 
>> string isn't passed through the filters, and even if I search for "sjálf*" I 
>> don't get any results because the index doesn't contain the original words 
>> (I get result if I search for "sjalf*").  I know people have been having a 
>> similar problem with the case sensitivity of wildcard queries and most often 
>> the solution seems to be to lowercase the string before passing it on to 
>> solr, which is not exactly an optimal solution (yet a simple one in that 
>> case).  The Icelandic characters complicate things a bit and applying the 
>> same solution (doing the lowercasing and character mapping) in my 
>> application seems like unnecessary duplication of code already part of solr, 
>> not to mention complication of my application and possible maintenance down 
>> the road.
>>
>> Is there any way around this?  How are people solving this?  Is there a way 
>> to apply the filters to wildcard queries?  I guess removing the 
>> ASCIIFoldingFilterFactory is the simplest "solution" but this 
>> "normalization" (of the text done by the filter) is often very useful.
>>
>> I hope I'm not overlooking some obvious explanation. :/
>>
>> Thanks in advance,
>> Kári Hreinsson
>>
>


Re: spell suggest response

2011-01-11 Thread Stefan Matheis
Satya,

what about rows=0 .. if i got it correct .. :)
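
e.g., taking your earlier URL:

http://localhost:8080/solr/spellcheckCompRH?q=java+daka+usar&spellcheck=true&spellcheck.count=5&spellcheck.collate=true&rows=0

rows=0 suppresses the document results while the spellcheck component still runs.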

Regards
Stefan

On Tue, Jan 11, 2011 at 1:19 PM, satya swaroop wrote:

> Hi Gora,
>   I am using solr for file indexing and searching, But i have a
> module where i dont need any files result but only the spell suggestions,
> so
> i asked is der anyway in solr where i would get the spell suggestion
> responses only.. I think it is clear for u now.. If not tell me I will try
> to explain still furthur...
>
> Regards,
> satya
>


Re: first steps in nlp

2011-01-11 Thread Grant Ingersoll

On Jan 10, 2011, at 5:04 PM, lee carroll wrote:

> Hi Grant,
> 
> Its a search relevancy problem. For example:
> 
> a document about london reads like
> 
> London is not very good for a peaceful break.
> 
> we analyse this at the (i can't remember the technical term) is it lexical
> level? (bloody hell i think you may have wrote the book !) anyway which
> produces tokens in our index of say
> 
> "London good peaceful holiday"
> 
> users search for cities which would be nice for them to take a holiday in
> say the search is
> "good for a peaceful break"
> 
> and bang london is top. talk about a relevancy problem :-)

First question, why are you getting rid of "not"?  Despite its reputation as a 
"stopword", it does carry a significant amount of meaning for you.  Then, you 
could probably do some phrase based searching that would help in some cases.

> 
> now i was thinking of using phrase matches in the synonyms file but is that
> the best approach or could nlp help here?

I suppose it could.  During indexing,  you could detect that it is a negative 
connotation and change it to be "bad for a peaceful break" or something like 
that.  I'm not aware of any system that does that.  You could also use some 
sentiment analysis to analyze the sentence and determine it is a negative 
sentence and then tag it as negative such that your query takes that into 
account.  Payloads and/or marker tokens would likely help here.

-Grant


> 
> cheers lee
> 
> 
> 
> 
> On 10 January 2011 18:21, Grant Ingersoll  wrote:
> 
>> 
>> On Jan 10, 2011, at 12:42 PM, lee carroll wrote:
>> 
>>> Hi
>>> 
>>> I'm indexing a set of documents which have a conversational writing
>> style.
>>> In particular the authors are very fond
>>> of listing facts in a variety of ways (this is to keep a human reader
>>> interested) but its causing my index trouble.
>>> 
>>> For example instead of listing facts like: the house is white, the castle
>> is
>>> pretty.
>>> 
>>> We get the house is the complete opposite of black and the castle is not
>>> ugly.
>>> 
>>> What are the best approaches to resolve these sorts of issues. Even if
>> its
>>> just handling "not" correctly would be a good start
>>> 
>> 
>> Hmm, good problem.  I guess I'd start by stepping back and ask what is the
>> problem you are trying to solve?  You've stated, I think, one half of the
>> problem, namely that your authors have a conversational style, but you
>> haven't stated what your users are expecting to do with this information?
>> Is this a pure search app?  Is it something else that is just backed by
>> Solr but the user would never do a search?
>> 
>> Do you have a relevance problem?  Also, what is your notion of handling
>> "not" correctly?  In other words, more details are welcome!
>> 
>> -Grant
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com
>> 
>> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search



Re: solr wildcard queries and analyzers

2011-01-11 Thread Matti Oinas
This might be the solution.

http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
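
An untested sketch of using it (Lucene 3.0 contrib queryparser; it is not
wired into Solr, so it would have to be exposed through a custom QParserPlugin
or used client-side; the analyzer below merely mimics a lowercase plus
ASCII-folding chain):

  import java.io.Reader;
  import org.apache.lucene.analysis.ASCIIFoldingFilter;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.LowerCaseTokenizer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.queryParser.analyzing.AnalyzingQueryParser;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.util.Version;

  public class FoldedWildcardDemo {
      public static void main(String[] args) throws Exception {
          // Analyzer mimicking the field's lowercase + ASCII-folding chain
          Analyzer analyzer = new Analyzer() {
              @Override
              public TokenStream tokenStream(String fieldName, Reader reader) {
                  return new ASCIIFoldingFilter(new LowerCaseTokenizer(reader));
              }
          };
          AnalyzingQueryParser qp =
              new AnalyzingQueryParser(Version.LUCENE_30, "text", analyzer);
          // The wildcard term is run through the analyzer, so "sjálf*"
          // becomes "sjalf*" and matches the folded index terms.
          Query q = qp.parse("sjálf*");
          System.out.println(q);
      }
  }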

2011/1/11 Matti Oinas :
> Sorry, the message was not meant to be sent here. We are struggling
> with the same problem here.
>
> 2011/1/11 Matti Oinas :
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
>>
>> On wildcard and fuzzy searches, no text analysis is performed on the
>> search word.
>>
>> 2011/1/11 Kári Hreinsson :
>>> Hi,
>>>
>>> I am having a problem with the fact that no text analysis are performed on 
>>> wildcard queries.  I have the following field type (a bit simplified):
>>>    
>>>      
>>>        
>>>        
>>>        
>>>        
>>>      
>>>    
>>>
>>> My problem has to do with Icelandic characters, when I index a document 
>>> with a text field including the word "sjálfsögðu" it gets indexed as 
>>> "sjalfsogdu" (because of the ASCIIFoldingFilterFactory which replaces the 
>>> Icelandic characters with their English equivalents).  Then, when I search 
>>> (without a wildcard) for "sjálfsögðu" or "sjalfsogdu" I get that document 
>>> as a result.  This is convenient since it enables people to search without 
>>> using accented characters and yet get the results they want (e.g. if they 
>>> are working on computers with English keyboards).
>>>
>>> However this all falls apart when using wildcard searches, then the search 
>>> string isn't passed through the filters, and even if I search for "sjálf*" 
>>> I don't get any results because the index doesn't contain the original 
>>> words (I get result if I search for "sjalf*").  I know people have been 
>>> having a similar problem with the case sensitivity of wildcard queries and 
>>> most often the solution seems to be to lowercase the string before passing 
>>> it on to solr, which is not exactly an optimal solution (yet a simple one 
>>> in that case).  The Icelandic characters complicate things a bit and 
>>> applying the same solution (doing the lowercasing and character mapping) in 
>>> my application seems like unnecessary duplication of code already part of 
>>> solr, not to mention complication of my application and possible 
>>> maintenance down the road.
>>>
>>> Is there any way around this?  How are people solving this?  Is there a way 
>>> to apply the filters to wildcard queries?  I guess removing the 
>>> ASCIIFoldingFilterFactory is the simplest "solution" but this 
>>> "normalization" (of the text done by the filter) is often very useful.
>>>
>>> I hope I'm not overlooking some obvious explanation. :/
>>>
>>> Thanks in advance,
>>> Kári Hreinsson
>>>
>>
>


What can cause segment corruption?

2011-01-11 Thread Stéphane Delprat

Hi,


I'm using Solr 1.4.1 (Lucene 2.9.3)

And some segments get corrupted:

  4 of 11: name=_p40 docCount=470035
compound=false
hasProx=true
numFiles=9
size (MB)=1,946.747
diagnostics = {optimize=true, mergeFactor=6, 
os.version=2.6.26-2-amd64, os=Linux, mergeDocStores=true, 
lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=merge, 
os.arch=amd64, java.version=1.6.0_20, java.vendor=Sun Microsystems Inc.}

has deletions [delFileName=_p40_bj.del]
test: open reader.OK [9299 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...ERROR [term source:margolisphil docFreq=1 
!= num docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term source:margolisphil docFreq=1 != num 
docs seen 0 + num docs deleted 0
at 
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)

at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields...OK [15454281 total field count; avg 
33.543 fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq 
vector fields per doc]

FAILED
WARNING: fixIndex() would remove reference to this segment; full 
exception:

java.lang.RuntimeException: Term Index test failed
at 
org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)

at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


What might cause this corruption?


I detailed my configuration here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201101.mbox/%3c4d2ae506.7070...@blogspirit.com%3e

Thanks,


Re: Synonyms at index time

2011-01-11 Thread Grant Ingersoll

On Jan 10, 2011, at 10:57 PM, TxCSguy wrote:

> 
> Hi,
> 
> I'm not sure if this question is better posted in Solr - User or Solr - Dev,
> but I'll start here.
> 
> I'm interested to find some documentation that describes in detail how
> synonym expansion is handled at index time.  
> http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/
> This  article explains what the index looks like for three example
> documents.  However, I'm looking for some documentation about what the index
> (the inverted index) looks like when synonyms are thrown into the mix.  

Synonyms are injected by the token filter and appear as any other tokens.  
Usually they are at the same position as the original word.  Try using Solr's 
Analysis tool (via the admin) to see what it looks like.
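
For example, with an index-time SynonymFilterFactory, a synonyms.txt line of

  tv, television

and expand="true", the text "tv show" ends up indexed roughly as (position: terms):

  1: tv, television
  2: show

(a sketch; the Analysis tool will show the exact positions for your chain).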


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search



Re: Strange query behaviour using splitOnCaseChange=1

2011-01-11 Thread Koji Sekiguchi

(11/01/11 20:49), Frederico Azeiteiro wrote:

Hi all,



I had indexed a text with the word "InterContinental" with fieldType
text (with the default filters just removing the
solr.SnowballPorterFilterFactory).



As far as I understand, using the filter solr.WordDelimiterFilterFactory
with splitOnCaseChange="1", this word is indexed as:



term text

inter

continental

intercontinental



When I search for "continental" the article is returned.

When searching for "intercontinental" the article is returned

When searching for "Inter Continental" the article is returned

When searching for "Inter AND Continental" the article is returned

When searching for "InterContinental" the article is NOT returned



Can anyone explains why the last search didn't return the article?


Frederico,

Doesn't preserveOriginal="1" help you?
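
i.e., something like this sketch in your analyzer (the other attributes are
whatever you use today):

<filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" preserveOriginal="1" ... />

so the unsplit "InterContinental" token survives analysis on both sides.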

Koji
--
http://www.rondhuit.com/en/


Re: Tuning StatsComponent

2011-01-11 Thread stockii

simplest solution is more RAM !? 

sometimes i think, that is a standard solution for problems with solr ;-) 

i'm going to buy 100 GB RAM :P


Re: What can cause segment corruption?

2011-01-11 Thread Jason Rutherglen
Stéphane,

I've only seen production index corruption when during merge the
process ran out of disk space, or there is an underlying hardware
related issue.

On Tue, Jan 11, 2011 at 5:06 AM, Stéphane Delprat
 wrote:
> Hi,
>
>
> I'm using Solr 1.4.1 (Lucene 2.9.3)
>
> And some segments get corrupted:
>
>  4 of 11: name=_p40 docCount=470035
>    compound=false
>    hasProx=true
>    numFiles=9
>    size (MB)=1,946.747
>    diagnostics = {optimize=true, mergeFactor=6, os.version=2.6.26-2-amd64,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
> 01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
> java.vendor=Sun Microsystems Inc.}
>    has deletions [delFileName=_p40_bj.del]
>    test: open reader.OK [9299 deleted docs]
>    test: fields..OK [51 fields]
>    test: field norms.OK [51 fields]
>    test: terms, freq, prox...ERROR [term source:margolisphil docFreq=1 !=
> num docs seen 0 + num docs deleted 0]
> java.lang.RuntimeException: term source:margolisphil docFreq=1 != num docs
> seen 0 + num docs deleted 0
>        at
> org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>    test: stored fields...OK [15454281 total field count; avg 33.543
> fields per doc]
>    test: term vectorsOK [0 total vector count; avg 0 term/freq
> vector fields per doc]
> FAILED
>    WARNING: fixIndex() would remove reference to this segment; full
> exception:
> java.lang.RuntimeException: Term Index test failed
>        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
>
>
> What might cause this corruption?
>
>
> I detailed my configuration here:
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201101.mbox/%3c4d2ae506.7070...@blogspirit.com%3e
>
> Thanks,
>


Re: What can cause segment corruption?

2011-01-11 Thread Stéphane Delprat

Thanks for your answer,

It's not a disk space problem here :

# df -h
FilesystemSize  Used Avail Use% Mounted on
/dev/sda4 280G   22G  244G   9% /


We will try to install solr on a different server (We just need a little 
time for that)



Stéphane


Le 11/01/2011 15:42, Jason Rutherglen a écrit :

Stéphane,

I've only seen production index corruption when during merge the
process ran out of disk space, or there is an underlying hardware
related issue.

On Tue, Jan 11, 2011 at 5:06 AM, Stéphane Delprat
  wrote:

Hi,


I'm using Solr 1.4.1 (Lucene 2.9.3)

And some segments get corrupted:

  4 of 11: name=_p40 docCount=470035
compound=false
hasProx=true
numFiles=9
size (MB)=1,946.747
diagnostics = {optimize=true, mergeFactor=6, os.version=2.6.26-2-amd64,
os=Linux, mergeDocStores=true, lucene.version=2.9.3 951790 - 2010-06-06
01:30:55, source=merge, os.arch=amd64, java.version=1.6.0_20,
java.vendor=Sun Microsystems Inc.}
has deletions [delFileName=_p40_bj.del]
test: open reader.OK [9299 deleted docs]
test: fields..OK [51 fields]
test: field norms.OK [51 fields]
test: terms, freq, prox...ERROR [term source:margolisphil docFreq=1 !=
num docs seen 0 + num docs deleted 0]
java.lang.RuntimeException: term source:margolisphil docFreq=1 != num docs
seen 0 + num docs deleted 0
at
org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:675)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
test: stored fields...OK [15454281 total field count; avg 33.543
fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq
vector fields per doc]
FAILED
WARNING: fixIndex() would remove reference to this segment; full
exception:
java.lang.RuntimeException: Term Index test failed
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)


What might cause this corruption?


I detailed my configuration here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201101.mbox/%3c4d2ae506.7070...@blogspirit.com%3e

Thanks,





Re: Tuning StatsComponent

2011-01-11 Thread Grant Ingersoll

On Jan 11, 2011, at 9:37 AM, stockii wrote:

> 
> simplest solution is more RAM !? 
> 
> sometimes i think, that is a standard solution for problems with solr ;-) 

FWIW, it's a solution for most computing problems, right?

> 
> i going to buy 100 GB RAM :P

That won't do it.  More RAM is sometimes the answer, but not always.  Too much 
RAM to the JVM often equals too much garbage which means stop the world garbage 
collections. 

--
Grant Ingersoll
http://www.lucidimagination.com



Re: Storing metadata from post parameters and XML

2011-01-11 Thread Erick Erickson
I'm not quite sure whether your question is answered or not, so ignore me if
it is...

But I'm having trouble envisioning this part

"they can use
the dblocation field to retrieve the data for editing purposes (and then
re-index it following their edits)."

I'd never, ever, ever let a user edit the XML and re-post it. You're just
asking
for messed up data (I mean, nobody is really good enough to hand-edit
XML, and for sure random users aren't).

Somewhere, I suspect you'll have a program that the user interacts with that
handles this kind of thing, parsing the XML, presenting it in a format the
user
can't mess up, saving it away and re-indexing. It's pretty easy to use
something
like SolrJ to handle the interactions with the Solr part...

A common way is right at your step "and if they decide that's the data they
want",
which is often clicking a link in a browser. At that point, you launch your
own very
special program with enough meta-data to find the file to edit and provide a
front-end
to let them edit it under controlled circumstances. You can use SolrJ to
re-index
after the user is done.
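
A minimal sketch of that last step with SolrJ 1.4 (URL and field names
borrowed from your example below, so adjust to taste):

  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  CommonsHttpSolrServer server =
      new CommonsHttpSolrServer("http://localhost:8983/solr");
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", "123");
  doc.addField("dblocation", "/abc/def/ghi/123.xml");
  doc.addField("text", "Every foobar has its day");  // the edited content
  server.add(doc);
  server.commit();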

Of course, I may well be way off base relative to your app...

Best
Erick

On Mon, Jan 10, 2011 at 11:22 AM, Walter Closenfleight <
walter.p.closenflei...@gmail.com> wrote:

> Stefan,
>
>
>
> You're right. I was attempting to post some quick pseudo-code, but that
>  is pretty misleading; they should have been <field> elements, like <field
> name="dblocation">/abc/def/ghi/123.xml</field>, or something to that effect.
>
>
>
> Thanks,
>
> Walter
>
>
> On Mon, Jan 10, 2011 at 10:08 AM, Stefan Matheis <
> matheis.ste...@googlemail.com> wrote:
>
> > Hey Walter,
> >
> > what's against just putting your db-location in a 'string' field, and use
> > it
> > like any other value?
> > There is no special field-type for something like a
> > path/directory/location-information, afaik.
> >
> > Regards
> > Stefan
> >
> > On Mon, Jan 10, 2011 at 4:50 PM, Walter Closenfleight <
> > walter.p.closenflei...@gmail.com> wrote:
> >
> > > I'm very unclear on how to associate what I need to a Solr index entry.
> > > Based on what I've read thus far, you can extract data from text files
> > and
> > > store that in a Solr document.
> > >
> > > I have hundreds of thousands of documents in a database/svn type
> system.
> > > When I index a file, it is likely going to be local to the filesystem
> and
> > I
> > > know the location it will take on in the database. So, when I index, I
> > want
> > > to provide a path that it can find it when someone else does a search.
> > >
> > > 123.xml may look like:
> > >
> > > <...>
> > >   <title>my title</title>
> > >   <...>Every foobar has its day</...>
> > >   <caption>My caption</caption>
> > > </...>
> > >
> > > and the proprietary location I want it to be associated with is:
> > >
> > > /abc/def/ghi/123.xml
> > >
> > > So, when a user does a search for "foobar", it returns some information
> > > about 123.xml but most importantly the location should be available.
> > >
> > > I have yet to find (in the schema.xml or otherwise) where you can
> define
> > > that path to store, and how you would pass along that parameter in the
> > > indexing of that document.
> > >
> > > Instead, from the examples I can find, including the book, you store
> > fields
> > > from your data into the index. In the book's examples (a music
> database),
> > > searching for "Cherub Rock" returns a list of with their duration,
> track
> > > name, album name, and artist. In other words, the full text data you
> > > retrieve is the only information the search index has to offer.
> > >
> > > Just for example, using the exampledocs post.jar, I'm envisioning
> > something
> > > like this:
> > >
> > > java -jar post.jar 123.xml -dblocation "/abc/def/ghi/123.xml"
> -othermeta1
> > > "xxx" -othermeta2 "zzz"
> > >
> > > Then the Solr doc would look like:
> > > <doc>
> > >   <field name="id">123</field>
> > >   <field name="dblocation">/abc/def/ghi/123.xml</field>
> > >   <field name="othermeta1">xxx</field>
> > >   <field name="othermeta2">zzz</field>
> > >   <field name="title">my title</field>
> > >   <field name="...">/abc/xxx.gif</field>
> > >   <field name="...">Every foobar has its day My caption</field>
> > > </doc>
> > >
> > > This way, when a user searches for foobar, they get item 123 back,
> review
> > > the search result and if they decide that's the data they want, they
> can
> > > use
> > > the dblocation field to retrieve the data for editing purposes (and
> then
> > > re-index it following their edits).
> > >
> > > I'm guessing I just haven't found the right terms yet to look into, as
> > I'm
> > > very new to this. Thanks for any direction you can provide. Also, if
> Solr
> > > appears to be the wrong tool for what I need, let me know as well!
> > >
> > > Thank you,
> > > Walter
> > >
> >
>


default RegexFragmenter

2011-01-11 Thread Sebastian M

Hello,

I'm investigating an issue where spellcheck queries are tokenized without
being explicitly told to do so, resulting in suggestions such as
"www.www.product4sale.com.com" for queries such as
"www.product4sale.com".

The default RegexFragmenter (name="regex") uses the regular
expression:

[-\w ,/\n\"']{20,200}

I understand parts of it, but I'm not sure about the - sign, or the slash
midway through it.
I would like to perhaps tailor this regular expression to not cause query
terms such as "www.product4sale.com" to be broken down on the period marks,
but just be kept as they are.

Any suggestions or answers are highly appreciated!

Sebastian


Multiple Solr instances common core possible ?

2011-01-11 Thread Ravi Kiran
Hello,
        Is it possible to deploy multiple solr instances with different
context roots pointing to the same solr core? If I do this, will there be
any deadlocks or file handle issues? The reason I need this setup is
because I want to expose solr to a third-party vendor via a different
context root. My solr instance is deployed on Glassfish. Alternately, if
there is a configurable way to set up multiple context roots for the same
solr instance, that will suffice at this point in time.

Ravi Kiran


Re: Multiple Solr instances common core possible ?

2011-01-11 Thread Dennis Gearon
NOT sure about any of it, but I THINK that READ ONLY access, with one solr instance
doing writes, is possible. I've heard that it's NEVER possible to have multiple Solr
instances writing.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Ravi Kiran 
To: solr-user@lucene.apache.org
Sent: Tue, January 11, 2011 9:15:06 AM
Subject: Multiple Solr instances common core possible ?

Hello,
Is it possible to deploy multiple solr instances with different
context roots pointing to the same solr core ? If I do this will there be
any deadlocks or file handle issues ? The reason I need this setup is
because I want to expose solr to an third party vendor via a different
context root. My solr instance is deployed on Glassfish. Alternately, if
there is a configurable way to setup multiple context roots for the same
solr instance that will suffice at this point of time.

Ravi Kiran



Resolve a DataImportHandler datasource based on previous entity

2011-01-11 Thread alexei

Hi,

I am in a situation where the data needed for one of the fields in my
document
may be sitting in a different datasource each time.

I would like to be able to configure something like this:
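(table and column names invented just to illustrate; the dataSource name
interpolation is the part I cannot get to work)

<entity name="parent" query="SELECT id, datasource_no FROM article_index">
    <entity name="Article" dataSource="ds-${parent.datasource_no}"
            query="SELECT content FROM article WHERE id='${parent.id}'"/>
</entity>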


Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?

2011-01-11 Thread mrw

I've noticed that performing a query with facet.mincount=0 and no fq clauses
results in a response where only facets with non-zero counts are returned,
but adding in an fq clause (caused by a user selecting a non-zero-valued
facet value checkbox) actually causes a bunch of 0-count facet values
completely unrelated to the query to be returned.

Is adding the fq constraint actually widening the query before
facet.mincount gets applied?  

E.g., say a query with no fq constraint produces the following facet values:

ID
1234 (1)
 (15)
1010 (30)

Color
Red (11)
Green (15)
Blue (32)

but when the user selects Blue (32), and I add &fq=Color:Blue, Solr returns
the following:

ID
1 (0)
2 (0)
3 (0)
...
99 (0)
100 (0)

Color
Orange (0)
Teal (0)
Red (0)
Green (0)
Blue (32)


Notice how, before the fq clause is added, none of the 0-count facets are
returned, even though facet.mincount = 0, but afterward, a bunch of 0-count
facets are returned?


The context of my question is trying to solve a problem where the
application must display facet values with a count of zero as filtering
operations remove them from the result set.  That is, if Red (10) was
displayed after the initial query, but the user filters on Blue (32), then
we must still display Red (0) so the user can select it and widen the query.  
Initially, we were using mincount=1 and managing the missing facets entirely
within the application, but now I'm trying to see if we can use mincount=0
and maybe some other constraints to achieve the same behavior without a lot
of custom code in the application.

Thanks


Grouping - not sure where to handle (solr or outside of)

2011-01-11 Thread kmf

I currently have a DIH that is working in terms of being able to
search/filter on various facets, but I'm struggling to figure out how to
take it to the next level of what I'd like ideally.

We have a database where the "atomic" unit is a condition (like an
environment description - temp, light, high salt, etc) and these conditions
can be in groups.

For example, conditionA may belong to groups "huckleberry", "star wars" and
"some group".

When I search/filter on a facet I'm currently able to see the conditions and
the information about the conditions (like which group(s) it belongs to),
but what I'm wanting to do is be able to return group names and their member
conditions along with the conditions' respective info when I search/filter
on a facet.

So instead of seeing:

- conditionA
description: some description
groups:  huckleberry, star wars, some group

I would like to see is:
- huckleberry
  conditionA    temp: 78, light: 12hrs, NaCl: 35g/L
  condition35   control, temp: 65, NaCl: 25g/L

- star wars
  conditionA    temp: 78, light: 12hrs, NaCl: 35g/L
  conditionDE   temp: 78, light: 24hrs, NaCl: 0


Is this doable?  My DIH has one entity that is "conditions" with all of its
sub entities; would I need to change the DIH to achieve what I want to do?
And/or do I need to configure the solrconfig and schema files to be able to
do what I want to do?

I realize that part of the problem is presentation which is not solr, but
I'm struggling with figuring out how to transpose from condition to group in
the index, if that makes sense?  Assuming that's what I need to do.

Or am I totally wrong in thinking I would handle this in the index?

Thanks,
kmf



Re: Resolve a DataImportHandler datasource based on previous entity

2011-01-11 Thread Gora Mohanty
On Tue, Jan 11, 2011 at 11:10 PM, alexei  wrote:
>
> Hi,
>
> I am in a situation where the data needed for one of the fields in my
> document
> may be sitting in a different datasource each time.
[...]

At what point of time will you be aware of which datasource
the field is coming from? How are you initiating the import?
One possibility might be to start the import from a script, which
first rewrites the data import configuration file according to the
datasource that the field is expected to come from.

Regards,
Gora


Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?

2011-01-11 Thread Ahmet Arslan
> I've noticed that performing a query with facet.mincount=0
> and no fq clauses
> results in a response where only facets with non-zero
> counts are returned,
> but adding in an fq clause (caused by a user selecting a
> non-zero-valued
> facet value checkbox) actually causes a bunch of 0-count
> facet values
> completely unrelated to the query to be returned.
> 
> Is adding the fq constraint actually widening the query
> before
> facet.mincount gets applied?  
> 
> E.g., say a query with no fq constraint produces the
> following facet values:
> 
> ID
> 1234 (1)
>  (15)
> 1010 (30)
> 
> Title
> Red (11)
> Green (15)
> Blue (32)
> 
> but when the user selects Blue (32), and I add
> &fq=Color:Blue, Solr returns
> the following:
> 
> ID
> 1 (0)
> 2 (0)
> 3 (0)
> ...
> 99 (0)
> 100 (0)
> 
> Color
> Orange (0)
> Teal (0)
> Red (0)
> Green (0)
> Blue (32)
> 
> 
> Notice how, before the fq clause is added, none of the
> 0-count facets are
> returned, even though facet.mincount = 0, but afterward, a
> bunch of 0-count
> facets are returned?

This is normal.

> The context of my question is trying to solve a problem
> where the
> application must display facet values with a count of zero
> as filtering
> operations remove them from the result set.  That is,
> if Red (10) was
> displayed after the initial query, but the user filters on
> Blue (32), then
> we must still display Red (0) so the user can select it and
> widen the query.  
> Initially, we were using mincount=1 and managing the
> missing facets entirely
> within the application, but now I'm trying to see if we can
> use mincount=0
> and maybe some other constraints to achieve the same
> behavior without a lot
> of custom code in the application.

I couldn't fully follow, but you want something like multi-select faceting?

http://search-lucene.com/ is an example for that, user can select solr and 
lucene from the project facet at the same time.

http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
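
The mechanism there, along the lines of the example on that wiki page, is to
tag the filter and exclude it when faceting:

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype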





Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?

2011-01-11 Thread mrw


>> Notice how, before the fq clause is added, none of the
>> 0-count facets are
>> returned, even though facet.mincount = 0, but afterward, a
>> bunch of 0-count
>> facets are returned?
>>
> This is normal.

What's behind that?  Is it widening the results before the mincount
constraint is being applied?


> I couldn't fully follow, but you want something like multi-select
> faceting?
> 
> http://search-lucene.com/ is an example for that, user can select solr and
> lucene from the project facet > at the same time.
> 
> http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams


No.  Search-Lucene actually appears to remove facets when they're not
returned.  If you select Blue, and Red is eliminated, Red won't show up as a
facet anymore.  Therefore, the user can't select Red to add it back into the
result set.

Multi-selection keeps eliminated facets, but gives them virtual counts
related to the entire result set.  If you select Blue (32) and Red (10) is
eliminated, multi-selection causes Red (10) to be displayed.  Therefore, the
user can't tell that Red was eliminated, and the Red facet no longer has any
connection to the values in the result set.

What we need to do is show eliminated facets with a 0 count, so if you
select Blue (32) and Red (10) is eliminated, we show Red (0).  That
indicates that there are zero documents in the result set for Red, but Red
can still be selected to add Red documents back into the result set.


Thanks





SolrJ Question about Bad Request Root cause error

2011-01-11 Thread roz dev
Hi All

We are using SolrJ client (v 1.4.1) to integrate with our solr search
server.
We notice that whenever a SolrJ request does not match the Solr schema, we
get a Bad Request exception, which makes sense.

org.apache.solr.common.SolrException: Bad Request

But, the SolrJ client does not provide any clue about why the request is bad.
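
i.e., all the client code can see today is something like this sketch
(server and doc are whatever SolrJ objects you already use):

  try {
      server.add(doc);
  } catch (org.apache.solr.common.SolrException e) {
      // e.getMessage() is just "Bad Request" and e.code() is 400;
      // the actual cause (e.g. an unknown field) shows up only in the
      // server-side log.
  }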

Is there any way to get the root cause on client side?

Of course, the Solr server logs have enough info to know that the data is bad,
but it would be great
to have the same info in the exception generated by SolrJ.
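For reference, a minimal sketch of what can be inspected on the client side
today (SolrJ 1.4; the field name below is hypothetical). The HTTP status is
exposed via SolrException.code(), but the server-side root cause is not:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

public class AddWithDiagnostics {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("bogus_field", "value"); // hypothetical field not in the schema
        try {
            server.add(doc);
            server.commit();
        } catch (SolrException e) {
            // code() carries the HTTP status (400 for Bad Request); the message
            // is typically just "Bad Request", without the server-side detail.
            System.err.println("HTTP " + e.code() + ": " + e.getMessage());
        }
    }
}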

Any thoughts? Is there any plan to add this in future releases?

Thanks,
Saroj


Re: Resolve a DataImportHandler datasource based on previous entity

2011-01-11 Thread alexei

Hi Gora,

Thank you for your reply.

The datasource number is stored in the database.
The parent entity queries for this number and in theory it 
should become available to the child entity - "Article" in my case.

I am initiating the import via solr/db/dataimport?command=full-import

Script is a good idea, but I will have close to 200+ datasources and I would
have to generate a map of all the Article ids each time I do a full import
or update.
Did you mean a script that would import all the articles from each
Datasource and then reload 
the config solr/db/dataimport?command=reload-config ?

In my mind this should be following the same mechanism which resolves
variables in queries.
Any other ideas?

Regards,
Alex
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Resolve-a-DataImportHandler-datasource-based-on-previous-entity-tp2235573p2236472.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?

2011-01-11 Thread Ahmet Arslan
> >> Notice how, before the fq clause is added, none of
> the
> >> 0-count facets are
> >> returned, even though facet.mincount = 0, but
> afterward, a
> >> bunch of 0-count
> >> facets are returned?
> >>
> > This is normal.
> 
> What's behind that?  Is it widening the results before
> the mincount
> constraint is being applied?

After re-reading, it is not normal that none of the 0-count facets are showing 
up. Can you give us the full parameter list? You can obtain it
by adding &echoParams=all to your search URL.

Maybe you limit facets to three in your first query? What happens when you add 
&facet.limit=-1?





Re: SolrJ Question about Bad Request Root cause error

2011-01-11 Thread Savvas-Andreas Moysidis
good point! that's an enhancement we would definitely welcome as well.

currently, we too have to remote desktop to the Solr machine and search
through the logs..

Any thoughts?

Cheers,
-- Savvas

On 11 January 2011 19:59, roz dev  wrote:

> Hi All
>
> We are using SolrJ client (v 1.4.1) to integrate with our solr search
> server.
> We notice that whenever a SolrJ request does not match the Solr schema, we
> get a Bad Request exception, which makes sense:
>
> org.apache.solr.common.SolrException: Bad Request
>
> But the SolrJ client does not provide any clue about why the request is
> bad.
>
> Is there any way to get the root cause on client side?
>
> Of course, the Solr server logs have enough info to know that the data is
> bad,
> but it would be great
> to have the same info in the exception generated by SolrJ.
>
> Any thoughts? Is there any plan to add this in future releases?
>
> Thanks,
> Saroj
>


Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?

2011-01-11 Thread mrw


iorixxx wrote:
> 
> 
> After re-reading, it is not normal that none of the 0-count facets are
> showing up. Can you give us the full parameter list? You can obtain it
> by adding &echoParams=all to your search URL.
> 
> Maybe you limit facets to three in your first query? What happens when
> you add &facet.limit=-1?
> 
> 

We're actually using the default facet.limit value of 100.  I will increase
it to 200 and see if the zero-count facets show up.  Maybe that was
causing my confusion.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-fq-to-query-with-mincount-0-causes-unexpected-0-count-facet-values-to-be-returned-tp2236105p2236768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?

2011-01-11 Thread mrw


mrw wrote:
> 
> 
> We're actually using the default facet.limit value of 100.  I will
> increase it to 200 and see if the zero-count facets show up.  Maybe
> that was causing my confusion.
> 

Yep -- the 0-count facets were not being returned due to the facet.limit
cutoff.

So, unless there is another parameter that can be used with facet.mincount=0
in order to tune the results, it looks like I will need to use
facet.mincount=1 and handle the processing of omitted facets in the
application.
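For what it's worth, a minimal sketch of that application-side merge (SolrJ;
the method name is hypothetical): capture the facet values from the initial,
unfiltered query, then overlay the counts from the current filtered response,
defaulting everything else to zero:

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.client.solrj.response.FacetField;

// initialFacet: from the unfiltered query; currentFacet: from the filtered
// query (may be null when the field returns no values at all).
static Map<String, Long> mergeFacetCounts(FacetField initialFacet,
                                          FacetField currentFacet) {
    Map<String, Long> display = new LinkedHashMap<String, Long>();
    for (FacetField.Count c : initialFacet.getValues()) {
        display.put(c.getName(), 0L);                // assume eliminated
    }
    if (currentFacet != null && currentFacet.getValues() != null) {
        for (FacetField.Count c : currentFacet.getValues()) {
            display.put(c.getName(), c.getCount());  // still in the result set
        }
    }
    return display;
}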

Thanks for the help.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-fq-to-query-with-mincount-0-causes-unexpected-0-count-facet-values-to-be-returned-tp2236105p2236801.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-11 Thread Shane Perry
By placing some strategic debug messages, I have found that the JDBC
connections are not being closed until all <entity> elements have been
processed (in the entire config file).  A simplified example would be:

<dataConfig>
  <dataSource name="ds1" ... />
  <dataSource name="ds2" ... />
  <document>
    <entity name="entity1" dataSource="ds1" query="...">
      ... field list ...
      <entity name="entity1a" query="...">
        ... field list ...
      </entity>
    </entity>
    <entity name="entity2" dataSource="ds2" query="...">
      ... field list ...
      <entity name="entity2a" query="...">
        ... field list ...
      </entity>
    </entity>
  </document>
</dataConfig>


The behavior is:

JDBC connection opened for entity1 and entity1a - Applicable queries run and
ResultSet objects processed
All open ResultSet and Statement objects closed for entity1 and entity1a
JDBC connection opened for entity2 and entity2a - Applicable queries run and
ResultSet objects processed
All open ResultSet and Statement objects closed for entity2 and entity2a
All JDBC connections are closed (none have been closed before this point).

In my instance, I have some 95 unique <entity> elements (19 parents with 5
children each), resulting in 95 open JDBC connections.  If I understand the
process correctly, it should be safe to close the JDBC connection for a
"root" <entity> (immediate children of <document>) and all descendant
<entity> elements once the parent has been successfully completed.  I have
been digging around the code, but due to my unfamiliarity with the code, I'm
not sure where this would occur.
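For illustration, the shape of the cleanup such a fix would perform (plain
JDBC, not actual DIH code): once a root entity and its children are done,
release their JDBC resources eagerly instead of at the end of the import:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class JdbcCleanup {
    // Close quietly so cleanup of one entity never aborts the rest of the import.
    static void close(ResultSet rs, Statement st, Connection con) {
        try { if (rs != null) rs.close(); } catch (SQLException ignored) {}
        try { if (st != null) st.close(); } catch (SQLException ignored) {}
        try { if (con != null) con.close(); } catch (SQLException ignored) {}
    }
}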

Is this a valid solution?  It's looking like I should probably open a defect
and I'm willing to do so along with submitting a patch, but need a little
more direction on where the fix would best reside.

Thanks,

Shane


On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry  wrote:

> Gora,
>
> Thanks for the response.  After taking another look, you are correct about
> the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
> didn't recognize the case difference in the two function calls, so missed
> it.  I'll keep looking into the original issue and reply if I find a
> cause/solution.
>
> Shane
>
>
> On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty  wrote:
>
>> On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry  wrote:
>> > Hi,
>> >
>> > I am in the process of migrating our system from Postgres 8.4 to Solr
>> > 1.4.1.  Our system is fairly complex and as a result, I have had to
>> define
>> > 19 base entities in the data-config.xml definition file.  Each of these
>> > entities executes 5 queries.  When doing a full-import, as each entity
>> > completes, the server hosting Postgres shows 5 "idle in transaction" for
>> the
>> > entity.
>> >
>> > In digging through the code, I found that the JdbcDataSource wraps the
>> > ResultSet object in a custom ResultSetIterator object, leaving the
>> ResultSet
>> > open.  Walking through the code I can't find a close() call anywhere on
>> the
>> > ResultSet.  I believe this results in the "idle in transaction"
>> processes.
>> [...]
>>
>> Have not examined the "idle in transaction" issue that you
>> mention, but the ResultSet object in a ResultSetIterator is
>> closed in the private hasnext() method, when there are no
>> more results, or if there is an exception. hasnext() is called
>> by the public hasNext() method that should be used in
>> iterating over the results, so I see no issue there.
>>
>> Regards,
>> Gora
>>
>> P.S. This is from Solr 1.4.0 code, but I would not think that
>>this part of the code would have changed.
>>
>
>


Re: segment gets corrupted (after background merge ?)

2011-01-11 Thread Michael McCandless
When you hit corruption is it always this same problem?:

  java.lang.RuntimeException: term source:margolisphil docFreq=1 !=
num docs seen 0 + num docs deleted 0

Can you run with Lucene's IndexWriter infoStream turned on, and catch
the output leading to the corruption?  If something is somehow messing
up the bits in the deletes file that could cause this.
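For reference, the 1.4.x example solrconfig.xml exposes this in the
indexDefaults section, something like:

  <infoStream file="INFOSTREAM.txt">true</infoStream>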

Mike

On Mon, Jan 10, 2011 at 5:52 AM, Stéphane Delprat
 wrote:
> Hi,
>
> We are using :
> Solr Specification Version: 1.4.1
> Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 18:06:42
> Lucene Specification Version: 2.9.3
> Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55
>
> # java -version
> java version "1.6.0_20"
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>
> We want to index 4M docs in one core (and when it works fine we will add
> other cores with 2M on the same server) (1 doc ~= 1kB)
>
> We use SOLR replication every 5 minutes to update the slave server (queries
> are executed on the slave only)
>
> Documents are changing very quickly, during a normal day we will have approx
> :
> * 200 000 updated docs
> * 1000 new docs
> * 200 deleted docs
>
>
> I attached the last good checkIndex : solr20110107.txt
> And the corrupted one : solr20110110.txt
>
>
> This is not the first time a segment gets corrupted on this server, that's
> why I ran frequent "checkIndex". (but as you can see the first segment is
> 1.800.000 docs and it works fine!)
>
>
> I can't find any "SEVERE", "FATAL" or "exception" in the Solr logs.
>
>
> I also attached my schema.xml and solrconfig.xml
>
>
> Is there something wrong with what we are doing ? Do you need other info ?
>
>
> Thanks,
>


[Example] Compound Queries

2011-01-11 Thread Adam Estrada
All,

I have the following query which works just fine for querying a date range.
Now I would like to add any kind of spatial query to the mix. Would someone
be so kind as to help me out with an example spatial query that works in
conjunction with my date range query?

http://localhost:8983/solr/select/?q=hurricane+AND+eventdate:[2006-01-21T00:00:000Z+TO+2007-01-21T00:00:000Z]&version=2.2&start=0&rows=10&indent=on

I think it's something like this, but my results are not correct:

http://localhost:8983/solr/select/?q=hurricane+AND+eventdate:[2006-01-21T00:00:000Z+TO+2007-01-21T00:00:000Z]&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc&version=2.2&start=0&rows=10&indent=on

Your feedback is greatly appreciated!
Adam
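One possible shape for this (assuming the location field is "store", as in the
example schema, and that d is the radius in km) is to keep the date range in q
and move the spatial part into a geofilt filter query; note also the corrected
seconds in the date format (00:00:00Z rather than 00:00:000Z):

http://localhost:8983/solr/select/?q=hurricane+AND+eventdate:[2006-01-21T00:00:00Z+TO+2007-01-21T00:00:00Z]&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=10&sort=geodist()+asc&version=2.2&start=0&rows=10&indent=on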


Re: [Example] Compound Queries

2011-01-11 Thread Estrada Groups
I am using Solr 4.0 for my testing right now, if that helps.

Adam



On Jan 11, 2011, at 10:46 PM, Adam Estrada  
wrote:

> All,
> 
> I have the following query which works just fine for querying a date range.
> Now I would like to add any kind of spatial query to the mix. Would someone
> be so kind as to help me out with an example spatial query that works in
> conjunction with my date range query?
> 
> http://localhost:8983/solr/select/?q=hurricane+AND+eventdate:[2006-01-21T00:00:000Z+TO+2007-01-21T00:00:000Z]&version=2.2&start=0&rows=10&indent=on
> 
> I think it's something like this, but my results are not correct:
> 
> http://localhost:8983/solr/select/?q=hurricane+AND+eventdate:[2006-01-21T00:00:000Z+TO+2007-01-21T00:00:000Z]&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc&version=2.2&start=0&rows=10&indent=on
> 
> Your feedback is greatly appreciated!
> Adam


field_Type 'Textspell'

2011-01-11 Thread Isha Garg

Hi,
Please explain the difference between field_type 'text' and 
'textspell' in the Solr schema.


Thanks!
Isha


Re: field_Type 'Textspell'

2011-01-11 Thread Grijesh.singh

Check their configurations; they use different analysis chains, i.e. their
definitions are different.

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/field-Type-Textsplell-tp2239237p2239275.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr search performance

2011-01-11 Thread Isha Garg

Hi

  Please tell me what changes to make in the Solr config file to improve 
Solr search performance.


Thanks!


Re: spell suggest response

2011-01-11 Thread satya swaroop
Hi Stefan,
  Ya, it works :). Thanks...
  But I have a question... can we get spell suggestions even if the
word is spelled correctly? I mean words near to it...
   ex:-

http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.count=10
   In the output no suggestions will come back, as
"java" is a word that is spelt correctly...
  But can't we get near suggestions such as javax, javac, etc. ???
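One knob possibly worth trying here, from the spellcheck component's
parameters: spellcheck.onlyMorePopular=true returns "more popular" alternative
terms even when the query term itself exists in the index, e.g.

http://localhost:8080/solr/spellcheckCompRH?q=java&rows=0&spellcheck=true&spellcheck.count=10&spellcheck.onlyMorePopular=true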

Regards,
satya


Re: solr search performance

2011-01-11 Thread Grijesh.singh

Which type of performance issue do you have: index time or query time?

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-search-performance-tp2239298p2239338.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr search performance

2011-01-11 Thread Isha Garg

On Wednesday 12 January 2011 10:56 AM, Grijesh.singh wrote:

Which type of performance issue do you have: index time or query time?

-
Grijesh
   
I have query-time index issues. Also tell me under which conditions 
field_type 'textspell' is used. Does it affect the performance of Solr queries?


Re: solr search performance

2011-01-11 Thread Grijesh.singh

What do you mean by a query-time index issue? Please provide more detail about
your problem.

The field type textSpell is defined in the example schema.xml for spell
suggestion. What analysis chain have you used in your "textSpell" field,
and for what purpose are you using that field? 

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-search-performance-tp2239298p2239378.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Input raw log file

2011-01-11 Thread Dinesh

I have installed and tested the sample xml file and tried indexing..
everything went successfully, and when I tried with log files I got an error..
I tried reading the schema.xml and didn't get a clear idea.. can you please
help..  
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2239485.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Input raw log file

2011-01-11 Thread Grijesh.singh

How did you parse your log?
Which way did you go to index the log file data?
Have you done any of the work Gora Mohanty suggested to you?

I am local to the Delhi NCR area.

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2239505.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Input raw log file

2011-01-11 Thread Dinesh

I copied it to the same exampledocs folder and did
#java -jar post.jar log.txt

and i got

SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file log.txt
SimplePostTool: FATAL: Solr returned an error:
Unexpected_character_S_code_83_in_prolog_expected___at_rowcol_unknownsource_11

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2239518.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Input raw log file

2011-01-11 Thread Grijesh.singh

It will not work.
I think your log files are not in Solr XML doc format.

The first thing is that your log file is raw data.
You have to convert it to something Solr can read, either Solr XML doc format
or CSV, to index it in Solr, as Gora suggested to you.
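For reference, the Solr XML doc format looks like this (the field names are
hypothetical and must match your schema.xml):

<add>
  <doc>
    <field name="id">1</field>
    <field name="logline">127.0.0.1 - - [11/Jan/2011] "GET / HTTP/1.1" 200</field>
  </doc>
</add>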

-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2239530.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Input raw log file

2011-01-11 Thread Dinesh

If I convert it to CSV or XML then it will be time consuming, because the
indexing and getting data out of it should be real time.. Is there any other
way I can do it? If not, what are the ways I can convert them to CSV
and XML? And lastly, which is the doc folder of Solr?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2239538.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Input raw log file

2011-01-11 Thread Grijesh.singh

The first thing is that Solr cannot understand your raw log files. Solr needs
data according to the defined schema, and Solr does not know your log file
format.

So you have to write a parser program that will parse your log files into an
existing Solr-writable format. Then you will be able to index that data.



-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2239548.html
Sent from the Solr - User mailing list archive at Nabble.com.


issue with the spatial search with solr

2011-01-11 Thread ur lops
Hi,
 I took the latest build from Hudson and installed it on my computer. I
have made the following changes in my schema.xml:

 <field name="restaurantName" type="string" indexed="true" stored="true"/>
 <field name="restaurantLocation" type="location" indexed="true" stored="true"/>
 <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

When I run my query (given at the end of this message), I get:
HTTP ERROR 500

Problem accessing /solr/select. Reason:

The field restaurantName does not support spatial filtering

org.apache.solr.common.SolrException: The field restaurantName does
not support spatial filtering
at 
org.apache.solr.search.SpatialFilterQParser.parse(SpatialFilterQParser.java:86)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:112)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1296)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



This is my solr query:

select?wt=json&indent=true&fl=name,store&q=*:*&fq={!geofilt%20sfield=restaurantName}&pt=45.15,-93.85&d=5


Any help will be highly appreciated.

Thanks


Re: solr search performance

2011-01-11 Thread Grijesh.singh

What are your benchmarks?
Please describe your problem in detail:
What exactly is being slow?
How are you indexing and querying?
What data are you indexing?
How much data are you indexing?
What is your server configuration?
How much RAM are you using?



-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-search-performance-tp2239298p2239714.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Resolve a DataImportHandler datasource based on previous entity

2011-01-11 Thread Gora Mohanty
On Wed, Jan 12, 2011 at 1:40 AM, alexei  wrote:
[...]
> The datasource number is stored in the database.
> The parent entity queries for this number and in theory it
> should become available to the child entity - "Article" in my case.

I do not think that it is possible to have the datasource name
come from a variable.

> I am initiating the import via solr/db/dataimport?command=full-import
>
> Script is a good idea, but I will have close to 200+ datasources and I would
> have to generate a map of all the Article ids each time I do a full import
> or update.
> Did you mean a script that would import all the articles from each
> Datasource and then reload
> the config solr/db/dataimport?command=reload-config ?

I meant a script that runs the query that defines the datasources for all
fields, writes a Solr DIH configuration file, and then initiates a dataimport.
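As a rough sketch of that approach (the JDBC URL, table, and column names are
all hypothetical): query the database for the datasource definitions, write
out a data-config.xml, then tell DIH to reload and reimport:

import java.io.FileWriter;
import java.io.Writer;
import java.net.URL;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class GenerateDihConfig {
    public static void main(String[] args) throws Exception {
        StringBuilder xml = new StringBuilder("<dataConfig>\n");
        Connection con = DriverManager.getConnection(
            "jdbc:postgresql://localhost/meta", "user", "pass"); // hypothetical
        ResultSet rs = con.createStatement().executeQuery(
            "SELECT name, url FROM datasources");                // hypothetical
        while (rs.next()) {
            // Emit one <dataSource> element per row (escaping omitted in sketch).
            xml.append("  <dataSource name=\"").append(rs.getString("name"))
               .append("\" driver=\"org.postgresql.Driver\" url=\"")
               .append(rs.getString("url")).append("\"/>\n");
        }
        con.close();
        // ... append the <document>/<entity> sections here ...
        xml.append("</dataConfig>\n");
        Writer w = new FileWriter("data-config.xml");
        w.write(xml.toString());
        w.close();
        // Pick up the regenerated config and reimport.
        new URL("http://localhost:8983/solr/db/dataimport?command=reload-config").openStream().close();
        new URL("http://localhost:8983/solr/db/dataimport?command=full-import").openStream().close();
    }
}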

> In my mind this should be following the same mechanism which resolves
> variables in queries.
[...]

It ought to be possible to allow this syntax. I think that people have
not had a need for this.

Another possibility might be to revisit how your data are organized.
Could you explain why you need to use multiple datasources (in this
context, presumably this means multiple databases?), rather than
multiple tables?

Regards,
Gora


Re: Input raw log file

2011-01-11 Thread Dennis Gearon
A possible shortcut?

Write a regex that will parse out the fields as you want them, put that into 
some shell script that calls Solr?
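In that spirit, a rough Java sketch (the log line format and field names are
hypothetical) that regex-parses each line and emits CSV for Solr's /update/csv
handler:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogToCsv {
    // Hypothetical format: "2011-01-11 10:42:01 INFO some message text"
    private static final Pattern LINE =
        Pattern.compile("^(\\S+ \\S+) (\\w+) (.*)$");

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        System.out.println("id,timestamp,level,message"); // header row = Solr field names
        String line;
        int id = 0;
        while ((line = in.readLine()) != null) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) continue;                   // skip lines that don't parse
            System.out.println(++id + "," + m.group(1) + "," + m.group(2)
                + ",\"" + m.group(3).replace("\"", "\"\"") + "\"");
        }
        in.close();
    }
}

The output could then be posted with something like:
curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @out.csv -H 'Content-type:text/plain; charset=utf-8'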

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Grijesh.singh 
To: solr-user@lucene.apache.org
Sent: Tue, January 11, 2011 10:46:20 PM
Subject: Re: Input raw log file


The first thing is that Solr cannot understand your raw log files. Solr needs
data according to the defined schema, and Solr does not know your log file
format.

So you have to write a parser program that will parse your log files into an
existing Solr-writable format. Then you will be able to index that data.



-
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2239548.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: issue with the spatial search with solr

2011-01-11 Thread Dennis Gearon
You didn't happen to notice that you have one field named RestaurantLocation
and another named RestaurantName, did you?

You must be submitting 'restaurantName', so the geo filter is being applied to
a field that is not spatially typed.
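Presumably (assuming restaurantLocation is the spatially-typed field) the
filter should name that field instead:

select?wt=json&indent=true&fl=name,store&q=*:*&fq={!geofilt%20sfield=restaurantLocation}&pt=45.15,-93.85&d=5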

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: ur lops 
To: solr-user@lucene.apache.org
Sent: Tue, January 11, 2011 11:13:36 PM
Subject: issue with the spatial search with solr

Hi,
 I took the latest build from Hudson and installed it on my computer. I
have made the following changes in my schema.xml:

 <field name="restaurantName" type="string" indexed="true" stored="true"/>
 <field name="restaurantLocation" type="location" indexed="true" stored="true"/>
 <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

When I run my query (given at the end of this message), I get:
HTTP ERROR 500

Problem accessing /solr/select. Reason:

The field restaurantName does not support spatial filtering

org.apache.solr.common.SolrException: The field restaurantName does
not support spatial filtering
at 
org.apache.solr.search.SpatialFilterQParser.parse(SpatialFilterQParser.java:86)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:112)

at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)

at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1296)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)

at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)

at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



This is my solr query:

select?wt=json&indent=true&fl=name,store&q=*:*&fq={!geofilt%20sfield=restaurantName}&pt=45.15,-93.85&d=5



Any help will be highly appreciated.

Thanks