Re: Problem with XML encoding UTF-8

2011-02-24 Thread Jan Høydahl
Hi,

Attachments may not work on the mailing lists. Paste the code into the email or 
provide a link.
Could it be that your Python code is not handling UTF-8 strings correctly?

Can you paste some relevant lines from the Solr log?
If you start solr with Jetty, you can use "java -jar start.jar" and get the log 
right in your console.
The same for Tomcat would be "bin/catalina.sh run"
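The "invalid XML character (Unicode: 0xc)" error discussed later in this thread is typically caused by control characters (here a form feed) surviving PDF text extraction and ending up in the XML update message. A minimal sketch of scrubbing such characters on the Python side before posting to Solr (the helper name is my own, not part of solrpy; the character ranges are from the XML 1.0 spec):

```python
import re

# XML 1.0 only allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
# Anything else (e.g. the form feed 0x0C from PDF extraction) must be removed.
_ILLEGAL_XML = re.compile('[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]')

def strip_illegal_xml_chars(text):
    """Remove code points that may not appear in an XML 1.0 document."""
    return _ILLEGAL_XML.sub('', text)
```

Running every field value through such a filter before adding the document usually makes this class of error disappear.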

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 23. feb. 2011, at 13.29, jayronsoares wrote:

> 
> Hi Jan,
> 
> I appreciate your attention.
> I've tried to answer your questions to the best of my knowledge.
> 
> 2011/2/22 Jan Høydahl / Cominvent [via Lucene] <
> ml-node+2551500-1071759141-363...@n3.nabble.com>
> 
>> Hi,
>> 
>> Please explain some more.
>> a) What version of Solr?
>> 
>  Solr version 1.4
> 
> 
> 
>> b) Are you trying to feed XML or PDF?
>> 
>   XML via solrpy
> 
> 
>> c) What request handler are you feeding to? /update or /update/extract ?
>> 
>   I don't know, see the example attached
> 
>> d) Can you copy/paste some more lines from the error log?
>> 
> 
>   I'm attaching one example, so you can test for yourself.
> 
> 
> Thanks for your help.
> Cheers
> jayron
> 
> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 21. feb. 2011, at 15.02, jayronsoares wrote:
>> 
>>> 
>>> Hi, I'm using solrpy to store PDF files, however when I run the script
>>> it shows me this issue:
>>> 
>>> An invalid XML character (Unicode: 0xc) was found in the element content
>> of
>>> the document.
>>> 
>>> Could someone give me some help?
>>> 
>>> cheers
>>> jayron
>>> --
>>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Any-new-python-libraries-tp493419p2545020.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
>> 
>> 
>> --
>> If you reply to this email, your message will be added to the discussion
>> below:
>> 
>> http://lucene.472066.n3.nabble.com/Any-new-python-libraries-tp493419p2551500.html
>> To unsubscribe from Any new python libraries?, click 
>> here.
>> 
>> 
> 
> 
> 
> -- 
> "Life is the art of knowing... Whoever wants to know must live!"
> 
> http://bucolick.tumblr.com
> http://artecultural.wordpress.com/
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Any-new-python-libraries-tp493419p2559636.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Problem in full query searching

2011-02-24 Thread Bagesh Sharma

Hi sir, my problem is that when I search for the string "software
engineering institute", I do not get the documents with a complete phrase
match first. There are documents which match the complete text, but they do
not appear at the top of the results. I want the results ordered so that
complete phrase matches come first, then two-word matches, and finally
single-word matches. I am using the dismax request handler. I also studied
"term proximity", but it is not working for me either.

I have sorted the results by score descending. After analyzing, I observed
that documents which do not contain the complete text, but have more
occurrences of three, two or one of the words in their body text, get a
higher score because of this. Is there any way to give a higher score to
documents with a complete phrase match instead of to those with more
occurrences of any single word?

Please advise.
-- 
Thanks and Regards
   Bagesh Sharma
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-in-full-query-searching-tp2566054p2566054.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: UpdateProcessor and copyField

2011-02-24 Thread Jan Høydahl
Hi,

I'd also like a more powerful/generic CopyField.

Today <copyField> always copies after the UpdateChain and before analysis.
Refactoring it as an UpdateProcessor (using SOLR-2370 to include it as part of the default 
chain) would let us specify "before UpdateChain" in addition. But how could we 
get it to copy after analysis?

Imagine these lines in schema.xml (the "when" attribute is the new part being proposed; the exact names are only a sketch):

  <copyField source="keywords_in1" dest="keywords" when="preUpdateChain"/>
  <copyField source="keywords_in2" dest="keywords" when="preUpdateChain"/>
  <copyField source="keywords" dest="keywords_facet" when="preAnalysis"/> <!-- Default -->
  <copyField source="keywords" dest="keywords_stemmed" when="preAnalysis"/>
  <copyField source="keywords_stemmed" dest="keywords_stemmed_copy" when="postAnalysis"/>

This would read in two source fields and merge them into the "keywords" field 
before the UpdateChain is run. The UpdateChain may do various magic with the field, and 
then before analysis it is copied to two fields: a facet field and a stemmed 
version. After analysis we copy the stemmed field to another stemmed field 
(which must have the same field class and multiValued setting, of course). The postAnalysis copying 
would also allow for some advanced hacking by copying the results of different 
fieldTypes into one, enabling the use case of lemmatization by expansion on the 
index side, and thus querying multiple languages in one and the same field.

From my understanding, the RunUpdateProcessor is one monolithic beast, passing 
the doc along for analysis and indexing. Would it be possible to split it in 
two: an AnalysisUpdateProcessor and an IndexUpdateProcessor?

Chris, for the custom field manipulations in custom UpdateChains, it makes sense 
to have a "FieldManipulator" UpdateProcessor which can be inserted wherever you 
like, depending on the use case. I believe this can/should exist independently 
from a refactoring of <copyField>.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb. 2011, at 03.16, Chris Hostetter wrote:

> 
> : > Maybe copy fields should be refactored to happen in a new, core, 
> : update processor, so there is nothing special/awkward about them?  It 
> : seems they fit as part of what an update processor is all about, 
> : augmenting/modifying incoming documents.
> : 
> : Seems reasonable.
> : By default, the copyFields could be read from the schema for back
> : compat (and the fact that copyField does feel more natural in the
> : schema)
> 
> As someone who has written special-case UpdateProcessors that clone field 
> values, I agree that it would be handy to have a new generic 
> "CopyFieldUpdateProcessor", but I'm not really on board with the idea of it 
> reading <copyField> declarations by default.  The ideas really serve 
> different purposes...
> 
> * as an UpdateProcessor it's something that can be 
> adjusted/configured/overridden on a use-case basis - some request 
> handlers could be configured to use a processor chain that includes the 
> CopyFieldUpdateProcessor and some could be configured not to.
> 
> * schema copyField declarations are things that happen to *every* document, 
> regardless of where it comes from.
> 
> the use cases would be very different: consider a schema with many 
> different fields specific to certain types of documents, as well as a few 
> required fields that every type of document must have: "title", 
> "description", "body", and "maintext" fields.  it might make sense 
> to use different processor chains along with a 
> CopyFieldUpdateProcessor to clone some other fields (say: a 
> "dust_jacket_text" field for books, and a "plot_summary" field for movies) 
> into the "description" field when those docs are indexed -- but if you 
> absolutely positively *always* wanted the contents of title, description, 
> and body to be copied into the "maintext" field, that would make more sense 
> as a schema.xml declaration.
> 
> likewise: it would be handy to have an UpdateProcessor that rejected 
> documents that were missing some fields -- but that would not be a true 
> substitute for using required="true" on a field in the schema.xml.
> 
> a single index may have multiple valid processor chains for different 
> indexing situations -- but "rules" declared in the schema.xml are absolute 
> and cannot be circumvented.
> 
> 
> -Hoss



Re: disable replication in a persistent way

2011-02-24 Thread Jan Høydahl
I think all of this should be adapted for SolrCloud.
ZK should be the one knowing which node is master and which is slave. ZK should know whether 
replication on a slave is disabled or not. To disable replication it should be 
enough to set a new value in ZK, and the node will be notified and change 
behaviour at the next poll. Thus, in a ZK environment we will not need the 
replicationHandler section of solrconfig.xml at all, as it should be stored in 
distinct ZK nodes, should it not? We somehow have to refactor this to work seamlessly 
with and without ZK.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb. 2011, at 05.10, Otis Gospodnetic wrote:

> Hi,
> 
> 
> - Original Message 
>> From: Ahmet Arslan 
>> Subject: disable replication in a persistent way
>> 
>> Hello,
>> 
>> solr/replication?command=disablepoll disables replication on slave(s). However 
>> it is not persistent. After a solr/tomcat restart, slave(s) will continue 
>> polling.
>> 
>> 
>> Is there a built-in way to disable replication on the slave side in a 
>> persistent manner?
> 
> Not that I know of.
> 
> Hoss or somebody else will correct me if I'm wrong :)
> 
>> Currently I am using system property substitution along with the 
>> solrcore.properties file to simulate this.
>> 
>> 
>> <str name="enable">${enable.slave:false}</str>
>> 
>> # solrcore.properties in slave
>> enable.slave=true
>> 
>> And I modify solrcore.properties with a custom solr request handler after the 
>> disablepoll command, to make it persistent. It seems that there is no existing 
>> mechanism to write the solrcore.properties file, am I correct?
> 
> What about modifying the existing classes (the one/ones that handle the 
> disablepoll command) to take another param: persist=true|false ?
> Would that be better than a custom Solr request handler that requires a 
> separate 
> call?
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
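The workaround Ahmet describes, persisting the flag by rewriting solrcore.properties after a disablepoll call, boils down to replacing one key in a Java-style properties file. A rough sketch (the helper is hypothetical; the key name is taken from his example):

```python
def set_property(path, key, value):
    """Rewrite a Java-style .properties file, replacing (or appending) one key."""
    try:
        with open(path) as f:
            lines = f.readlines()
    except IOError:
        lines = []
    out, found = [], False
    for line in lines:
        if line.split('=', 1)[0].strip() == key:
            out.append('%s=%s\n' % (key, value))  # overwrite the existing entry
            found = True
        else:
            out.append(line)
    if not found:
        out.append('%s=%s\n' % (key, value))      # key was absent: append it
    with open(path, 'w') as f:
        f.writelines(out)

# e.g. set_property('solrcore.properties', 'enable.slave', 'false')
```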



Re: custom query parameters

2011-02-24 Thread Jan Høydahl
I would probably try the SearchComponent route first, translating input into 
DisMax speak.
But if you have a completely different query language, a QParserPlugin could be 
the way to go.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb. 2011, at 06.26, Michael Moores wrote:

> Trying to answer my own question... it seems like it would be a good idea to 
> create a SearchComponent and add it to the list of existing components.
> My component just converts query parameters to something that the solr 
> QueryComponent understands.
> Is that a good way of doing it?
> 
> 
> 
> On Feb 23, 2011, at 8:12 PM, Michael Moores wrote:
> 
>> I'm required to provide a handler with some specialized query string inputs.
>> 
>> I'd like to translate the query inputs to a lucene/solr query and delegate 
>> the request to the existing lucene/dismax handler.
>> 
>> What's the best way to do this?
>> Do I implement SolrRequestHandler, or a QParser?  Do I extend the existing 
>> StandardRequestHandler?
>> 
>> thanks,
>> --Michael
>> 
>> 
>> 
>> 
>> 
>> 
> 



Re: Problem in full query searching

2011-02-24 Thread Grijesh

Try configuring more weight on the ps and pf parameters of the dismax request
handler to boost phrase-matching documents.

Or, if you do not want to consider term frequency, use
omitTermFreqAndPositions="true" in the field definition.
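A dismax request along the lines Grijesh suggests, with pf giving whole-phrase matches a large boost and ps allowing a little slop in those phrase matches, might look like this (field names and boost values are made up for illustration):

```python
from urllib.parse import urlencode

params = {
    'q': 'software engineering institute',
    'defType': 'dismax',
    'qf': 'title^2 body',      # fields searched for the individual terms
    'pf': 'title^10 body^5',   # phrase fields: strong boost when the whole query matches as a phrase
    'ps': 2,                   # phrase slop tolerated for those pf matches
}
url = '/solr/select?' + urlencode(params)
```

Documents containing the full phrase then outscore documents that merely repeat single terms, which is the ordering Bagesh asked for.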

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-in-full-query-searching-tp2566054p2566230.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem when search grouping word

2011-02-24 Thread Grijesh

maybe synonyms will help

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-when-search-grouping-word-tp2566499p2566548.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem when search grouping word

2011-02-24 Thread Grijesh

maybe synonyms will help

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-when-search-grouping-word-tp2566499p2566550.html
Sent from the Solr - User mailing list archive at Nabble.com.


Make syntax highlighter caseinsensitive

2011-02-24 Thread Tarjei Huse
Hi,

I have an index with two fields, body and caseInsensitiveBody.
body is indexed and stored, while caseInsensitiveBody is just indexed.

The idea is that by not storing caseInsensitiveBody I save some
space and gain some performance. So I query against
caseInsensitiveBody and generate highlighting from the case-sensitive one.

The problem is that, as a result, I am missing highlighting terms. For
example, when I search for "solr" and get a match in caseInsensitiveBody
for "solr", but the original document has "Solr", no highlighting
is done.

Is there a way around this? Currently I am using the following
highlighting params:
'hl' => 'on',
'hl.fl' => 'header,body',
'hl.usePhraseHighlighter' => 'true',
'hl.highlightMultiTerm' => 'true',
'hl.fragsize' => 200,
'hl.regex.pattern' => '[-\w ,/\n\"\']{20,200}',

 

Regards / Med vennlig hilsen
Tarjei Huse




Re: "Special Circumstances" for embedded Solr

2011-02-24 Thread Devangini

Can you please show me how an HTTP implementation of solrj querying can be
converted to one for embedded Solr, with the help of an example?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Special-Circumstances-for-embedded-Solr-tp833409p2566768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: embedding solr

2011-02-24 Thread Devangini

How do the SolrParams get filled in directly? Shouldn't it be a SolrQueryRequest
and not SolrParams, if I am not mistaken?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/embedding-solr-tp476484p2566785.html
Sent from the Solr - User mailing list archive at Nabble.com.


Filter Query

2011-02-24 Thread Salman Akram
Hi,

I know the filter query is really useful due to caching, but I am confused about
how it filters results.

Let's say I have the following criteria:

Text: "Abc def"
Date: 24th Feb, 2011

Now "abc def" might occur in almost every document, but if Solr first
filters based on date, it will only have to search a few documents
(instead of millions).

If I put the date parameter in fq, would it first filter on date and then
do the text search, or would both be filtered separately and then
intersected? If they are filtered separately, the issue is this: let's say
"abc def" takes 20 secs across all documents (without any filters, due to the
large number of documents); it would still take the same time, whereas if the
search were done only on the few documents from that specific date it would be
super fast.

If fq doesn't give me what I am looking for, is there any other parameter?
There should be a way, as this is a very common scenario.



-- 
Regards,

Salman Akram


Re: problem when search grouping word

2011-02-24 Thread Chamnap Chhorn
There are many product names. How could I list them all, and the list is
growing fast as well?

On Thu, Feb 24, 2011 at 5:25 PM, Grijesh  wrote:

>
> maybe synonyms will help
>
> -
> Thanx:
> Grijesh
> http://lucidimagination.com
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/problem-when-search-grouping-word-tp2566499p2566550.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


synonym.txt

2011-02-24 Thread Isha Garg

Hi,
I have a doubt regarding query-time synonym expansion: will changes
made to synonym.txt after index creation take effect, or will Solr keep
referring to the synonym.txt that was present at index time?


Thanks!
Isha Garg


Re: Filter Query

2011-02-24 Thread Stefan Matheis
Salman,

afaik, the query is executed first and afterwards the filter query steps in
.. so it's only an additional filter on your results.

Recommended Wiki-Pages on FilterQuery:
* http://wiki.apache.org/solr/CommonQueryParameters#fq
* http://wiki.apache.org/solr/FilterQueryGuidance
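In the request itself the distinction is just where the clause goes; Salman's example expressed as request parameters might look like this (the field names and the date range are assumptions, not from his schema):

```python
from urllib.parse import urlencode

params = [
    ('q', 'text:"Abc def"'),  # the scored full-text clause
    # the filter clause: cached independently and not scored
    ('fq', 'date:[2011-02-24T00:00:00Z TO 2011-02-25T00:00:00Z]'),
]
url = '/solr/select?' + urlencode(params)
```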

Regards
Stefan

On Thu, Feb 24, 2011 at 12:46 PM, Salman Akram
 wrote:
> Hi,
>
> I know Filter Query is really useful due to caching but I am confused about
> how it filter results.
>
> Lets say I have following criteria
>
> Text:: "Abc def"
> Date: 24th Feb, 2011
>
> Now "abc def" might be coming in almost every document but if SOLR first
> filters based on date it will have to do search only on few documents
> (instead of millions)
>
> If I put Date parameter in fq would it be first filtering on date and then
> doing text search or both of them would be filtered separately and then
> intersection? If its filtered separately the issue would be that lets say
> "abd def" takes 20 secs on all documents (without any filters - due to large
> # of documents) and it will be still taking same time but if its done only
> on few documents on that specific date it would be super fast.
>
> If fq doesn't give what I am looking for, is there any other parameter?
> There should be a way as this is a very common scenario.
>
>
>
> --
> Regards,
>
> Salman Akram
>


Re: synonym.txt

2011-02-24 Thread Stefan Matheis
Isha,

Solr will use the currently loaded synonyms file, so there is no relation to
the synonyms-file content which was used while indexing.

But to refresh the synonyms in use you'll have to restart your
java process (in single-core mode) or reload your core configuration
(otherwise)
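In multicore mode, the reload Stefan mentions can be triggered over HTTP via the CoreAdmin handler; building such a request might look like this (host, port and core name are placeholders):

```python
from urllib.parse import urlencode
# from urllib.request import urlopen  # uncomment to actually send the request

reload_url = ('http://localhost:8983/solr/admin/cores?'
              + urlencode({'action': 'RELOAD', 'core': 'core0'}))
# urlopen(reload_url)  # Solr re-reads the config, picking up the edited synonyms file
```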

Regards
Stefan

On Thu, Feb 24, 2011 at 12:58 PM, Isha Garg  wrote:
> Hi,
>    I have a doubt regarding query time synonym expansion  that whether   the
> changes apply after index creation for synonym.txt   will work or not? or it
>  will refer to initial synonym. txt present at index time.
>
> Thanks!
> Isha Garg
>


Question Solr Index main in RAM

2011-02-24 Thread Andrés Ospina

Hi,

My name is Felipe and I want to keep Solr's main index in RAM memory.

How is that possible? I have Solr 1.4

Thank you!

Felipe

Re: Filter Query

2011-02-24 Thread Salman Akram
Yeah, I had an idea about that...

Now, logically speaking, the main text search has to be in the query, so
is there no way to first filter based on metadata and then do the text search
on that limited data set?

Thanks!

On Thu, Feb 24, 2011 at 5:24 PM, Stefan Matheis <
matheis.ste...@googlemail.com> wrote:

> Salman,
>
> afaik, the Query is executed first and afterwards FilterQuery steps in
> Place .. so it's only an additional Filter on your Results.
>
> Recommended Wiki-Pages on FilterQuery:
> * http://wiki.apache.org/solr/CommonQueryParameters#fq
> * http://wiki.apache.org/solr/FilterQueryGuidance
>
> Regards
> Stefan
>
> On Thu, Feb 24, 2011 at 12:46 PM, Salman Akram
>  wrote:
> > Hi,
> >
> > I know Filter Query is really useful due to caching but I am confused
> about
> > how it filter results.
> >
> > Lets say I have following criteria
> >
> > Text:: "Abc def"
> > Date: 24th Feb, 2011
> >
> > Now "abc def" might be coming in almost every document but if SOLR first
> > filters based on date it will have to do search only on few documents
> > (instead of millions)
> >
> > If I put Date parameter in fq would it be first filtering on date and
> then
> > doing text search or both of them would be filtered separately and then
> > intersection? If its filtered separately the issue would be that lets say
> > "abd def" takes 20 secs on all documents (without any filters - due to
> large
> > # of documents) and it will be still taking same time but if its done
> only
> > on few documents on that specific date it would be super fast.
> >
> > If fq doesn't give what I am looking for, is there any other parameter?
> > There should be a way as this is a very common scenario.
> >
> >
> >
> > --
> > Regards,
> >
> > Salman Akram
> >
>



-- 
Regards,

Salman Akram


query slop issue

2011-02-24 Thread Bagesh Sharma

Hi all, I have the search string q=water+treatment+plant and I am using the dismax
request handler with qs=1. How will the processing be done? That is, within
how many words of each other must "water", "treatment" or "plant" occur for a
document to come up in the result set?
 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/query-slop-issue-tp2567418p2567418.html
Sent from the Solr - User mailing list archive at Nabble.com.


Free Webcast/Technical Case Study: How Bazaarvoice moved to Solr to implement Search Strategies for Social and eCommerce

2011-02-24 Thread Grant Ingersoll
I thought you might be interested in a technical webcast on
Solr/Lucene and e-commerce/social media that we are sponsoring,
featuring RC Johnson of Bazaarvoice. It's Wednesday, March 2, 2011 at
11:00am PST/2:00pm EST/19:00 GMT.

RC has been leading efforts at Bazaarvoice to build out their Solr
search applications, moving beyond a more traditional RDBMS-centered
data strategy. If you've not heard of Bazaarvoice, they provide
user-generated content and ratings in a white-label service offering.
They use Solr to index and search millions of online customer
conversations that deliver billions of monthly impressions for leading
companies in retail, manufacturing, financial services, health care,
travel and media.

Key topics this webcast will cover include:

Iterative expansion of search features and content collections
Migrating from simplistic database search to Solr-based search
Integrating statistical analytics into search at scale
Considering NoSQL for scalability and deployability of big data, to
make data easier to consume across applications

You can sign up here: http://www.eventsvc.com/lucidimagination/030211?trk=ap
and mark your calendars for Wednesday, March 2, 2011 at 11:00am
PST/2:00pm EST/19:00 GMT.

-Grant

Re: "Special Circumstances" for embedded Solr

2011-02-24 Thread Tarjei Huse
On 02/24/2011 12:16 PM, Devangini wrote:
> Can you please show me how an http implementation of solrj querying can be
> converted to one for embedded solr with the help of an example?
Hi, here's an example that almost compiles. You should be able to get
going with this.
T

import java.io.File;
import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

class EmbeddedSolrExample {

    private EmbeddedSolrServer server;

    public static void main(String[] args) throws Exception {
        new EmbeddedSolrExample().setupSolrContainer();
    }

    private void setupSolrContainer() throws Exception {
        File home = new File("/tmp/solr");
        File f = new File(home, "solr.xml");
        CoreContainer container = new CoreContainer();
        container.load("/tmp/solr", f);

        server = new EmbeddedSolrServer(container, "model");
        addDocument();
    }

    private void addDocument() throws SolrServerException, IOException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.setField("body", "test");
        doc.setField("id", 12);

        UpdateResponse s = server.add(doc);
        if (s.getStatus() != 0) {
            throw new IllegalStateException("add failed: " + s.getStatus());
        }
        server.commit();

        SolrDocumentList results = getResults(search("test"));
        System.out.println("I got " + results.size() + " documents");
    }

    SolrDocumentList getResults(QueryResponse response) {
        if (response.getStatus() != 0) {
            return new SolrDocumentList();
        }
        return response.getResults();
    }

    QueryResponse search(String words) throws SolrServerException {
        SolrQuery query = new SolrQuery();
        query.addField("id").addField("body").addField("score");
        query.setTimeAllowed(1000);
        query.setRows(50);
        query.set("q", words);
        query.setSortField("timestamp", ORDER.desc); // sort by date
        return server.query(query);
    }
}

-- 
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413



Re: Question Solr Index main in RAM

2011-02-24 Thread Koji Sekiguchi

(11/02/24 21:38), Andrés Ospina wrote:


Hi,

My name is Felipe and i want to use the index main of solr in RAM memory.

How it's possible? I have solr 1.4

Thank you!

Felipe  


Welcome Felipe!

If I understand your question correctly, you can use RAMDirectoryFactory:

https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html

But I believe it is only available from 3.1 (to be released soon...).

Koji
--
http://www.rondhuit.com/en/


Re: Question about Nested Span Near Query

2011-02-24 Thread Ahsan |qbal
Hi

To narrow down the issue I indexed a single document with one of the sample
queries (given below) which was giving issue.

*"evaluation of loan and lease portfolios for purposes of assessing the
adequacy of" *

Now when I perform the search query (*TextContents:"evaluation of loan and
lease portfolios for purposes of assessing the adequacy of"*), the parsed
query is

*spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation,
Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0, true),
Contents:lease], 0, true), Contents:portfolios], 0, true), Contents:for], 0,
true), Contents:purposes], 0, true), Contents:of], 0, true),
Contents:assessing], 0, true), Contents:the], 0, true), Contents:adequacy],
0, true), Contents:of], 0, true)*

and search is not successful.

If I remove '*evaluation*' from the start OR '*assessing the adequacy of*' from the
end, it works fine. The issue seems to come up on relatively long phrases, but I have
not been able to find a pattern, and it's really mind-boggling, because I thought
this issue might be due to a large position list, but this is a single document
with one phrase. So it's definitely not related to the size of the index.

Any ideas what's going on?

On Thu, Feb 24, 2011 at 10:25 AM, Ahsan |qbal wrote:

> Hi
>
> It didn't search (meaning no results were found even though matching results exist). One
> observation is that it works well even on long phrases, but when the long
> phrase contains stop words and the same stop word exists two or more times in the
> phrase, then solr can't search with a query parsed in this way.
>
>
> On Wed, Feb 23, 2011 at 11:49 PM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
>
>> Hi,
>>
>> What do you mean by "this doesn't work fine"?  Does it not work correctly
>> or is
>> it slow or ...
>>
>> I was going to suggest you look at Surround QP, but it looks like you
>> already
>> did that.  Wouldn't it be better to get Surround QP to work?
>>
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>>
>> - Original Message 
>> > From: Ahsan |qbal 
>> > To: solr-user@lucene.apache.org
>> > Sent: Tue, February 22, 2011 10:59:26 AM
>> > Subject: Question about Nested Span Near Query
>> >
>> > Hi All
>> >
>> > I had a requirement to implement queries that involves phrase
>>  proximity.
>> > like user should be able to search "ab cd" w/5 "de fg", both  phrases as
>> > whole should be with in 5 words of each other. For this I  implement a
>> query
>> > parser that make use of nested span queries, so above query  would be
>> parsed
>> > as
>> >
>> > spanNear([spanNear([Contents:ab, Contents:cd], 0,  true),
>> > spanNear([Contents:de, Contents:fg], 0, true)], 5,  false)
>> >
>> > Queries like this seems to work really good when phrases are small  but
>> when
>> > phrases are large this doesn't work fine. Now my question, Is there  any
>> > limitation of SpanNearQuery. that we cannot handle large phrases in
>>  this
>> > way?
>> >
>> > please help
>> >
>> > Regards
>> > Ahsan
>> >
>>
>
>


Re: Question about Nested Span Near Query

2011-02-24 Thread Bill Bell
Send schema and document in XML format and I'll look at it

Bill Bell
Sent from mobile


On Feb 24, 2011, at 7:26 AM, "Ahsan |qbal"  wrote:

> Hi
> 
> To narrow down the issue I indexed a single document with one of the sample
> queries (given below) which was giving issue.
> 
> *"evaluation of loan and lease portfolios for purposes of assessing the
> adequacy of" *
> 
> Now when i Perform a search query (*TextContents:"evaluation of loan and
> lease portfolios for purposes of assessing the adequacy of"*) the parsed
> query is
> 
> *spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation,
> Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0, true),
> Contents:lease], 0, true), Contents:portfolios], 0, true), Contents:for], 0,
> true), Contents:purposes], 0, true), Contents:of], 0, true),
> Contents:assessing], 0, true), Contents:the], 0, true), Contents:adequacy],
> 0, true), Contents:of], 0, true)*
> 
> and search is not successful.
> 
> If I remove '*evaluation*' from start OR *'assessing the adequacy of*' from
> end it works fine. Issue seems to come on relatively long phrases but I have
> not been able to find a pattern and its really mind boggling coz I thought
> this issue might be due to large position list but this is a single document
> with one phrase. So its definitely not related to size of index.
> 
> Any ideas whats going on??
> 
> On Thu, Feb 24, 2011 at 10:25 AM, Ahsan |qbal wrote:
> 
>> Hi
>> 
>> It didn't search.. (means no results found even results exist) one
>> observation is that it works well even in the long phrases but when the long
>> phrases contain stop words and same stop word exist two or more time in the
>> phrase then, solr can't search with query parsed in this way.
>> 
>> 
>> On Wed, Feb 23, 2011 at 11:49 PM, Otis Gospodnetic <
>> otis_gospodne...@yahoo.com> wrote:
>> 
>>> Hi,
>>> 
>>> What do you mean by "this doesn't work fine"?  Does it not work correctly
>>> or is
>>> it slow or ...
>>> 
>>> I was going to suggest you look at Surround QP, but it looks like you
>>> already
>>> did that.  Wouldn't it be better to get Surround QP to work?
>>> 
>>> Otis
>>> 
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
>>> 
>>> 
>>> 
>>> - Original Message 
 From: Ahsan |qbal 
 To: solr-user@lucene.apache.org
 Sent: Tue, February 22, 2011 10:59:26 AM
 Subject: Question about Nested Span Near Query
 
 Hi All
 
 I had a requirement to implement queries that involves phrase
>>> proximity.
 like user should be able to search "ab cd" w/5 "de fg", both  phrases as
 whole should be with in 5 words of each other. For this I  implement a
>>> query
 parser that make use of nested span queries, so above query  would be
>>> parsed
 as
 
 spanNear([spanNear([Contents:ab, Contents:cd], 0,  true),
 spanNear([Contents:de, Contents:fg], 0, true)], 5,  false)
 
 Queries like this seems to work really good when phrases are small  but
>>> when
 phrases are large this doesn't work fine. Now my question, Is there  any
 limitation of SpanNearQuery. that we cannot handle large phrases in
>>> this
 way?
 
 please help
 
 Regards
 Ahsan
 
>>> 
>> 
>> 


Re: Question Solr Index main in RAM

2011-02-24 Thread Bill Bell
How to use this?

Bill Bell
Sent from mobile


On Feb 24, 2011, at 7:19 AM, Koji Sekiguchi  wrote:

> (11/02/24 21:38), Andrés Ospina wrote:
>> 
>> Hi,
>> 
>> My name is Felipe and i want to use the index main of solr in RAM memory.
>> 
>> How it's possible? I have solr 1.4
>> 
>> Thank you!
>> 
>> Felipe   
> 
> Welcome Felipe!
> 
> If I understand your question correctly, you can use RAMDirectoryFactory:
> 
> https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html
> 
> But I believe it is available 3.1 (to be released soon...).
> 
> Koji
> -- 
> http://www.rondhuit.com/en/


Re: DataImportHandler in Solr 4.0

2011-02-24 Thread Mark
It seems this thread has been hijacked. My initial posting was in 
regards to my custom Evaluators always receiving a null context. The same 
Evaluators work in 1.4.1.


On 2/23/11 5:47 PM, Alexandre Rocco wrote:

I got it working by building the DIH from the contrib folder and made a
change on the lib statements to map the folder that contains the .jar files.

Thanks!
Alexandre

On Wed, Feb 23, 2011 at 8:55 PM, Smiley, David W.  wrote:


The DIH is no longer supplied embedded in the Solr war file.  You need to
get it on the classpath somehow. You could add another <lib> directive in
solrconfig.xml pointing at the DIH jar.

http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Feb 23, 2011, at 4:11 PM, Alexandre Rocco wrote:


Hi guys,

I'm having some issues when trying to use the DataImportHandler on Solr

4.0.

I've downloaded the latest nightly build of Solr 4.0 and configured

normally

(on the example folder) solrconfig.xml file like this:



data-config.xml



At this point I noticed that the DIH jar was not being loaded correctly
causing exceptions like:
Error loading class

'org.apache.solr.handler.dataimport.DataImportHandler'

and
java.lang.ClassNotFoundException:
org.apache.solr.handler.dataimport.DataImportHandler

Do I need to build to get DIH running on Solr 4.0?

Thanks!
Alexandre











Order Facet on ranking score

2011-02-24 Thread Jenny Arduini

Hello everybody,
Is it possible to order the facet results by some ranking score?

I was running a query with the "or" operator, and sometimes the first facets
contain only low-ranked, unimportant results.

This causes users to be led toward further searches that are not relevant.

//

--
Jenny Arduini
I.T.&T. S.r.l.
Strada degli Angariari, 25
47891 Falciano
Repubblica di San Marino
Tel 0549 941183
Fax 0549 974280
email: jardu...@ittweb.net
http://www.ittweb.net



facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Peter Cline

Hi all,

I'm having a problem using distributed search in conjunction with the 
facet.offset parameter and lexical facet value sorting.  Is there an 
incompatibility between these?  I'm using Solr 1.4.1.


I have a facet with ~100k values in one index.  I'm wanting to page 
through them alphabetically.  When not using distributed search, 
everything works just fine, and very quick.  A query like this works, 
returning 10 facet values starting at the 50,001st:


http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5
# Butterflies - Indiana !

However, if I enable distributed search, using a single shard (which is 
the same index), I get no facet values returned.


http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5&shards=server:port/solr
# empty list :(

Doing a little more testing, I'm finding that with sharding I often get 
an empty list any time the facet.offset >= facet.limit.  Also, by 
example, if I do facet.limit=100 and facet.offset=90, I get 10 facet 
values.  Doing so without sharding, I get the expected (by me, at least) 
100 values (starting at what would normally be the 91st).


Can anybody shed any light on this for me?

Thanks,
Peter


Re: Question about Nested Span Near Query

2011-02-24 Thread Ahsan |qbal
Hi

schema and document are attached.

On Thu, Feb 24, 2011 at 8:24 PM, Bill Bell  wrote:

> Send schema and document in XML format and I'll look at it
>
> Bill Bell
> Sent from mobile
>
>
> On Feb 24, 2011, at 7:26 AM, "Ahsan |qbal" 
> wrote:
>
> > Hi
> >
> > To narrow down the issue I indexed a single document with one of the
> sample
> > queries (given below) which was giving issue.
> >
> > *"evaluation of loan and lease portfolios for purposes of assessing the
> > adequacy of" *
> >
> > Now when i Perform a search query (*TextContents:"evaluation of loan and
> > lease portfolios for purposes of assessing the adequacy of"*) the parsed
> > query is
> >
> >
> *spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation,
> > Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0,
> true),
> > Contents:lease], 0, true), Contents:portfolios], 0, true), Contents:for],
> 0,
> > true), Contents:purposes], 0, true), Contents:of], 0, true),
> > Contents:assessing], 0, true), Contents:the], 0, true),
> Contents:adequacy],
> > 0, true), Contents:of], 0, true)*
> >
> > and search is not successful.
> >
> > If I remove '*evaluation*' from start OR *'assessing the adequacy of*'
> from
> > end it works fine. Issue seems to come on relatively long phrases but I
> have
> > not been able to find a pattern and its really mind boggling coz I
> thought
> > this issue might be due to large position list but this is a single
> document
> > with one phrase. So its definitely not related to size of index.
> >
> > Any ideas whats going on??
> >
> > On Thu, Feb 24, 2011 at 10:25 AM, Ahsan |qbal  >wrote:
> >
> >> Hi
> >>
> >> It didn't search.. (means no results found even results exist) one
> >> observation is that it works well even in the long phrases but when the
> long
> >> phrases contain stop words and same stop word exist two or more time in
> the
> >> phrase then, solr can't search with query parsed in this way.
> >>
> >>
> >> On Wed, Feb 23, 2011 at 11:49 PM, Otis Gospodnetic <
> >> otis_gospodne...@yahoo.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> What do you mean by "this doesn't work fine"?  Does it not work
> correctly
> >>> or is
> >>> it slow or ...
> >>>
> >>> I was going to suggest you look at Surround QP, but it looks like you
> >>> already
> >>> did that.  Wouldn't it be better to get Surround QP to work?
> >>>
> >>> Otis
> >>> 
> >>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >>> Lucene ecosystem search :: http://search-lucene.com/
> >>>
> >>>
> >>>
> >>> - Original Message 
>  From: Ahsan |qbal 
>  To: solr-user@lucene.apache.org
>  Sent: Tue, February 22, 2011 10:59:26 AM
>  Subject: Question about Nested Span Near Query
> 
>  Hi All
> 
>  I had a requirement to implement queries that involves phrase
> >>> proximity.
>  like user should be able to search "ab cd" w/5 "de fg", both  phrases
> as
>  whole should be with in 5 words of each other. For this I  implement a
> >>> query
>  parser that make use of nested span queries, so above query  would be
> >>> parsed
>  as
> 
>  spanNear([spanNear([Contents:ab, Contents:cd], 0,  true),
>  spanNear([Contents:de, Contents:fg], 0, true)], 5,  false)
> 
>  Queries like this seems to work really good when phrases are small
>  but
> >>> when
>  phrases are large this doesn't work fine. Now my question, Is there
>  any
>  limitation of SpanNearQuery. that we cannot handle large phrases in
> >>> this
>  way?
> 
>  please help
> 
>  Regards
>  Ahsan
> 
> >>>
> >>
> >>
>

[Attachment residue: the XML tags were stripped by the archive. What remains
shows a sample document with DocID 3369660 whose Contents field holds the
phrase "evaluation of loan and lease portfolios for purposes of assessing the
adequacy of", and a schema defining the DocID and Contents fields.]
 




Re: Filter Query

2011-02-24 Thread Yonik Seeley
On Thu, Feb 24, 2011 at 6:46 AM, Salman Akram
 wrote:
> Hi,
>
> I know Filter Query is really useful due to caching but I am confused about
> how it filter results.
>
> Lets say I have following criteria
>
> Text:: "Abc def"
> Date: 24th Feb, 2011
>
> Now "abc def" might be coming in almost every document but if SOLR first
> filters based on date it will have to do search only on few documents
> (instead of millions)

Yes, this is the way Solr works.  The filters are executed separately,
but the query is executed last with the filters (i.e. it will be
faster if the filter cuts down the number of documents).

-Yonik
http://lucidimagination.com


Re: Filter Query

2011-02-24 Thread Salman Akram
So you are agreeing that it does what I want? So in my example "Abc def"
would only be searched on 24th Feb 2011 documents?

When you say 'last with the filters', does it mean it first filters with the
Filter Query and then applies the Query to the remaining documents?

On Thu, Feb 24, 2011 at 9:29 PM, Yonik Seeley wrote:

> On Thu, Feb 24, 2011 at 6:46 AM, Salman Akram
>  wrote:
> > Hi,
> >
> > I know Filter Query is really useful due to caching but I am confused
> about
> > how it filter results.
> >
> > Lets say I have following criteria
> >
> > Text:: "Abc def"
> > Date: 24th Feb, 2011
> >
> > Now "abc def" might be coming in almost every document but if SOLR first
> > filters based on date it will have to do search only on few documents
> > (instead of millions)
>
> Yes, this is the way Solr works.  The filters are executed separately,
> but the query is executed last with the filters (i.e. it will be
> faster if the filter cuts down the number of documents).
>
> -Yonik
> http://lucidimagination.com
>



-- 
Regards,

Salman Akram


Re: Filter Query

2011-02-24 Thread Yonik Seeley
On Thu, Feb 24, 2011 at 11:56 AM, Salman Akram
 wrote:
> So you are agreeing that it does what I want? So in my example "Abc def"
> would only be searched on 24th Feb 2011 documents?

Pretty much, but not exactly.  It's close enough to what you want though.

The details are that the scorer and the filter are leapfrogged, but
always starting with the filter again after a match.
If you're interested in further details, look at the source code of
IndexSearcher for a filtered query.

This was added in 1.4:
http://www.lucidimagination.com/blog/2009/05/27/filtered-query-performance-increases-for-solr-14/
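The leapfrog described here can be sketched in miniature: the scorer and the filter each walk a sorted stream of doc IDs, and whichever stream is behind skips ahead until both agree (a simplified illustrative model, not Solr's actual IndexSearcher code):

```python
def leapfrog(scorer_docs, filter_docs):
    """Intersect two sorted doc-ID lists: advance whichever stream
    is behind until both land on the same document."""
    result = []
    i = j = 0
    while i < len(scorer_docs) and j < len(filter_docs):
        if scorer_docs[i] == filter_docs[j]:
            result.append(scorer_docs[i])   # both streams agree: a match
            i += 1
            j += 1
        elif scorer_docs[i] < filter_docs[j]:
            i += 1                          # scorer is behind: skip ahead
        else:
            j += 1                          # filter is behind: skip ahead
    return result

# Docs matching the query vs. docs passing the filter
print(leapfrog([1, 3, 5, 8, 13], [2, 3, 8, 9, 13]))  # [3, 8, 13]
```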

-Yonik
http://lucidimagination.com


Re: Solr 4.0 DIH

2011-02-24 Thread Koji Sekiguchi

(11/02/22 6:58), Mark wrote:

I download Solr 4.0 from trunk today and I tried using a custom Evaluator 
during my
full/delta-importing.

Within the evaluate method though, the Context is always null? When using this 
same class with Solr
1.4.1 the context always exists. Is this a bug or is this behavior expected?

Thanks


public class MyEvaluator extends Evaluator {
    @Override
    public String evaluate(String argument, Context context) {
        // Argument is present, however context is always null!
        return argument; // placeholder so the class compiles
    }
}



I tried my test Evaluator on Solr 4.0 and it worked as expected, context is not 
null.
What I did on example-DIH was:

1. add the following tag to db-data-config.xml:



2. use the above evaluator:

http://www.rondhuit.com/en/


Re: facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Yonik Seeley
On Thu, Feb 24, 2011 at 10:57 AM, Peter Cline  wrote:
> Hi all,
>
> I'm having a problem using distributed search in conjunction with the
> facet.offset parameter and lexical facet value sorting.  Is there an
> incompatibility between these?  I'm using Solr 1.4.1.
>
> I have a facet with ~100k values in one index.  I'm wanting to page through
> them alphabetically.  When not using distributed search, everything works
> just fine, and very quick.  A query like this works, returning 10 facet
> values starting at the 50,001st:
>
> http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5
> # Butterflies - Indiana !
>
> However, if I enable distributed search, using a single shard (which is the
> same index), I get no facet values returned.
>
> http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5&shards=server:port/solr
> # empty list :(
>
> Doing a little more testing, I'm finding that with sharding I often get an
> empty list any time the facet.offset >= facet.limit.  Also, by example, if I
> do facet.limit=100 and facet.offset=90, I get 10 facet values.  Doing so
> without sharding, I get the expected (by me, at least) 100 values (starting
> at what would normally be the 91st).
>
> Can anybody shed any light on this for me?

Sounds like a bug.
Have you tried a 3x or trunk development build to see if it's fixed there?

-Yonik
http://lucidimagination.com


Re: facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Peter Cline

On 02/24/2011 12:37 PM, Yonik Seeley wrote:

On Thu, Feb 24, 2011 at 10:57 AM, Peter Cline  wrote:

Hi all,

I'm having a problem using distributed search in conjunction with the
facet.offset parameter and lexical facet value sorting.  Is there an
incompatibility between these?  I'm using Solr 1.4.1.

I have a facet with ~100k values in one index.  I'm wanting to page through
them alphabetically.  When not using distributed search, everything works
just fine, and very quick.  A query like this works, returning 10 facet
values starting at the 50,001st:

http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5
# Butterflies - Indiana !

However, if I enable distributed search, using a single shard (which is the
same index), I get no facet values returned.

http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5&shards=server:port/solr
# empty list :(

Doing a little more testing, I'm finding that with sharding I often get an
empty list any time the facet.offset>= facet.limit.  Also, by example, if I
do facet.limit=100 and facet.offset=90, I get 10 facet values.  Doing so
without sharding, I get the expected (by me, at least) 100 values (starting
at what would normally be the 91st).

Can anybody shed any light on this for me?

Sounds like a bug.
Have you tried a 3x or trunk development build to see if it's fixed there?

-Yonik
http://lucidimagination.com


I haven't.  I'll try the current trunk and get back to you.

Thanks,
Peter


Re: facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Peter Cline

On 02/24/2011 02:58 PM, Peter Cline wrote:

On 02/24/2011 12:37 PM, Yonik Seeley wrote:
On Thu, Feb 24, 2011 at 10:57 AM, Peter 
Cline  wrote:

Hi all,

I'm having a problem using distributed search in conjunction with the
facet.offset parameter and lexical facet value sorting.  Is there an
incompatibility between these?  I'm using Solr 1.4.1.

I have a facet with ~100k values in one index.  I'm wanting to page 
through
them alphabetically.  When not using distributed search, everything 
works

just fine, and very quick.  A query like this works, returning 10 facet
values starting at the 50,001st:

http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5 


# Butterflies - Indiana !

However, if I enable distributed search, using a single shard (which 
is the

same index), I get no facet values returned.

http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5&shards=server:port/solr 


# empty list :(

Doing a little more testing, I'm finding that with sharding I often 
get an
empty list any time the facet.offset>= facet.limit.  Also, by 
example, if I
do facet.limit=100 and facet.offset=90, I get 10 facet values.  
Doing so
without sharding, I get the expected (by me, at least) 100 values 
(starting

at what would normally be the 91st).

Can anybody shed any light on this for me?

Sounds like a bug.
Have you tried a 3x or trunk development build to see if it's fixed 
there?


-Yonik
http://lucidimagination.com


I haven't.  I'll try the current trunk and get back to you.

Thanks,
Peter


I tried today's builds for the 3.x branch and the trunk.  The problem 
persists in both.


Peter


dataimport

2011-02-24 Thread Brian Lamb
Hi all,

First of all, I'm quite new to solr.

I have the server set up and everything appears to work. I set it up so that
the indexed data comes through a mysql connection:


  
db-data-config.xml
   


And here is the contents of db-data-config.xml:


   

  
 
   


When I point my browser at localhost:8983/solr/dataimport, the server
produces the following message:

Feb 24, 2011 8:58:24 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0
QTime=10
Feb 24, 2011 8:58:24 PM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Feb 24, 2011 8:58:24 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Feb 24, 2011 8:58:24 PM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Feb 24, 2011 8:58:24 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
commit{dir=/wwwroot/apps/apache-solr-1.4.1/example/solr/data/index,segFN=segments_p,version=1297781919778,generation=25,filenames=[_n.nrm,
_n.tis, _n.prx, segments_p, _n.fdt, _n.frq, _n.tii, _n.fdx, _n.fnm]
Feb 24, 2011 8:58:24 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1297781919778
Feb 24, 2011 8:58:24 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity id with URL:
jdbc:mysql://localhost/researchsquare_beta_library?characterEncoding=UTF8&zeroDateTimeBehavior=convertToNull
Feb 24, 2011 8:58:25 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 137
Killed

So it looks like for whatever reason, the server crashes trying to do a full
import. When I add a LIMIT clause on the query, it works fine when the LIMIT
is only 250 records but if I try to do 500 records, I get the same message.

The field types are:

SHOW CREATE TABLE mytable;
CREATE TABLE mytable (
   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `fielda` varchar(650) COLLATE utf8_unicode_ci DEFAULT NULL,
   `fieldb` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL,
   `fieldc` text COLLATE utf8_unicode_ci,
   `fieldd` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
   PRIMARY KEY (`id`)
);

How can I get Solr to do a full import without crashing? Doing it 250
records at a time is not going to be feasible because there are about 50
records.


Re: query slop issue

2011-02-24 Thread Jayendra Patil
qs applies only to the slop on phrase queries explicitly specified in
the "q" against the qf fields. So qs would come into play only if the
search q were the phrase "water treatment plant".

Slop is the maximum allowable positional distance between terms for them
still to be considered a match; distance is the number of positional
moves of terms needed to reconstruct the phrase in the same order.

So with qs=1 you allow only one positional move to recreate the exact
phrase.

You may also want to check the pf and the ps params for the dismax.

Regards,
Jayendra

On Thu, Feb 24, 2011 at 8:31 AM, Bagesh Sharma  wrote:
>
> Hi all, I have a search string q=water+treatment+plant and I am using the dismax
> request handler where I have qs = 1. How will processing be done, i.e. within
> how many words must water, treatment, and plant occur to appear in the
> result set?
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/query-slop-issue-tp2567418p2567418.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Problem in full query searching

2011-02-24 Thread Jayendra Patil
With dismax or extended dismax parser you should be able to achieve this.

Dismax :- qf, qs, pf & ps should give you exact control over the
fields and boosts.
Extended Dismax :- In addition to qf, qs, pf & ps, you have pf2 and
pf3 for two- and three-word shingles.

As Grijesh mentioned, use more weight for phrase or proximity matches

Regards,
Jayendra

On Thu, Feb 24, 2011 at 4:03 AM, Grijesh  wrote:
>
> Try to configure more weight on the ps and pf parameters of the dismax request
> handler to boost phrase-matching documents.
>
> Or if you do not want to consider the term frequency then use
> omitTermFreqAndPositions="true" in field definition
>
> -
> Thanx:
> Grijesh
> http://lucidimagination.com
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Problem-in-full-query-searching-tp2566054p2566230.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Yonik Seeley
On Thu, Feb 24, 2011 at 3:53 PM, Peter Cline  wrote:
> I tried today's builds for the 3.x branch and the trunk.  The problem
> persists in both.

Thanks Peter, I was now also able to duplicate the bug.  Could you
open a JIRA issue for this?

-Yonik
http://lucidimagination.com


DIH regex remove email + extract url

2011-02-24 Thread Rosa (Anuncios)

Hi,

I'm trying to remove all email addresses in my content field with the 
following line:


regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Z]{2,4}" replaceWith="" />


But it doesn't seem to remove emails? Is the syntax right?

Second thing:

I would like to extract the domain name from the url via a regex:



Example: url=http://www.abcd.com/product.php?id=324   --> i want to 
index source = abcd.com


What's the syntax for this one?

Thanks for your help

Rosa


Re: Order Facet on ranking score

2011-02-24 Thread Markus Jelsma
No, Solr returns facets ordered alphabetically or by count.

> Hello everybody,
> Is it possibile to order the facet results on some ranking score?
> 
> I was doing a query with "or" operator and sometimes the first facet
> have inside of them only result with small rank and not important.
> This cause that users are led to other reasearch not important.
> 
> //


Re: CUSTOM JSP FOR APACHE SOLR

2011-02-24 Thread Paul Libbrecht
Hello list,

as suggested below, I tried to implement a custom ResponseWriter that would 
evaluate a JSP but that seems impossible: the HttpServletRequest and the 
HttpServletResponse are not available anymore.

Have I missed something?
Should I rather do a RequestHandler?
Does anyone know an artificial way to run a JSP? (though I'd rather avoid that).

thanks in advance

paul


Le 2 févr. 2011 à 20:42, Tomás Fernández Löbbe a écrit :

> Hi Paul, I don't fully understand what you want to do. The way, I think,
> SolrJ is intended to be used is from a client application (outside Solr). If
> what you want is something like what's done with Velocity I think you could
> implement a response writer that renders the JSP and send it on the
> response.
> 
> Tomás
> 
> 
> On Mon, Jan 31, 2011 at 6:25 PM, Paul Libbrecht  wrote:
> 
>> Tomas,
>> 
>> I also know velocity can be used and works well.
>> I would be interested to a simpler way to have the objects of SOLR
>> available in a jsp than write a custom jsp processor as a request handler;
>> indeed, this seems to be the way solrj is expected to be used in the wiki
>> page.
>> 
>> Actually I migrated to velocity (which I like less than jsp) just because I
>> did not find a response to this question.
>> 
>> paul
>> 
>> 
>> Le 31 janv. 2011 à 21:53, Tomás Fernández Löbbe a écrit :
>> 
>>> Hi John, you can use whatever you want for building your application,
>> using
>>> Solr on the backend (JSP included). You should find all the information
>> you
>>> need on Solr's wiki page:
>>> http://wiki.apache.org/solr/
>>> 
>>> including some client libraries to easy
>>> integrate your application with Solr:
>>> http://wiki.apache.org/solr/IntegratingSolr
>>> 
>>> for fast prototyping you
>> could
>>> use Velocity:
>>> http://wiki.apache.org/solr/VelocityResponseWriter
>>> 
>>> Anyway, I recommend
>> you
>>> to start with Solr's tutorial:
>>> http://lucene.apache.org/solr/tutorial.html
>>> 
>>> 
>>> Good luck,
>>> Tomás
>>> 
>>> 2011/1/31 JOHN JAIRO GÓMEZ LAVERDE 
>>> 
 
 
 SOLR LUCENE
 DEVELOPERS
 
 Hi i am new to solr and i like to make a custom search page for
>> enterprise
 users
 in JSP that takes the results of Apache Solr.
 
 - Where can I find some useful examples for this topic?
 - Is JSP the correct approach to solve my requirement?
 - If not what is the best solution to build a customize search page for
>> my
 users?
 
 Thanks
 from South America
 
 JOHN JAIRO GOMEZ LAVERDE
 Bogotá - Colombia
 
>> 
>> 



query results filter

2011-02-24 Thread Babak Farhang
Hi everyone,

I have some existing solr cores that for one reason or another have
documents that I need to filter from the query results page.

I would like to do this inside Solr instead of doing it on the
receiving end, in the client.  After searching the mailing list
archives and Solr wiki, it appears you do this by registering a custom
SearchHandler / SearchComponent with Solr.  Still, I don't quite
understand how this machinery fits together.  Any suggestions / ideas
/ pointers much appreciated!

Cheers,
-Babak

~~

Ideally, I'd like to find / code a solution that does the following:

1. A request handler that works like the StandardRequestHandler but
which allows an optional DocFilter (say, modeled like the
java.io.FileFilter interface)
2. Allows current pagination to work transparently.
3. Works transparently with distributed/sharded queries.


RE: query results filter

2011-02-24 Thread Jonathan Rochkind
Hmm, depending on what you are actually needing to do, can you do it with a 
simple fq param to filter out what you want filtered out, instead of needing to 
write custom Java as you are suggesting? It would be a lot easier to just use 
an fq. 

How would you describe the documents you want to filter from the query results 
page?  Can that description be represented by a Solr query you can already 
represent using the lucene, dismax, or any other existing query? If so, why not 
just use a negated fq describing what to omit from the results?

From: Babak Farhang [farh...@gmail.com]
Sent: Thursday, February 24, 2011 6:58 PM
To: solr-user
Subject: query results filter

Hi everyone,

I have some existing solr cores that for one reason or another have
documents that I need to filter from the query results page.

I would like to do this inside Solr instead of doing it on the
receiving end, in the client.  After searching the mailing list
archives and Solr wiki, it appears you do this by registering a custom
SearchHandler / SearchComponent with Solr.  Still, I don't quite
understand how this machinery fits together.  Any suggestions / ideas
/ pointers much appreciated!

Cheers,
-Babak

~~

Ideally, I'd like to find / code a solution that does the following:

1. A request handler that works like the StandardRequestHandler but
which allows an optional DocFilter (say, modeled like the
java.io.FileFilter interface)
2. Allows current pagination to work transparently.
3. Works transparently with distributed/sharded queries.


Re: DataImportHandler in Solr 4.0

2011-02-24 Thread Chris Hostetter

: It seems this thread has been hijacked. My initial posting was in regards to
: my custom Evaluators always receiving a null context. Same Evaluators work in
: 1.4.1

I'm pretty sure you are talking about a completely different thread, with 
a completely different subject ("Solr 4.0 DIH").



-Hoss


Re: DIH regex remove email + extract url

2011-02-24 Thread Koji Sekiguchi

Hi Rosa,





Shouldn't it be regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,4}"?




Example: url=http://www.abcd.com/product.php?id=324 --> i want to index source 
= abcd.com


Probably it could be regex="http:\/\/(.*?)\/(.*)"

I use a regex web tool:

http://www.regexplanet.com/simple/index.html

Koji
--
http://www.rondhuit.com/en/
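Both points can be sanity-checked outside Solr; a quick Python sketch (the sample strings are invented, and DIH's RegexTransformer uses Java regex, which treats these particular patterns the same way; the `(?:www\.)?` variant for dropping the "www." prefix is my addition, not from the thread):

```python
import re

# Case matters: [A-Z]{2,4} cannot match a lowercase TLD like "com"
upper = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Z]{2,4}"
lower = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,4}"
text = "contact someone@example.com now"
print(re.sub(upper, "", text))  # unchanged: no match
print(re.sub(lower, "", text))  # email removed

# Extracting the bare domain from a URL, dropping a leading "www."
url = "http://www.abcd.com/product.php?id=324"
print(re.match(r"http://(?:www\.)?(.*?)/", url).group(1))  # abcd.com
```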


Re: Make syntax highlighter caseinsensitive

2011-02-24 Thread Koji Sekiguchi

(11/02/24 20:18), Tarjei Huse wrote:

Hi,

I got an index where I have two fields, body and caseInsensitiveBody.
Body is indexed and stored while caseInsensitiveBody is just indexed.

The idea is that by not storing the caseInsensitiveBody I save some
space and gain some performance. So I query against the
caseInsensitiveBody and generate highlighting from the case sensitive one.

The problem is that as a result, I am missing highlighting terms. For
example, when I search for solr and get a match in caseInsensitiveBody
for solr but that it is Solr in the original document, no highlighting
is done.

Is there a way around this? Currently I am using the following
highlighting params:
 'hl' =>  'on',
 'hl.fl' =>  'header,body',
 'hl.usePhraseHighlighter' =>  'true',
 'hl.highlightMultiTerm' =>  'true',
 'hl.fragsize' =>  200,
 'hl.regex.pattern' =>  '[-\w ,/\n\"\']{20,200}',


Tarjei,

Maybe a silly question, but why not make the body field case-insensitive
and eliminate the caseInsensitiveBody field, and then query and highlight on
just the body field?

Koji
--
http://www.rondhuit.com/en/
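A case-insensitive body field along the lines Koji suggests might be declared like this in schema.xml (a sketch; the field type name and the tokenizer choice are assumptions):

```xml
<!-- Sketch: lowercase at both index and query time so "Solr" matches "solr",
     while the stored text keeps its original casing for highlighting -->
<fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body" type="text_ci" indexed="true" stored="true"/>
```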


Ramdirectory

2011-02-24 Thread Bill Bell
I could not figure out how to set up the RAMDirectory option in solrconfig.xml. 
Does anyone have an example for 1.4?

Bill Bell
Sent from mobile



Re: Ramdirectory

2011-02-24 Thread Chris Hostetter

: I could not figure out how to setup the ramdirectory option in 
solrconfig.XML. Does anyone have an example for 1.4?

it wasn't an option in 1.4.

as Koji had already mentioned in the other thread where you chimed in
and asked about this, it was added in the 3x branch...

http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html



-Hoss


boosting based on number of terms matched?

2011-02-24 Thread DarkNovaNick
I'm using the edismax handler, although my question is probably the same  
for dismax. When the user types a long query, I use the "mm" parameter so  
that only 75% of terms need to match. This works fine, however, sometimes  
documents that only match 75% of the terms show up higher in my results  
than documents that match 100%. I'd like to set a boost so that documents  
that match 100% will be much more likely to be put ahead of documents that  
only match 75%. Can anyone give me a pointer of how to do this? Thanks,


Nick


Re: Ramdirectory

2011-02-24 Thread Bill Bell
Thanks - yeah that is why I asked how to use it. But I still don't know
how to use it.

https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html


https://issues.apache.org/jira/browse/SOLR-465






Is that right? Examples? Options?

Where do I put that in solrconfig.xml ? Do I put it in
mainIndex/directoryProvider ?

I know that SOLR-465 is more generic, but
https://issues.apache.org/jira/browse/SOLR-480 seems easier to use.



Thanks.


On 2/24/11 6:21 PM, "Chris Hostetter"  wrote:

>
>: I could not figure out how to setup the ramdirectory option in
>solrconfig.XML. Does anyone have an example for 1.4?
>
>it wasn't an option in 1.4.
>
>as Koji had already mentioned in the other thread where you chimed in
>and asked about this, it was added in the 3x branch...
>
>http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html
>
>
>
>-Hoss
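For reference, on the 3.x branch the factory is enabled in solrconfig.xml with a directoryFactory element, roughly like this (a sketch based on the linked javadoc; not available in 1.4):

```xml
<!-- Sketch: holds the whole index in RAM; contents are lost on restart -->
<directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>
```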




Re: query results filter

2011-02-24 Thread Babak Farhang
In my case, I want to filter out "duplicate" docs so that returned
docs are unique w/ respect to a certain field (not the schema's unique
field, of course): a "duplicate" doc here is one that has same value
for a checksum field as one of the docs already in the results. It
would be great if I could somehow express that w/ a query, but I don't
think that would be possible.

On Thu, Feb 24, 2011 at 5:11 PM, Jonathan Rochkind  wrote:
> Hmm, depending on what you are actually needing to do, can you do it with a 
> simple fq param to filter out what you want filtered out, instead of needing 
> to write custom Java as you are suggesting? It would be a lot easier to just 
> use an fq.
>
> How would you describe the documents you want to filter from the query 
> results page?  Can that description be represented by a Solr query you can 
> already represent using the lucene, dismax, or any other existing query? If 
> so, why not just use a negated fq describing what to omit from the results?
> 
> From: Babak Farhang [farh...@gmail.com]
> Sent: Thursday, February 24, 2011 6:58 PM
> To: solr-user
> Subject: query results filter
>
> Hi everyone,
>
> I have some existing solr cores that for one reason or another have
> documents that I need to filter from the query results page.
>
> I would like to do this inside Solr instead of doing it on the
> receiving end, in the client.  After searching the mailing list
> archives and Solr wiki, it appears you do this by registering a custom
> SearchHandler / SearchComponent with Solr.  Still, I don't quite
> understand how this machinery fits together.  Any suggestions / ideas
> / pointers much appreciated!
>
> Cheers,
> -Babak
>
> ~~
>
> Ideally, I'd like to find / code a solution that does the following:
>
> 1. A request handler that works like the StandardRequestHandler but
> which allows an optional DocFilter (say, modeled like the
> java.io.FileFilter interface)
> 2. Allows current pagination to work transparently.
> 3. Works transparently with distributed/sharded queries.
>


Re: query slop issue

2011-02-24 Thread Bagesh Sharma

Thanks, very good explanation.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/query-slop-issue-tp2567418p2573185.html
Sent from the Solr - User mailing list archive at Nabble.com.