Re: Problem with XML encode UTF-8

2011-02-24 Thread Jan Høydahl
Hi,

Attachments may not work on the mailing lists. Paste the code into the email or
provide a link.
Could it be that your Python code is not handling UTF-8 strings correctly?

Can you paste some relevant lines from the Solr log?
If you start solr with Jetty, you can use java -jar start.jar and get the log 
right in your console.
The same for Tomcat would be bin/catalina.sh run
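For what it's worth, the "invalid XML character (Unicode: 0xc)" error quoted further down in this thread usually means a form feed slipped into the document text. A hedged sketch (the helper name is mine, not part of solrpy) of stripping XML-1.0-illegal characters on the Python side before posting:

```python
import re

# XML 1.0 only allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and the
# supplementary planes; anything else (e.g. 0x0C, the form feed) is illegal.
_XML_INVALID = re.compile(
    '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)

def sanitize_for_xml(text):
    """Strip control characters that may not appear in an XML 1.0 document."""
    return _XML_INVALID.sub('', text)

print(sanitize_for_xml('hello\x0cworld'))  # -> helloworld
```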

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 23. feb. 2011, at 13.29, jayronsoares wrote:

 
 Hi Jan,
 
 I appreciate your attention.
 I've tried to answer your questions to the best of my knowledge.
 
 2011/2/22 Jan Høydahl / Cominvent [via Lucene] 
 ml-node+2551500-1071759141-363...@n3.nabble.com
 
 Hi,
 
 Please explain some more.
 a) What version of Solr?
 
  Solr version 1.4
 
 
 
 b) Are you trying to feed XML or PDF?
 
   XML via solrpy
 
 
 c) What request handler are you feeding to? /update or /update/extract ?
 
   I don't know, see the example attached
 
 d) Can you copy/paste some more lines from the error log?
 
 
   I'm attaching one example, so you can test for yourself.
 
 
 Thanks for your help.
 Cheers
 jayron
 
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 
 On 21. feb. 2011, at 15.02, jayronsoares wrote:
 
 
 Hi, I'm using solrpy to store PDF files; however, when I run the
 script, it shows me this issue:
 
 An invalid XML character (Unicode: 0xc) was found in the element content
 of
 the document.
 
 Could someone give me some help?
 
 cheers
 jayron
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Any-new-python-libraries-tp493419p2545020.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
 
 
 
 -- 
  A Vida é arte do Saber...Quem quiser saber tem que viver!
 
 http://bucolick.tumblr.com
 http://artecultural.wordpress.com/
 
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Any-new-python-libraries-tp493419p2559636.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Problem in full query searching

2011-02-24 Thread Bagesh Sharma

Hi sir, my problem is that when I search for the string "software engineering
institute", the documents that match the complete phrase do not come first.
There are documents that contain the complete text, but they do not appear
near the top of the results. I want results ranked so that complete-phrase
matches come first, then two-word matches, and finally any-word matches. I am
using the dismax request handler. I also looked into term proximity, but it is
not working for me.

I have sorted the results on score desc. After analyzing, I observed that
documents which do not contain the complete text, but have more occurrences of
three, two, or one of the words in their body text, get a higher score because
of this. Is there any way to give a higher score to documents with a complete
text match instead of documents with more occurrences of individual words?

Please advise.
-- 
Thanks and Regards
   Bagesh Sharma
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-in-full-query-searching-tp2566054p2566054.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: UpdateProcessor and copyField

2011-02-24 Thread Jan Høydahl
Hi,

I'd also like a more powerful/generic CopyField.

Today copyField always copies after UpdateChain and before analysis.
Refactoring it as an UP (using SOLR-2370 to include it as part of default 
chain) would  let us specify before UpdateChain in addition. But how could we 
get it to copy after analysis?

Imagine these lines in schema.xml:
<copyField source="my_raw_keywords" dest="keywords" when="preUpdate" append="true"/>
<copyField source="my_raw_keywords2" dest="keywords" when="preUpdate" append="true"/>
<copyField source="keywords" dest="keywords_facet"/> <!-- default when="preAnalysis" -->
<copyField source="keywords" dest="keywords_stemmed"/>
<copyField source="keywords_stemmed" dest="all_stemmed" when="postAnalysis" append="true"/>

This would read in two source fields and merge them into the keywords field 
before UpdateChain is run. UpdateChain may do various magic with the field, and 
then before analysis it is copied to two fields, for facet and a stemmed 
version. After analysis we copy the stemmed field to another stemmed field 
(it must be the same field class and multiValued, of course). The postAnalysis copying
would also allow for some advanced hacking by copying the results of different
fieldTypes into one, enabling the use case of lemmatization by expansion on the
index side and thus querying multiple languages in one and the same field.

From my understanding, the RunUpdateProcessor is one monolithic beast passing 
the doc along for analysis and indexing. Would it be possible to split it in 
two, one AnalysisUpdateProcessor and one IndexUpdateProcessor?

Chris, for the custom field manipulations in custom UpdateChains it makes sense 
with a FieldManipulator UpdateProcessor which can be inserted wherever you 
like, and depending on use case. I believe this can/should exist independently 
from a refactoring of copyField

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb. 2011, at 03.16, Chris Hostetter wrote:

 
 :  Maybe copy fields should be refactored to happen in a new, core, 
 : update processor, so there is nothing special/awkward about them?  It 
 : seems they fit as part of what an update processor is all about, 
 : augmenting/modifying incoming documents.
 : 
 : Seems reasonable.
 : By default, the copyFields could be read from the schema for back
 : compat (and the fact that copyField does feel more natural in the
 : schema)
 
 As someone who has written special case UpdateProcessors that clone field 
 values, I agree that it would be handy to have a new generic 
 CopyFieldUpdateProcessor but i'm not really on board with the idea of it 
 reading <copyField ... /> declarations by default.  the ideas really serve 
 different purposes...
 
 * as an UpdateProcessor it's something that can be 
 adjusted/configured/overridden on a use-case basis - some request 
 handlers could be configured to use a processor chain that includes the 
 CopyFieldUpdateProcessor and some could be configured not to.
 
 * schema copyField declarations are things that happen to *every* document, 
 regardless of where it comes from.
 
 the use cases would be very different: consider a schema with many 
 different fields specific to certain types of documents, as well as a few 
 required fields that every type of document must have: title, 
 description, body, and maintext fields.  it might make sense 
 to use different processor chains along with a 
 CopyFieldUpdateProcessor to clone some other fields (say: a 
 dust_jacket_text field for books, and a plot_summary field for movies) 
 into the description field when those docs are indexed -- but if you 
 absolutely positively *always* wanted the contents of title, description, 
 and body to be copied into the maintext field, that would make more sense 
 as a schema.xml declaration.
 
 likewise: it would be handy to have an UpdateProcessor that rejected 
 documents that were missing some fields -- but that would not be a true 
 substitute for using required="true" on a field in the schema.xml.
 
 a single index may have multiple valid processor chains for different 
 indexing situations -- but rules declared in the schema.xml are absolute 
 and can not be circumvented.
 
 
 -Hoss



Re: disable replication in a persistent way

2011-02-24 Thread Jan Høydahl
I think all of this should be adapted for SolrCloud.
ZK should be the one knowing which is master and slave. ZK should know whether 
replication on a slave is disabled or not. To disable replication it should be 
enough to set a new value in ZK, and the node will be notified and change 
behaviour at next poll. Thus, in a ZK environment we'll not need the 
replicationHandler section of solrconfig.xml at all, as it should be stored in 
distinct ZK nodes, not? We somehow have to refactor this to work seamlessly 
with and without ZK.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb. 2011, at 05.10, Otis Gospodnetic wrote:

 Hi,
 
 
 - Original Message 
 From: Ahmet Arslan iori...@yahoo.com
 Subject: disable replication in a persistent way
 
 Hello,
 
 solr/replication?command=disablepoll disables replication on  slave(s). 
 However 
 it is not persistent. After solr/tomcat restart, slave(s) will  continue 
 polling. 
 
 
 Is there a built-in way to disable replication on  slave side in a 
 persistent 
 manner?
 
 Not that I know of.
 
 Hoss or somebody else will correct me if I'm wrong :)
 
 Currently I am using system property  substitution along with 
 solrcore.properties file to simulate  this.
 
 <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
 
 #solrcore.properties  in slave
 enable.master=true
 
 And modify solrcore.properties with a custom solr request handler after the 
 disablepoll command, to make it persistent. It seems that there is no existing 
 mechanism to write the solrcore.properties file, am I correct?
 
 What about modifying the existing classes (the one/ones that handle the 
 disablepoll command) to take another param: persist=true|false ?
 Would that be better than a custom Solr request handler that requires a 
 separate 
 call?
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 



Re: custom query parameters

2011-02-24 Thread Jan Høydahl
I would probably try the SearchComponent route first, translating input into 
DisMax speak.
But if you have a completely different query language, a QParserPlugin could be 
the way to go.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb. 2011, at 06.26, Michael Moores wrote:

 Trying to answer my own question... it seems like it would be a good idea to 
 create a SearchComponent and add it to the list of existing components.
 My component just converts query parameters to something that the solr 
 QueryComponent understands.
 Is that a good way of doing it?
 
 
 
 On Feb 23, 2011, at 8:12 PM, Michael Moores wrote:
 
 I'm required to provide a handler with some specialized query string inputs.
 
 I'd like to translate the query inputs to a lucene/solr query and delegate 
 the request to the existing lucene/dismax handler.
 
 What's the best way to do this?
 Do I implement SolrRequestHandler, or a QParser?  Do I extend the existing 
 StandardRequestHandler?
 
 thanks,
 --Michael
 
 
 
 
 
 
 



Re: Problem in full query searching

2011-02-24 Thread Grijesh

Try configuring more weight on the ps and pf parameters of the dismax request
handler to boost documents that match the full phrase.

Or, if you do not want term frequency to be considered, use
omitTermFreqAndPositions="true" in the field definition.
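As an illustration, a hedged sketch of what such dismax parameters might look like (the field name `body` and the boost values are assumptions, not from this thread):

```python
from urllib.parse import urlencode

# dismax parameters that boost whole-phrase matches over scattered term hits
params = {
    'defType': 'dismax',
    'q': 'software engineering institute',
    'qf': 'body',        # fields searched for the individual terms
    'pf': 'body^10',     # extra boost for docs where the whole phrase occurs
    'ps': 2,             # allow up to 2 positions of slop in the phrase match
}
print('/solr/select?' + urlencode(params))
```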

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-in-full-query-searching-tp2566054p2566230.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: problem when search grouping word

2011-02-24 Thread Grijesh

Maybe a synonym will help.

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-when-search-grouping-word-tp2566499p2566548.html
Sent from the Solr - User mailing list archive at Nabble.com.




Make syntax highlighter caseinsensitive

2011-02-24 Thread Tarjei Huse
Hi,

I've got an index with two fields, body and caseInsensitiveBody.
Body is indexed and stored, while caseInsensitiveBody is just indexed.

The idea is that by not storing caseInsensitiveBody I save some
space and gain some performance. So I query against
caseInsensitiveBody and generate highlighting from the case-sensitive one.

The problem is that, as a result, I am missing highlighting terms. For
example, when I search for "solr" and get a match in caseInsensitiveBody,
but the original document contains "Solr", no highlighting
is done.

Is there a way around this? Currently I am using the following
highlighting params:
'hl' => 'on',
'hl.fl' => 'header,body',
'hl.usePhraseHighlighter' => 'true',
'hl.highlightMultiTerm' => 'true',
'hl.fragsize' => 200,
'hl.regex.pattern' => '[-\w ,/\n\\']{20,200}',

 

Regards / Med vennlig hilsen
Tarjei Huse




Re: Special Circumstances for embedded Solr

2011-02-24 Thread Devangini

Can you please show me how an http implementation of solrj querying can be
converted to one for embedded solr with the help of an example?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Special-Circumstances-for-embedded-Solr-tp833409p2566768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: embedding solr

2011-02-24 Thread Devangini

How does the SolrParams fill up directly? Shouldn't it be SolrQueryRequest
and not SolrParams, if I am not mistaken?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/embedding-solr-tp476484p2566785.html
Sent from the Solr - User mailing list archive at Nabble.com.


Filter Query

2011-02-24 Thread Salman Akram
Hi,

I know a filter query is really useful due to caching, but I am confused about
how it filters results.

Let's say I have the following criteria:

Text: "abc def"
Date: 24th Feb, 2011

Now "abc def" might occur in almost every document, but if Solr first
filters based on date it only has to search a few documents
(instead of millions).

If I put the date parameter in fq, will it first filter on date and then
do the text search, or will both be filtered separately and then
intersected? If they are filtered separately, the issue is this: say
"abc def" takes 20 secs across all documents (without any filters, due to the large
number of documents); it would still take the same time, but if the search ran only
on the few documents from that specific date it would be super fast.

If fq doesn't give me what I am looking for, is there any other parameter?
There should be a way, as this is a very common scenario.



-- 
Regards,

Salman Akram


Re: problem when search grouping word

2011-02-24 Thread Chamnap Chhorn
There are many product names. How could I list them all, and the list is
growing fast as well?

On Thu, Feb 24, 2011 at 5:25 PM, Grijesh pintu.grij...@gmail.com wrote:


 may synonym will help

 -
 Thanx:
 Grijesh
 http://lucidimagination.com
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/problem-when-search-grouping-word-tp2566499p2566550.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


synonym.txt

2011-02-24 Thread Isha Garg

Hi,
I have a doubt regarding query-time synonym expansion: will changes
made to synonym.txt after index creation take effect, or will Solr keep
referring to the synonym.txt that was present at index
time?


Thanks!
Isha Garg


Re: Filter Query

2011-02-24 Thread Stefan Matheis
Salman,

afaik, the Query is executed first and afterwards FilterQuery steps in
Place .. so it's only an additional Filter on your Results.

Recommended Wiki-Pages on FilterQuery:
* http://wiki.apache.org/solr/CommonQueryParameters#fq
* http://wiki.apache.org/solr/FilterQueryGuidance
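A hedged sketch of such a request (the field names and the local URL are assumptions, not from the thread):

```python
from urllib.parse import urlencode

# fq is cached and applied as an unscored filter; q is the scored main query
params = {
    'q': 'text:"abc def"',
    'fq': 'date:[2011-02-24T00:00:00Z TO 2011-02-24T23:59:59Z]',
    'rows': 10,
}
print('http://localhost:8983/solr/select?' + urlencode(params))
```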

Regards
Stefan

On Thu, Feb 24, 2011 at 12:46 PM, Salman Akram
salman.ak...@northbaysolutions.net wrote:
 Hi,

 I know Filter Query is really useful due to caching but I am confused about
 how it filter results.

 Lets say I have following criteria

 Text:: Abc def
 Date: 24th Feb, 2011

 Now abc def might be coming in almost every document but if SOLR first
 filters based on date it will have to do search only on few documents
 (instead of millions)

 If I put Date parameter in fq would it be first filtering on date and then
 doing text search or both of them would be filtered separately and then
 intersection? If its filtered separately the issue would be that lets say
 abd def takes 20 secs on all documents (without any filters - due to large
 # of documents) and it will be still taking same time but if its done only
 on few documents on that specific date it would be super fast.

 If fq doesn't give what I am looking for, is there any other parameter?
 There should be a way as this is a very common scenario.



 --
 Regards,

 Salman Akram



Re: synonym.txt

2011-02-24 Thread Stefan Matheis
Isha,

Solr will use the currently loaded synonyms file; it bears no relation to
the synonyms-file content that was used while indexing.

But to pick up changes to the synonyms file you'll have to restart your
Java process (in single-core mode) or reload your core configuration
(otherwise)
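For the multi-core case, the reload can be triggered over HTTP; a hedged sketch of building the CoreAdmin URL (host, port, and core name are assumptions):

```python
from urllib.parse import urlencode

def reload_core_url(host, port, core):
    """Build the CoreAdmin URL that reloads a core, picking up config changes."""
    qs = urlencode({'action': 'RELOAD', 'core': core})
    return 'http://%s:%d/solr/admin/cores?%s' % (host, port, qs)

print(reload_core_url('localhost', 8983, 'core0'))
# -> http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
```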

Regards
Stefan

On Thu, Feb 24, 2011 at 12:58 PM, Isha Garg isha.g...@orkash.com wrote:
 Hi,
    I have a doubt regarding query time synonym expansion  that whether   the
 changes apply after index creation for synonym.txt   will work or not? or it
  will refer to initial synonym. txt present at index time.

 Thanks!
 Isha Garg



Question Solr Index main in RAM

2011-02-24 Thread Andrés Ospina

Hi,

My name is Felipe and I want to keep the main Solr index in RAM memory.

How is that possible? I have Solr 1.4.

Thank you!

Felipe

Re: Filter Query

2011-02-24 Thread Salman Akram
Yea, I had an idea about that...

Now, logically speaking, the main text search has to be in the query itself, so
is there no way to first filter based on metadata and then do the text search
on that limited data set?

Thanks!

On Thu, Feb 24, 2011 at 5:24 PM, Stefan Matheis 
matheis.ste...@googlemail.com wrote:

 Salman,

 afaik, the Query is executed first and afterwards FilterQuery steps in
 Place .. so it's only an additional Filter on your Results.

 Recommended Wiki-Pages on FilterQuery:
 * http://wiki.apache.org/solr/CommonQueryParameters#fq
 * http://wiki.apache.org/solr/FilterQueryGuidance

 Regards
 Stefan

 On Thu, Feb 24, 2011 at 12:46 PM, Salman Akram
 salman.ak...@northbaysolutions.net wrote:
  Hi,
 
  I know Filter Query is really useful due to caching but I am confused
 about
  how it filter results.
 
  Lets say I have following criteria
 
  Text:: Abc def
  Date: 24th Feb, 2011
 
  Now abc def might be coming in almost every document but if SOLR first
  filters based on date it will have to do search only on few documents
  (instead of millions)
 
  If I put Date parameter in fq would it be first filtering on date and
 then
  doing text search or both of them would be filtered separately and then
  intersection? If its filtered separately the issue would be that lets say
  abd def takes 20 secs on all documents (without any filters - due to
 large
  # of documents) and it will be still taking same time but if its done
 only
  on few documents on that specific date it would be super fast.
 
  If fq doesn't give what I am looking for, is there any other parameter?
  There should be a way as this is a very common scenario.
 
 
 
  --
  Regards,
 
  Salman Akram
 




-- 
Regards,

Salman Akram


query slop issue

2011-02-24 Thread Bagesh Sharma

Hi all, I have a search string q=water+treatment+plant and I am using the dismax
request handler, where I have qs=1. In which way will processing be done? That is,
within how many words of each other must "water", "treatment", and "plant" occur
for a document to appear in the result set?

 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/query-slop-issue-tp2567418p2567418.html
Sent from the Solr - User mailing list archive at Nabble.com.


Free Webcast/Technical Case Study: How Bazaarvoice moved to Solr to implement Search Strategies for Social and eCommerce

2011-02-24 Thread Grant Ingersoll
I thought you might be interested in a technical webcast on
Solr/Lucene and e-commerce/social media that we are sponsoring,
featuring RC Johnson of Bazaarvoice. It's Wednesday, March 2, 2011 at
11:00am PST/2:00pm EST/19:00 GMT.

RC has been leading efforts at Bazaarvoice to build out their Solr
search applications, moving beyond a more traditional RDBMS-centered
data strategy. If you've not heard of Bazaarvoice, they provide
user-generated content and ratings in a white-label service offering.
They use Solr to index and search millions of online customer
conversations that deliver billions of monthly impressions for leading
companies in retail, manufacturing, financial services, health care,
travel and media.

Key topics this webcast will cover include:

- Iterative expansion of search features and content collections
- Migrating from simplistic database search to Solr-based search
- Integrating statistical analytics into search at scale
- Considering NoSQL for scalability and deployability of big data, to
  make data easier to consume across applications

You can sign up here: http://www.eventsvc.com/lucidimagination/030211?trk=ap
and mark your calendars for Wednesday, March 2, 2011 at 11:00am
PST/2:00pm EST/19:00 GMT.

-Grant

Re: Special Circumstances for embedded Solr

2011-02-24 Thread Tarjei Huse
On 02/24/2011 12:16 PM, Devangini wrote:
 Can you please show me how an http implementation of solrj querying can be
 converted to one for embedded solr with the help of an example?
Hi, here's an example that almost compiles. You should be able to get
going with this.
T

class EmbeddedSolrExample {

    private EmbeddedSolrServer server;

    public static void main(String[] args) throws Exception {
        new EmbeddedSolrExample().setupSolrContainer();
    }

    SolrDocumentList getResults(QueryResponse response) {
        if (response.getStatus() != 0) {
            return new SolrDocumentList();
        }
        return response.getResults();
    }

    private void addDocument() throws SolrServerException, IOException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.setField("body", "test");
        doc.setField("id", "12");

        UpdateResponse s = server.add(doc);
        if (s.getStatus() != 0) {
            throw new IllegalStateException("add failed with status " + s.getStatus());
        }
        server.commit();

        SolrDocumentList results = getResults(search("test"));
        System.out.println("I got " + results.size() + " documents");
    }

    private void setupSolrContainer() throws Exception {
        File home = new File("/tmp/solr");
        File f = new File(home, "solr.xml");
        CoreContainer container = new CoreContainer();
        container.load("/tmp/solr", f);

        server = new EmbeddedSolrServer(container, "model");
        addDocument();
    }

    QueryResponse search(String words) throws SolrServerException {
        SolrQuery query = new SolrQuery();
        query.addField("id").addField("body").addField("score");
        query.setTimeAllowed(1000);
        query.setRows(50);
        query.set("q", words);
        query.setSortField("timestamp", ORDER.desc); // sort by date
        return server.query(query);
    }
}

-- 
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413



Re: Question Solr Index main in RAM

2011-02-24 Thread Koji Sekiguchi

(11/02/24 21:38), Andrés Ospina wrote:


Hi,

My name is Felipe and i want to use the index main of solr in RAM memory.

How it's possible? I have solr 1.4

Thank you!

Felipe  


Welcome Felipe!

If I understand your question correctly, you can use RAMDirectoryFactory:

https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html

But I believe it is only available from 3.1 (to be released soon...).
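A hypothetical solrconfig.xml fragment (assuming Solr 3.1+; note that with this factory the index lives only in the JVM heap and is lost on restart):

```xml
<directoryFactory name="DirectoryFactory"
                  class="solr.RAMDirectoryFactory"/>
```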

Koji
--
http://www.rondhuit.com/en/


Re: Question about Nested Span Near Query

2011-02-24 Thread Ahsan |qbal
Hi

To narrow down the issue I indexed a single document with one of the sample
queries (given below) which was giving issue.

*evaluation of loan and lease portfolios for purposes of assessing the
adequacy of *

Now when i Perform a search query (*TextContents:evaluation of loan and
lease portfolios for purposes of assessing the adequacy of*) the parsed
query is

*spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation,
Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0, true),
Contents:lease], 0, true), Contents:portfolios], 0, true), Contents:for], 0,
true), Contents:purposes], 0, true), Contents:of], 0, true),
Contents:assessing], 0, true), Contents:the], 0, true), Contents:adequacy],
0, true), Contents:of], 0, true)*

and search is not successful.

If I remove '*evaluation*' from the start OR '*assessing the adequacy of*' from the
end, it works fine. The issue seems to arise on relatively long phrases, but I have
not been able to find a pattern, and it's really mind-boggling because I thought
this issue might be due to a large position list - but this is a single document
with one phrase, so it's definitely not related to the size of the index.

Any ideas whats going on??

On Thu, Feb 24, 2011 at 10:25 AM, Ahsan |qbal ahsan.iqbal...@gmail.comwrote:

 Hi

 It didn't search.. (means no results found even results exist) one
 observation is that it works well even in the long phrases but when the long
 phrases contain stop words and same stop word exist two or more time in the
 phrase then, solr can't search with query parsed in this way.


 On Wed, Feb 23, 2011 at 11:49 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:

 Hi,

 What do you mean by this doesn't work fine?  Does it not work correctly
 or is
 it slow or ...

 I was going to suggest you look at Surround QP, but it looks like you
 already
 did that.  Wouldn't it be better to get Surround QP to work?

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
  From: Ahsan |qbal ahsan.iqbal...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Tue, February 22, 2011 10:59:26 AM
  Subject: Question about Nested Span Near Query
 
  Hi All
 
  I had a requirement to implement queries that involves phrase
  proximity.
  like user should be able to search ab cd w/5 de fg, both  phrases as
  whole should be with in 5 words of each other. For this I  implement a
 query
  parser that make use of nested span queries, so above query  would be
 parsed
  as
 
  spanNear([spanNear([Contents:ab, Contents:cd], 0,  true),
  spanNear([Contents:de, Contents:fg], 0, true)], 5,  false)
 
  Queries like this seems to work really good when phrases are small  but
 when
  phrases are large this doesn't work fine. Now my question, Is there  any
  limitation of SpanNearQuery. that we cannot handle large phrases in
  this
  way?
 
  please help
 
  Regards
  Ahsan
 





Re: Question about Nested Span Near Query

2011-02-24 Thread Bill Bell
Send schema and document in XML format and I'll look at it

Bill Bell
Sent from mobile


On Feb 24, 2011, at 7:26 AM, Ahsan |qbal ahsan.iqbal...@gmail.com wrote:

 Hi
 
 To narrow down the issue I indexed a single document with one of the sample
 queries (given below) which was giving issue.
 
 *evaluation of loan and lease portfolios for purposes of assessing the
 adequacy of *
 
 Now when i Perform a search query (*TextContents:evaluation of loan and
 lease portfolios for purposes of assessing the adequacy of*) the parsed
 query is
 
 *spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation,
 Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0, true),
 Contents:lease], 0, true), Contents:portfolios], 0, true), Contents:for], 0,
 true), Contents:purposes], 0, true), Contents:of], 0, true),
 Contents:assessing], 0, true), Contents:the], 0, true), Contents:adequacy],
 0, true), Contents:of], 0, true)*
 
 and search is not successful.
 
 If I remove '*evaluation*' from start OR *'assessing the adequacy of*' from
 end it works fine. Issue seems to come on relatively long phrases but I have
 not been able to find a pattern and its really mind boggling coz I thought
 this issue might be due to large position list but this is a single document
 with one phrase. So its definitely not related to size of index.
 
 Any ideas whats going on??
 
 On Thu, Feb 24, 2011 at 10:25 AM, Ahsan |qbal ahsan.iqbal...@gmail.comwrote:
 
 Hi
 
 It didn't search.. (means no results found even results exist) one
 observation is that it works well even in the long phrases but when the long
 phrases contain stop words and same stop word exist two or more time in the
 phrase then, solr can't search with query parsed in this way.
 
 
 On Wed, Feb 23, 2011 at 11:49 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:
 
 Hi,
 
 What do you mean by this doesn't work fine?  Does it not work correctly
 or is
 it slow or ...
 
 I was going to suggest you look at Surround QP, but it looks like you
 already
 did that.  Wouldn't it be better to get Surround QP to work?
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
 - Original Message 
 From: Ahsan |qbal ahsan.iqbal...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, February 22, 2011 10:59:26 AM
 Subject: Question about Nested Span Near Query
 
 Hi All
 
 I had a requirement to implement queries that involves phrase
 proximity.
 like user should be able to search ab cd w/5 de fg, both  phrases as
 whole should be with in 5 words of each other. For this I  implement a
 query
 parser that make use of nested span queries, so above query  would be
 parsed
 as
 
 spanNear([spanNear([Contents:ab, Contents:cd], 0,  true),
 spanNear([Contents:de, Contents:fg], 0, true)], 5,  false)
 
 Queries like this seems to work really good when phrases are small  but
 when
 phrases are large this doesn't work fine. Now my question, Is there  any
 limitation of SpanNearQuery. that we cannot handle large phrases in
 this
 way?
 
 please help
 
 Regards
 Ahsan
 
 
 
 


Re: Question Solr Index main in RAM

2011-02-24 Thread Bill Bell
How to use this?

Bill Bell
Sent from mobile


On Feb 24, 2011, at 7:19 AM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 (11/02/24 21:38), Andrés Ospina wrote:
 
 Hi,
 
 My name is Felipe and i want to use the index main of solr in RAM memory.
 
 How it's possible? I have solr 1.4
 
 Thank you!
 
 Felipe   
 
 Welcome Felipe!
 
 If I understand your question correctly, you can use RAMDirectoryFactory:
 
 https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html
 
 But I believe it is available 3.1 (to be released soon...).
 
 Koji
 -- 
 http://www.rondhuit.com/en/


Re: DataImportHandler in Solr 4.0

2011-02-24 Thread Mark
It seems this thread has been hijacked. My initial posting was in 
regards to my custom Evaluators always receiving a null context. The same 
Evaluators work in 1.4.1.


On 2/23/11 5:47 PM, Alexandre Rocco wrote:

I got it working by building the DIH from the contrib folder and made a
change on the lib statements to map the folder that contains the .jar files.

Thanks!
Alexandre

On Wed, Feb 23, 2011 at 8:55 PM, Smiley, David W.dsmi...@mitre.org  wrote:


The DIH is no longer supplied embedded in the Solr war file. You need to
get it on the classpath somehow. You could add another <lib .../> statement to
solrconfig.xml to resolve this.
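For instance, a hypothetical <lib/> line (the dir path is an assumption and depends on where your build puts the DIH jars):

```xml
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar"/>
```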

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Feb 23, 2011, at 4:11 PM, Alexandre Rocco wrote:


Hi guys,

I'm having some issues when trying to use the DataImportHandler on Solr

4.0.

I've downloaded the latest nightly build of Solr 4.0 and configured the
solrconfig.xml file (in the example folder) as usual, like this:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

At this point I noticed that the DIH jar was not being loaded correctly
causing exceptions like:
Error loading class

'org.apache.solr.handler.dataimport.DataImportHandler'

and
java.lang.ClassNotFoundException:
org.apache.solr.handler.dataimport.DataImportHandler

Do I need to build to get DIH running on Solr 4.0?

Thanks!
Alexandre
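David's fix can be sketched as a couple of <lib/> directives in solrconfig.xml. The dir values below are assumptions about a typical trunk checkout layout; adjust them to wherever your DIH jars actually live:

```xml
<!-- inside <config> in solrconfig.xml; dir is relative to the core's instanceDir -->
<lib dir="../../contrib/dataimporthandler/lib/" regex=".*\.jar" />
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
```

This keeps the fully qualified class name in the requestHandler declaration working without copying jars into solr/lib by hand.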











Order Facet on ranking score

2011-02-24 Thread Jenny Arduini

Hello everybody,
Is it possible to order the facet results by some ranking score?

I was doing a query with the OR operator, and sometimes the top facets 
contain only results with a small, unimportant rank.

This causes users to be led toward other unimportant searches.

//

--
Jenny Arduini
I.T.T. S.r.l.
Strada degli Angariari, 25
47891 Falciano
Repubblica di San Marino
Tel 0549 941183
Fax 0549 974280
email: jardu...@ittweb.net
http://www.ittweb.net



facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Peter Cline

Hi all,

I'm having a problem using distributed search in conjunction with the 
facet.offset parameter and lexical facet value sorting.  Is there an 
incompatibility between these?  I'm using Solr 1.4.1.


I have a facet with ~100k values in one index.  I'm wanting to page 
through them alphabetically.  When not using distributed search, 
everything works just fine, and very quick.  A query like this works, 
returning 10 facet values starting at the 50,001st:


http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=50000
# Butterflies - Indiana !

However, if I enable distributed search, using a single shard (which is 
the same index), I get no facet values returned.


http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=50000&shards=server:port/solr
# empty list :(

Doing a little more testing, I'm finding that with sharding I often get 
an empty list any time the facet.offset >= facet.limit.  Also, by 
example, if I do facet.limit=100 and facet.offset=90, I get 10 facet 
values.  Doing so without sharding, I get the expected (by me, at least) 
100 values (starting at what would normally be the 91st).


Can anybody shed any light on this for me?

Thanks,
Peter


Re: Question about Nested Span Near Query

2011-02-24 Thread Ahsan |qbal
Hi

schema and document are attached.

On Thu, Feb 24, 2011 at 8:24 PM, Bill Bell billnb...@gmail.com wrote:

 Send schema and document in XML format and I'll look at it

 Bill Bell
 Sent from mobile


 On Feb 24, 2011, at 7:26 AM, Ahsan |qbal ahsan.iqbal...@gmail.com
 wrote:

  Hi
 
  To narrow down the issue I indexed a single document with one of the
 sample
  queries (given below) which was giving issue.
 
  *evaluation of loan and lease portfolios for purposes of assessing the
  adequacy of *
 
  Now when i Perform a search query (*TextContents:evaluation of loan and
  lease portfolios for purposes of assessing the adequacy of*) the parsed
  query is
 
 
 *spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation,
  Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0,
 true),
  Contents:lease], 0, true), Contents:portfolios], 0, true), Contents:for],
 0,
  true), Contents:purposes], 0, true), Contents:of], 0, true),
  Contents:assessing], 0, true), Contents:the], 0, true),
 Contents:adequacy],
  0, true), Contents:of], 0, true)*
 
  and search is not successful.
 
  If I remove '*evaluation*' from start OR *'assessing the adequacy of*'
 from
  end it works fine. Issue seems to come on relatively long phrases but I
 have
  not been able to find a pattern and its really mind boggling coz I
 thought
  this issue might be due to large position list but this is a single
 document
  with one phrase. So its definitely not related to size of index.
 
  Any ideas whats going on??
 
  On Thu, Feb 24, 2011 at 10:25 AM, Ahsan |qbal ahsan.iqbal...@gmail.com
 wrote:
 
  Hi
 
  It didn't find anything (no results returned even though results exist). One
  observation is that it works well even on long phrases, but when a long
  phrase contains stop words and the same stop word occurs two or more times in
  the phrase, Solr can't search with a query parsed in this way.
 
 
  On Wed, Feb 23, 2011 at 11:49 PM, Otis Gospodnetic 
  otis_gospodne...@yahoo.com wrote:
 
  Hi,
 
  What do you mean by this doesn't work fine?  Does it not work
 correctly
  or is
  it slow or ...
 
  I was going to suggest you look at Surround QP, but it looks like you
  already
  did that.  Wouldn't it be better to get Surround QP to work?
 
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
 
 
  - Original Message 
  From: Ahsan |qbal ahsan.iqbal...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Tue, February 22, 2011 10:59:26 AM
  Subject: Question about Nested Span Near Query
 
  Hi All
 
  I had a requirement to implement queries that involves phrase
  proximity.
  like user should be able to search ab cd w/5 de fg, both  phrases
 as
  whole should be with in 5 words of each other. For this I  implement a
  query
  parser that make use of nested span queries, so above query  would be
  parsed
  as
 
  spanNear([spanNear([Contents:ab, Contents:cd], 0,  true),
  spanNear([Contents:de, Contents:fg], 0, true)], 5,  false)
 
  Queries like this seems to work really good when phrases are small
  but
  when
  phrases are large this doesn't work fine. Now my question, Is there
  any
  limitation of SpanNearQuery. that we cannot handle large phrases in
  this
  way?
 
  please help
 
  Regards
  Ahsan
 
 
 
 

<doc>
  <field name="DocID">3369660</field>
  <field name="Contents">evaluation of loan and lease portfolios for purposes of assessing the adequacy of</field>
</doc>
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.2">
 <types>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
  <fieldtype name="binary" class="solr.BinaryField"/>
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
  <fieldType name="pint" class="solr.IntField" omitNorms="true"/>
  <fieldType name="plong"

Re: Filter Query

2011-02-24 Thread Yonik Seeley
On Thu, Feb 24, 2011 at 6:46 AM, Salman Akram
salman.ak...@northbaysolutions.net wrote:
 Hi,

 I know Filter Query is really useful due to caching but I am confused about
 how it filters results.

 Lets say I have following criteria

 Text:: Abc def
 Date: 24th Feb, 2011

 Now abc def might be coming in almost every document but if SOLR first
 filters based on date it will have to do search only on few documents
 (instead of millions)

Yes, this is the way Solr works.  The filters are executed separately,
but the query is executed last with the filters (i.e. it will be
faster if the filter cuts down the number of documents).

-Yonik
http://lucidimagination.com


Re: Filter Query

2011-02-24 Thread Salman Akram
So you are agreeing that it does what I want? So in my example "Abc def"
would only be searched in documents from 24th Feb 2011?

When you say 'last with filters' does it mean first it filters out with
Filter Query and then applies Query on it?

On Thu, Feb 24, 2011 at 9:29 PM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Thu, Feb 24, 2011 at 6:46 AM, Salman Akram
 salman.ak...@northbaysolutions.net wrote:
  Hi,
 
  I know Filter Query is really useful due to caching but I am confused
 about
  how it filters results.
 
  Lets say I have following criteria
 
  Text:: Abc def
  Date: 24th Feb, 2011
 
  Now abc def might be coming in almost every document but if SOLR first
  filters based on date it will have to do search only on few documents
  (instead of millions)

 Yes, this is the way Solr works.  The filters are executed separately,
 but the query is executed last with the filters (i.e. it will be
 faster if the filter cuts down the number of documents).

 -Yonik
 http://lucidimagination.com




-- 
Regards,

Salman Akram


Re: Filter Query

2011-02-24 Thread Yonik Seeley
On Thu, Feb 24, 2011 at 11:56 AM, Salman Akram
salman.ak...@northbaysolutions.net wrote:
 So you are agreeing that it does what I want? So in my example "Abc def"
 would only be searched in documents from 24th Feb 2011?

Pretty much, but not exactly.  It's close enough to what you want though.

The details are that the scorer and the filter are leapfrogged, but
always starting with the filter again after a match.
If you're interested in further details, look at the source code of
IndexSearcher for a filtered query.

This was added in 1.4:
http://www.lucidimagination.com/blog/2009/05/27/filtered-query-performance-increases-for-solr-14/

-Yonik
http://lucidimagination.com
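The leapfrog Yonik describes can be illustrated with a toy intersection of two sorted doc-id lists. This is only a sketch of the skipping idea, not Lucene's actual code — the real implementation works through Scorer/DocIdSetIterator advance() calls inside IndexSearcher:

```java
import java.util.ArrayList;
import java.util.List;

public class LeapfrogDemo {
    // Intersect two ascending doc-id streams by always advancing the side
    // that is behind; after a match, both sides move on.
    static List<Integer> intersect(int[] queryDocs, int[] filterDocs) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < queryDocs.length && j < filterDocs.length) {
            if (queryDocs[i] == filterDocs[j]) {
                hits.add(queryDocs[i]);  // doc matches query AND passes filter
                i++;
                j++;
            } else if (queryDocs[i] < filterDocs[j]) {
                i++;  // query side leaps forward toward the filter's doc
            } else {
                j++;  // filter side leaps forward toward the query's doc
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] queryDocs  = {2, 5, 8, 11, 14};  // docs scoring for "Abc def"
        int[] filterDocs = {1, 5, 9, 11, 20};  // docs passing the date fq
        System.out.println(intersect(queryDocs, filterDocs));  // [5, 11]
    }
}
```

The practical upshot matches Yonik's note: the query-side scorer is only ever asked about documents at or beyond the filter's current position, so a selective filter keeps the expensive scoring work roughly proportional to the filtered set.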


Re: Solr 4.0 DIH

2011-02-24 Thread Koji Sekiguchi

(11/02/22 6:58), Mark wrote:

I download Solr 4.0 from trunk today and I tried using a custom Evaluator 
during my
full/delta-importing.

Within the evaluate method though, the Context is always null? When using this 
same class with Solr
1.4.1 the context always exists. Is this a bug or is this behavior expected?

Thanks


public class MyEvaluator extends Evaluator {
    @Override
    public String evaluate(String argument, Context context) {
        // Argument is present, however context is always null!
        return argument;
    }
}



I tried my test Evaluator on Solr 4.0 and it worked as expected, context is not 
null.
What I did on example-DIH is that:

1. add the following tag to db-data-config.xml:

<function name="toLowerCase" class="LowerCaseFunctionEvaluator"/>

2. use the above evaluator:

<entity name="feature"
        query="select DESCRIPTION from FEATURE where
               ITEM_ID='${dih.functions.toLowerCase(item.ID)}'">

3. do full-import

My test evaluator looks like this:

public class LowerCaseFunctionEvaluator extends Evaluator {
  public String evaluate(String expression, Context context) {
    System.out.println("* exp = " + expression);
    System.out.println("* context = " + context);
    return null;
  }
}

and the context was not null.

Koji
--
http://www.rondhuit.com/en/


Re: facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Yonik Seeley
On Thu, Feb 24, 2011 at 10:57 AM, Peter Cline pcl...@pobox.upenn.edu wrote:
 Hi all,

 I'm having a problem using distributed search in conjunction with the
 facet.offset parameter and lexical facet value sorting.  Is there an
 incompatibility between these?  I'm using Solr 1.4.1.

 I have a facet with ~100k values in one index.  I'm wanting to page through
 them alphabetically.  When not using distributed search, everything works
 just fine, and very quick.  A query like this works, returning 10 facet
 values starting at the 50,001st:

 http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=50000
 # Butterflies - Indiana !

 However, if I enable distributed search, using a single shard (which is the
 same index), I get no facet values returned.

 http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=50000&shards=server:port/solr
 # empty list :(

 Doing a little more testing, I'm finding that with sharding I often get an
 empty list any time the facet.offset >= facet.limit.  Also, by example, if I
 do facet.limit=100 and facet.offset=90, I get 10 facet values.  Doing so
 without sharding, I get the expected (by me, at least) 100 values (starting
 at what would normally be the 91st).

 Can anybody shed any light on this for me?

Sounds like a bug.
Have you tried a 3x or trunk development build to see if it's fixed there?

-Yonik
http://lucidimagination.com


Re: facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Peter Cline

On 02/24/2011 12:37 PM, Yonik Seeley wrote:


Sounds like a bug.
Have you tried a 3x or trunk development build to see if it's fixed there?

-Yonik
http://lucidimagination.com


I haven't.  I'll try the current trunk and get back to you.

Thanks,
Peter


Re: facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Peter Cline

On 02/24/2011 02:58 PM, Peter Cline wrote:

I haven't.  I'll try the current trunk and get back to you.

Thanks,
Peter


I tried today's builds for the 3.x branch and the trunk.  The problem 
persists in both.


Peter


dataimport

2011-02-24 Thread Brian Lamb
Hi all,

First of all, I'm quite new to solr.

I have the server set up and everything appears to work. I set it up so that
the indexed data comes through a mysql connection:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

And here is the contents of db-data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource"
              name="mystuff"
              batchSize="-1"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/database?characterEncoding=UTF8&amp;zeroDateTimeBehavior=convertToNull"
              user="user"
              password="password"/>
  <document>
    <entity name="id"
            dataSource="mystuff"
            query="SELECT p.id, p.fielda, p.fieldb, p.fieldc, p.fieldd FROM mytable p">
    </entity>
  </document>
</dataConfig>

When I point my browser at localhost:8983/solr/dataimport, the server
produces the following message:

Feb 24, 2011 8:58:24 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0
QTime=10
Feb 24, 2011 8:58:24 PM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Feb 24, 2011 8:58:24 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Feb 24, 2011 8:58:24 PM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Feb 24, 2011 8:58:24 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
commit{dir=/wwwroot/apps/apache-solr-1.4.1/example/solr/data/index,segFN=segments_p,version=1297781919778,generation=25,filenames=[_n.nrm,
_n.tis, _n.prx, segments_p, _n.fdt, _n.frq, _n.tii, _n.fdx, _n.fnm]
Feb 24, 2011 8:58:24 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1297781919778
Feb 24, 2011 8:58:24 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity id with URL:
jdbc:mysql://localhost/researchsquare_beta_library?characterEncoding=UTF8&zeroDateTimeBehavior=convertToNull
Feb 24, 2011 8:58:25 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 137
Killed

So it looks like for whatever reason, the server crashes trying to do a full
import. When I add a LIMIT clause on the query, it works fine when the LIMIT
is only 250 records but if I try to do 500 records, I get the same message.

The fields types are:

SHOW CREATE TABLE mytable;
CREATE TABLE mytable (
   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `fielda` varchar(650) COLLATE utf8_unicode_ci DEFAULT NULL,
   `fieldb` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL,
   `fieldc` text COLLATE utf8_unicode_ci,
   `fieldd` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
   PRIMARY KEY (`id`)
);

How can I get Solr to do a full import without crashing? Doing it 250
records at a time is not going to be feasible because there are about 50
records.


Re: query slop issue

2011-02-24 Thread Jayendra Patil
qs is the amount of slop applied only to phrase queries explicitly specified
in q for the qf fields.
So the qs would come into the picture only if the search q were the phrase
query "water treatment plant".

Slop is the maximum allowable positional distance between terms for them
still to be considered a match, and distance is the number of positional
moves of terms needed to reconstruct the phrase in the same order.

So with qs=1 you are allowed only one positional move to recreate
the exact phrase.

You may also want to check the pf and the ps params for dismax.

Regards,
Jayendra

On Thu, Feb 24, 2011 at 8:31 AM, Bagesh Sharma mail.bag...@gmail.com wrote:

 Hi all, I have a search string q=water+treatment+plant and I am using the dismax
 request handler where I have qs=1. In which way will processing be done,
 meaning within how many words of each other must water, treatment, or plant
 occur to appear in the result set?


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/query-slop-issue-tp2567418p2567418.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problem in full query searching

2011-02-24 Thread Jayendra Patil
With dismax or extended dismax parser you should be able to achieve this.

Dismax :- qf, qs, pf and ps should help you have exact control over the
fields and boosts.
Extended Dismax :- In addition to qf, qs, pf and ps, you have pf2 and
pf3 for two- and three-word shingles.

As Grijesh mentioned, use more weight for phrase or proximity matches

Regards,
Jayendra

On Thu, Feb 24, 2011 at 4:03 AM, Grijesh pintu.grij...@gmail.com wrote:

 Try to configure more weight on the ps and pf parameters of the dismax request
 handler to boost phrase-matching documents.

 Or, if you do not want to consider term frequency, use
 omitTermFreqAndPositions="true" in the field definition.

 -
 Thanx:
 Grijesh
 http://lucidimagination.com
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Problem-in-full-query-searching-tp2566054p2566230.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: facet.offset with facet.sort=lex and shards problem?

2011-02-24 Thread Yonik Seeley
On Thu, Feb 24, 2011 at 3:53 PM, Peter Cline pcl...@pobox.upenn.edu wrote:
 I tried today's builds for the 3.x branch and the trunk.  The problem
 persists in both.

Thanks Peter, I was also able to duplicate the bug.  Could you
open a JIRA issue for this?

-Yonik
http://lucidimagination.com


DIH regex remove email + extract url

2011-02-24 Thread Rosa (Anuncios)

Hi,

I'm trying to remove all email addresses in my content field with the 
following line:


<field column="description" xpath="/product/content"
       regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Z]{2,4}" replaceWith="" />


But it doesn't seem to remove emails? Is the syntax right?

Second thing:

I would like to extract domain name from url via a regex:

<field column="source" xpath="/product/url" regex="http://(.*?)\\/(.*)" />

Example: url=http://www.abcd.com/product.php?id=324   -- i want to 
index source = abcd.com


What the syntax for this one?

Thanks for your help

Rosa


Re: Order Facet on ranking score

2011-02-24 Thread Markus Jelsma
No, Solr returns facets ordered alphabetically or by count.

 Hello everybody,
 Is it possibile to order the facet results on some ranking score?
 
 I was doing a query with or operator and sometimes the first facet
 have inside of them only result with small rank and not important.
 This cause that users are led to other reasearch not important.
 
 //


Re: CUSTOM JSP FOR APACHE SOLR

2011-02-24 Thread Paul Libbrecht
Hello list,

as suggested below, I tried to implement a custom ResponseWriter that would 
evaluate a JSP but that seems impossible: the HttpServletRequest and the 
HttpServletResponse are not available anymore.

Have I missed something?
Should I rather do a RequestHandler?
Does anyone know an artificial way to run a JSP? (I'd rather avoid it.)

thanks in advance

paul


Le 2 févr. 2011 à 20:42, Tomás Fernández Löbbe a écrit :

 Hi Paul, I don't fully understand what you want to do. The way, I think,
 SolrJ is intended to be used is from a client application (outside Solr). If
 what you want is something like what's done with Velocity I think you could
 implement a response writer that renders the JSP and send it on the
 response.
 
 Tomás
 
 
 On Mon, Jan 31, 2011 at 6:25 PM, Paul Libbrecht p...@hoplahup.net wrote:
 
 Tomas,
 
 I also know velocity can be used and works well.
 I would be interested to a simpler way to have the objects of SOLR
 available in a jsp than write a custom jsp processor as a request handler;
 indeed, this seems to be the way solrj is expected to be used in the wiki
 page.
 
 Actually I migrated to velocity (which I like less than jsp) just because I
 did not find a response to this question.
 
 paul
 
 
 Le 31 janv. 2011 à 21:53, Tomás Fernández Löbbe a écrit :
 
 Hi John, you can use whatever you want for building your application, using
 Solr on the backend (JSP included). You should find all the information you
 need on Solr's wiki page:
 http://wiki.apache.org/solr/
 
 including some client libraries to easily integrate your application with Solr:
 http://wiki.apache.org/solr/IntegratingSolr
 
 For fast prototyping you could use Velocity:
 http://wiki.apache.org/solr/VelocityResponseWriter
 
 Anyway, I recommend you start with Solr's tutorial:
 http://lucene.apache.org/solr/tutorial.html
 
 Good luck,
 Tomás
 
 2011/1/31 JOHN JAIRO GÓMEZ LAVERDE jjai...@hotmail.com
 
 
 
 SOLR LUCENE
 DEVELOPERS
 
 Hi i am new to solr and i like to make a custom search page for
 enterprise
 users
 in JSP that takes the results of Apache Solr.
 
 - Where i can find some useful examples for that topic ?
 - Is JSP the correct approach to solve mi requirement ?
 - If not what is the best solution to build a customize search page for
 my
 users?
 
 Thanks
 from South America
 
 JOHN JAIRO GOMEZ LAVERDE
 Bogotá - Colombia
 
 
 



query results filter

2011-02-24 Thread Babak Farhang
Hi everyone,

I have some existing solr cores that for one reason or another have
documents that I need to filter from the query results page.

I would like to do this inside Solr instead of doing it on the
receiving end, in the client.  After searching the mailing list
archives and Solr wiki, it appears you do this by registering a custom
SearchHandler / SearchComponent with Solr.  Still, I don't quite
understand how this machinery fits together.  Any suggestions / ideas
/ pointers much appreciated!

Cheers,
-Babak

~~

Ideally, I'd like to find / code a solution that does the following:

1. A request handler that works like the StandardRequestHandler but
which allows an optional DocFilter (say, modeled like the
java.io.FileFilter interface)
2. Allows current pagination to work transparently.
3. Works transparently with distributed/sharded queries.


RE: query results filter

2011-02-24 Thread Jonathan Rochkind
Hmm, depending on what you are actually needing to do, can you do it with a 
simple fq param to filter out what you want filtered out, instead of needing to 
write custom Java as you are suggesting? It would be a lot easier to just use 
an fq. 

How would you describe the documents you want to filter from the query results 
page?  Can that description be represented by a Solr query you can already 
represent using the lucene, dismax, or any other existing query? If so, why not 
just use a negated fq describing what to omit from the results?

From: Babak Farhang [farh...@gmail.com]
Sent: Thursday, February 24, 2011 6:58 PM
To: solr-user
Subject: query results filter

Hi everyone,

I have some existing solr cores that for one reason or another have
documents that I need to filter from the query results page.

I would like to do this inside Solr instead of doing it on the
receiving end, in the client.  After searching the mailing list
archives and Solr wiki, it appears you do this by registering a custom
SearchHandler / SearchComponent with Solr.  Still, I don't quite
understand how this machinery fits together.  Any suggestions / ideas
/ pointers much appreciated!

Cheers,
-Babak

~~

Ideally, I'd like to find / code a solution that does the following:

1. A request handler that works like the StandardRequestHandler but
which allows an optional DocFilter (say, modeled like the
java.io.FileFilter interface)
2. Allows current pagination to work transparently.
3. Works transparently with distributed/sharded queries.
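If the unwanted documents can be described by a query at all, Jonathan's suggestion needs no custom Java: a negated fq is cached, composes with any handler, and leaves pagination and sharding untouched. The suppressed field below is hypothetical — it stands for whatever marks the documents to hide:

```text
q=title:solr
fq=-suppressed:true
```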


Re: DataImportHandler in Solr 4.0

2011-02-24 Thread Chris Hostetter

: It seems this thread has been hijacked. My initial posting was in regards to
: my custom Evaluators always receiving a null context. Same Evaluators work in
: 1.4.1

I'm pretty sure you are talking about a completely different thread, with 
a completely different subject (Solr 4.0 DIH)



-Hoss


Re: DIH regex remove email + extract url

2011-02-24 Thread Koji Sekiguchi

Hi Rosa,


<field column="description" xpath="/product/content"
       regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Z]{2,4}" replaceWith="" />


Shouldn't it be regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,4}"?


<field column="source" xpath="/product/url" regex="http://(.*?)\\/(.*)" />

Example: url=http://www.abcd.com/product.php?id=324 -- i want to index source 
= abcd.com


Probably it could be regex="http:\/\/(.*?)\/(.*)"

I use a regex web tool:

http://www.regexplanet.com/simple/index.html

Koji
--
http://www.rondhuit.com/en/
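Since the DIH RegexTransformer uses java.util.regex, both patterns can be sanity-checked in a few lines of plain Java before touching data-config.xml. A sketch under two assumptions: the content uses lowercase TLDs (hence Koji's [a-z] class), and the optional (?:www\.)? group in the URL pattern is my addition, not from the thread, so that the captured host matches what Rosa wants:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    // Koji's corrected email pattern: lowercase TLD class instead of [A-Z].
    static final String EMAIL = "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-z]{2,4}";

    // URL host extraction; the optional "www." group is a hypothetical
    // extension so group(1) yields "abcd.com" rather than "www.abcd.com".
    static final Pattern HOST = Pattern.compile("http://(?:www\\.)?(.*?)/(.*)");

    static String stripEmails(String text) {
        return text.replaceAll(EMAIL, "");  // what replaceWith="" does in DIH
    }

    static String host(String url) {
        Matcher m = HOST.matcher(url);
        return m.matches() ? m.group(1) : url;  // fall back to the raw value
    }

    public static void main(String[] args) {
        System.out.println(stripEmails("contact: john@example.com for details"));
        System.out.println(host("http://www.abcd.com/product.php?id=324"));  // abcd.com
    }
}
```

One likely reason Rosa's original pattern removed nothing: with [A-Z]{2,4} the TLD must be uppercase, which rarely matches real content unless CASE_INSENSITIVE is in effect.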


Re: Make syntax highlighter caseinsensitive

2011-02-24 Thread Koji Sekiguchi

(11/02/24 20:18), Tarjei Huse wrote:

Hi,

I got an index where I have two fields, body and caseInsensitiveBody.
Body is indexed and stored while caseInsensitiveBody is just indexed.

The idea is that by not storing the caseInsensitiveBody I save some
space and gain some performance. So I query against the
caseInsensitiveBody and generate highlighting from the case sensitive one.

The problem is that as a result, I am missing highlighting terms. For
example, when I search for solr and get a match in caseInsensitiveBody
for solr but that it is Solr in the original document, no highlighting
is done.

Is there a way around this? Currently I am using the following
highlighting params:
 'hl' =  'on',
 'hl.fl' =  'header,body',
 'hl.usePhraseHighlighter' =  'true',
 'hl.highlightMultiTerm' =  'true',
 'hl.fragsize' =  200,
 'hl.regex.pattern' =  '[-\w ,/\n\\']{20,200}',


Tarjei,

Maybe a silly question, but why don't you make the body field case-insensitive,
eliminate the caseInsensitiveBody field, and then query and highlight on
just the body field?

Koji
--
http://www.rondhuit.com/en/
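Koji's single-field suggestion boils down to lowercasing at both index and query time so matches and highlights line up. A minimal schema.xml sketch — the type name text_ci is made up here, and the tokenizer choice is an assumption to adapt to your existing analysis chain:

```xml
<!-- schema.xml: one stored, case-insensitive body field -->
<fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="body" type="text_ci" indexed="true" stored="true"/>
```

Because the same analyzer runs at query time, a search for solr matches the token produced from "Solr", and the highlighter marks the original-case text in the stored value.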


Ramdirectory

2011-02-24 Thread Bill Bell
I could not figure out how to set up the RAMDirectory option in solrconfig.xml.
Does anyone have an example for 1.4?

Bill Bell
Sent from mobile



Re: Ramdirectory

2011-02-24 Thread Chris Hostetter

: I could not figure out how to setup the ramdirectory option in 
solrconfig.XML. Does anyone have an example for 1.4?

it wasn't an option in 1.4.

as Koji had already mentioned in the other thread where you chimed in
and asked about this, it was added in the 3x branch...

http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html



-Hoss


boosting based on number of terms matched?

2011-02-24 Thread DarkNovaNick
I'm using the edismax handler, although my question is probably the same
for dismax. When the user types a long query, I use the mm parameter so
that only 75% of terms need to match. This works fine; however, sometimes
documents that only match 75% of the terms show up higher in my results
than documents that match 100%. I'd like to set a boost so that documents
that match 100% will be much more likely to be put ahead of documents that
only match 75%. Can anyone give me a pointer on how to do this? Thanks,


Nick
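One common way to get this effect with (e)dismax — sketched from memory, so verify the nested-query syntax against your Solr version, and treat the field names and boost values as placeholders — is to keep mm at 75% for recall but add a boost query that re-runs the user's terms with mm=100%, so full matches get a large additive boost:

```text
q=wireless network printer setup
defType=edismax
qf=title^2 body
mm=75%
bq={!edismax qf='title^2 body' mm=100% v=$q}^10
```

The v=$q local param re-uses the main query string, so the boost stays in sync with whatever the user typed.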


Re: Ramdirectory

2011-02-24 Thread Bill Bell
Thanks - yeah that is why I asked how to use it. But I still don't know
how to use it.

https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html


https://issues.apache.org/jira/browse/SOLR-465

<directoryProvider class="org.apache.lucene.store.RAMDirectory">
  <!-- Parameters as required by the implementation -->
</directoryProvider>


Is that right? Examples? Options?

Where do I put that in solrconfig.xml ? Do I put it in
mainIndex/directoryProvider ?

I know that SOLR-465 is more generic, but
https://issues.apache.org/jira/browse/SOLR-480 seems easier to use.
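For reference, on the 3.x branch the RAMDirectoryFactory linked above is wired in as a directoryFactory element in solrconfig.xml; a sketch (this does not work on 1.4):

```xml
<!-- solrconfig.xml, 3.x branch only: hold the whole index in RAM.
     The index is not persisted, so it is lost on restart. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.RAMDirectoryFactory"/>
```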



Thanks.


On 2/24/11 6:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


: I could not figure out how to setup the ramdirectory option in
solrconfig.XML. Does anyone have an example for 1.4?

it wasn't an option in 1.4.

as Koji had already mentioned in the other thread where you chimed in
and asked about this, it was added in the 3x branch...

http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html



-Hoss




Re: query results filter

2011-02-24 Thread Babak Farhang
In my case, I want to filter out duplicate docs so that returned
docs are unique w/ respect to a certain field (not the schema's unique
field, of course): a duplicate doc here is one that has the same value
for a checksum field as one of the docs already in the results. It
would be great if I could somehow express that w/ a query, but I don't
think that would be possible.
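If the duplicates could instead be eliminated at index time, Solr's deduplication support (the SignatureUpdateProcessorFactory) can compute a signature over chosen fields and overwrite duplicates as they are added. A sketch for solrconfig.xml, with illustrative field names; whether index-time dedup fits this case depends on whether the duplicate docs must otherwise be kept:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Field to store the computed signature in -->
    <str name="signatureField">checksum</str>
    <!-- Delete earlier docs with the same signature -->
    <bool name="overwriteDupes">true</bool>
    <!-- Fields the signature is computed from -->
    <str name="fields">body</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

This sidesteps the pagination and sharding issues of a query-time filter, since the duplicates never make it into the index.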

On Thu, Feb 24, 2011 at 5:11 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Hmm, depending on what you are actually needing to do, can you do it with a 
 simple fq param to filter out what you want filtered out, instead of needing 
 to write custom Java as you are suggesting? It would be a lot easier to just 
 use an fq.

 How would you describe the documents you want to filter from the query 
 results page?  Can that description be represented by a Solr query you can 
 already represent using the lucene, dismax, or any other existing query? If 
 so, why not just use a negated fq describing what to omit from the results?
 
 From: Babak Farhang [farh...@gmail.com]
 Sent: Thursday, February 24, 2011 6:58 PM
 To: solr-user
 Subject: query results filter

 Hi everyone,

 I have some existing solr cores that for one reason or another have
 documents that I need to filter from the query results page.

 I would like to do this inside Solr instead of doing it on the
 receiving end, in the client.  After searching the mailing list
 archives and Solr wiki, it appears you do this by registering a custom
 SearchHandler / SearchComponent with Solr.  Still, I don't quite
 understand how this machinery fits together.  Any suggestions / ideas
 / pointers much appreciated!

 Cheers,
 -Babak

 ~~

 Ideally, I'd like to find / code a solution that does the following:

 1. A request handler that works like the StandardRequestHandler but
 which allows an optional DocFilter (say, modeled like the
 java.io.FileFilter interface)
 2. Allows current pagination to work transparently.
 3. Works transparently with distributed/sharded queries.



Re: query slop issue

2011-02-24 Thread Bagesh Sharma

Thanks, very good explanation.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/query-slop-issue-tp2567418p2573185.html
Sent from the Solr - User mailing list archive at Nabble.com.