Re: ExtractingRequestHandler and XmlUpdateHandler
: If I can find the bandwidth, I'd like to make something which allows
: file uploads via the XMLUpdateHandler as well... Do you have any ideas

the XmlUpdateRequestHandler already supports file uploads ... all request handlers do, using the ContentStream abstraction: http://wiki.apache.org/solr/ContentStream

-Hoss
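For reference, the ContentStream abstraction means a file can reach any request handler in a couple of ways; a rough sketch (URLs and file paths are placeholders, and the stream.file form requires enableRemoteStreaming to be turned on in solrconfig.xml — see the wiki page above for the exact parameters):

```
# POST the raw file body to the update handler
curl http://localhost:8983/solr/update -H "Content-type: text/xml" --data-binary @docs.xml

# Or point Solr at a file visible on the *server's* filesystem
curl "http://localhost:8983/solr/update?stream.file=/path/on/server/docs.xml&commit=true"
```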
Re: Dynamic Boosting at query time with boost value as another fieldvalue
: ohk.. that means I can't use colon in the fieldname ever in such a scenario?

In most internals, the lucene/solr code base allows *any* character in the field name, so you *can* use colons in field names, but many of the surface features (like the query parser) treat colons as special characters, so in *some* situations colons don't work in field names.

-Hoss
Re: [RESULTS] Community Logo Preferences
Just my thoughts on the matter: the designer of the runner-up logo and the 3rd-place logo is also responsible for 5 other logos that made it onto the list. They are basically different versions of the same concept. If you add up the scores for logos 2, 3, 6, 8, 11, 20 and 23 you will see a score of 86 (23 votes)! So one can argue the community likes this concept the most.

Mark Lindeman

Ryan McKinley wrote on 11/28/2008 10:28 PM:

> Check the results from the poll: http://people.apache.org/~ryan/solr-logo-results.html
>
> The obvious winner is: https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png
>
> But since things are never simple, given the similarity of this logo to the Solaris logo: http://toastytech.com/guis/sol10logo.png
>
> SO... we will check with the Apache PRC for guidance before making any final decisions. With their feedback, we *may* pick one of the 'runner up' logos. Stay tuned!
>
> ryan
Re: ExtractingRequestHandler and XmlUpdateHandler
On Dec 15, 2008, at 3:13 AM, Chris Hostetter wrote:

> : If I can find the bandwidth, I'd like to make something which allows
> : file uploads via the XMLUpdateHandler as well... Do you have any ideas
>
> the XmlUpdateRequestHandler already supports file uploads ... all request handlers do using the ContentStream abstraction... http://wiki.apache.org/solr/ContentStream

But it doesn't do what Jacob is asking for... he wants (if I'm not mistaken) the ability to send a binary file along with Solr XML, and merge the extraction from the file (via Tika) with the fields specified in the XML. Currently this is not possible, as far as I know. Maybe this sort of thing could be coded as part of an update processor chain? Somehow DIH and Tika need to tie together eventually too, eh?

Erik
Re: ExtractingRequestHandler and XmlUpdateHandler
Hi Erik,

This is indeed what I was talking about... It could even be handled via some type of transient file storage system. This might even be better, to avoid the risks associated with uploading a huge file across a network, and might (I have no idea) be easier to implement. So I could send the file, and receive back a token which I would then throw into one of my fields as a reference, then use it to map Tika fields as well, like:

<str name="file_mod_date">${FILETOKEN}.last_modified</str>
<str name="file_body">${FILETOKEN}.content</str>

Best,
Jacob

On Mon, Dec 15, 2008 at 2:29 PM, Erik Hatcher e...@ehatchersolutions.com wrote:

> On Dec 15, 2008, at 3:13 AM, Chris Hostetter wrote:
>
> : If I can find the bandwidth, I'd like to make something which allows
> : file uploads via the XMLUpdateHandler as well... Do you have any ideas
>
> the XmlUpdateRequestHandler already supports file uploads ... all request handlers do using the ContentStream abstraction... http://wiki.apache.org/solr/ContentStream
>
> But it doesn't do what Jacob is asking for... he wants (if I'm not mistaken) the ability to send a binary file along with Solr XML, and merge the extraction from the file (via Tika) with the fields specified in the XML. Currently this is not possible, as far as I know. Maybe this sort of thing could be coded as part of an update processor chain? Somehow DIH and Tika need to tie together eventually too, eh?
>
> Erik

--
+1 510 277-0891 (o) +91 33 7458 (m)
web: http://pajamadesign.com
Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
Re: dataimport handler with mysql: wrong field mapping
Have you tried using the <dynamicField name="*" type="string" indexed="true"/> option in the schema.xml? After the indexing, take a look at the fields DIH has generated.

Bye,
L.M.

2008/12/15 jokkmokk jokkm...@gmx.at:

> HI, I'm desperately trying to get the dataimport handler to work, however it seems that it just ignores the field name mapping. I have the fields body and subject in the database and those are called title and content in the solr schema, so I use the following import config:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" user="root" password=""/>
>   <document>
>     <entity name="phorum_messages" query="select * from phorum_messages">
>       <field column="body" name="content"/>
>       <field column="subject" name="title"/>
>     </entity>
>   </document>
> </dataConfig>
>
> however I always get the following exception:
>
> org.apache.solr.common.SolrException: ERROR:unknown field 'body'
>         at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
>         at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
>         at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
>         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)
>
> but according to the documentation it should add a document with title and content, not body and subject?! I'd appreciate any help as I can't see anything wrong with my configuration...
TIA, Stefan -- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dataimport handler with mysql: wrong field mapping
sorry, I'm using the 1.3.0 release. I've now worked around that issue by using aliases in the sql statement so that no mapping is needed. This way it works perfectly.

best regards,
Stefan

Shalin Shekhar Mangar wrote:
> Which solr version are you using?

-- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013639.html Sent from the Solr - User mailing list archive at Nabble.com.
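The alias workaround Stefan mentions presumably looks something like this (table and column names are taken from the config earlier in the thread; the exact SQL is an assumption):

```sql
-- Alias the DB columns to the Solr field names, so DIH needs no
-- <field column=... name=.../> mapping at all.
select body as content, subject as title from phorum_messages
```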
Re: using BoostingTermQuery
> I'm no QueryParser expert, but I would probably start w/ the default query parser in Solr (LuceneQParser), and then progress a bit to the DisMax one. I'd ask specific questions based on what you see there. If you get far enough along, you may consider asking for help on the java-user list as well.

Thanks - I think I've got it working now. I ended up subclassing QueryParser and overriding newTermQuery() to create a BoostingTermQuery instead of a plain ol' TermQuery. Seems to work.

Kindly let me know where and how to configure the overridden query parser in solr.

-Ayyanar

-- View this message in context: http://www.nabble.com/using-BoostingTermQuery-tp19637123p21011626.html Sent from the Solr - User mailing list archive at Nabble.com.
Feature Request: Return count for documents which are possible to select
Hi all,

Whilst Solr is a great resource (a big thank you to the developers) it presents me with a couple of issues. The need for hierarchical facets I would say is a fairly crucial missing piece, but that has already been pointed out (http://issues.apache.org/jira/browse/SOLR-64). The other issue relates to providing (count) feedback for disjoint selections. When a facet value is selected this constrains the documents, and solr returns the counts for all the other facet values. Thus the user can see all the possible valid selections (i.e. having a count > 0) and the number of documents which will be returned if that value is selected. However, one of the valid selections is to select another value in the same facet, creating a disjoint selection and increasing the number of returned documents. There is currently no way for the user to know which values are valid to select, as the count only relates to currently selected documents and not documents which are also still possible to select. I hope this is clear; it's not the easiest issue to explain (or perhaps I just do it badly). Anyway, other faceted browsers, such as the Simile Project's Exhibit, do return counts showing the effect of disjoint selections, which is more useful for the user.

N

PS I'm unsure whether this should be posted to the developer's list so I posted here first.
Re: ExtractingRequestHandler and XmlUpdateHandler
Jacob,

Hmmm... seems the wires are still crossed and confusing.

On Dec 15, 2008, at 6:34 AM, Jacob Singh wrote:

> This is indeed what I was talking about... It could even be handled via some type of transient file storage system. This might even be better, to avoid the risks associated with uploading a huge file across a network, and might (I have no idea) be easier to implement.

If the file is visible from the Solr server, there is no need to actually send the bits through HTTP. Solr's content stream capabilities allow a file to be retrieved by Solr itself.

> So I could send the file, and receive back a token which I would then throw into one of my fields as a reference, then use it to map Tika fields as well, like:
>
> <str name="file_mod_date">${FILETOKEN}.last_modified</str>
> <str name="file_body">${FILETOKEN}.content</str>

Huh? I don't follow the file token thing. Perhaps you're thinking you'll post the file, then later update other fields on that same document. An important point here is that Solr currently does not have document update capabilities. A document can be fully replaced, but cannot have fields added to it, once indexed. It needs to be handled all in one shot to accomplish the blending of file/field indexing. Note the ExtractingRequestHandler already has the field mapping capability.

But, here's a solution that will work for you right now... let Tika extract the content and return it back to you, then turn around and post it and whatever other fields you like: http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput

In that example, the contents aren't being indexed, just returned back to the client. And you can leverage the content stream capability with this as well, avoiding posting the actual binary file, by pointing the extracting request to a file path visible by Solr.

Erik
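The extract-only flow Erik describes can be exercised roughly like this (host, handler path, and the file/field names are placeholders; check the wiki page above for the exact parameters of the version in use):

```
# Ask the ExtractingRequestHandler to run Tika over the upload but only
# return the extracted content to the client, without indexing anything
curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@somedoc.pdf"
```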
Re: ExtractingRequestHandler and XmlUpdateHandler
Hi Erik,

Sorry I wasn't totally clear. Some responses inline:

> If the file is visible from the Solr server, there is no need to actually send the bits through HTTP. Solr's content stream capabilities allow a file to be retrieved by Solr itself.

Yeah, I know. But in my case not possible. Perhaps a simple file-receiving HTTP POST handler which simply stored the file on disk and returned a path to it is the way to go here.

>> So I could send the file, and receive back a token which I would then throw into one of my fields as a reference, then use it to map Tika fields as well, like:
>>
>> <str name="file_mod_date">${FILETOKEN}.last_modified</str>
>> <str name="file_body">${FILETOKEN}.content</str>
>
> Huh? I don't follow the file token thing. Perhaps you're thinking you'll post the file, then later update other fields on that same document. An important point here is that Solr currently does not have document update capabilities. A document can be fully replaced, but cannot have fields added to it, once indexed. It needs to be handled all in one shot to accomplish the blending of file/field indexing. Note the ExtractingRequestHandler already has the field mapping capability.

Sorta... I was more thinking of a new feature wherein a Solr request handler doesn't actually put the file in the index, merely runs it through Tika and stores, in a datastore, a token linked to the Tika extraction. Then the client could make another request w/ the XMLUpdateHandler which referenced parts of the stored Tika extraction.

> But, here's a solution that will work for you right now... let Tika extract the content and return it back to you, then turn around and post it and whatever other fields you like: http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput
>
> In that example, the contents aren't being indexed, just returned back to the client. And you can leverage the content stream capability with this as well, avoiding posting the actual binary file, by pointing the extracting request to a file path visible by Solr.
Yeah, I saw that. This is pretty much what I was talking about above; the only disadvantage (which is a deal breaker in our case) is the extra bandwidth to move the file back and forth.

Thanks for your help and quick response. I think we'll integrate the POST fields as Grant has kindly provided multi-value input now, and see what happens in the future. I realize what I'm talking about (XML and binary together) is probably not a high-priority feature.

Best,
Jacob

--
+1 510 277-0891 (o) +91 33 7458 (m)
web: http://pajamadesign.com
Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
Re: Sample code for some examples for using solr in applications
See also http://wiki.apache.org/solr/SolrResources

On Dec 15, 2008, at 2:57 AM, Andre Hagenbruch wrote:

> Sajith Vimukthi wrote:
>
>> I need some sample code of some examples done using solr. I need to get an idea on how I can use solr in my application. Please be kind enough to reply me asap. It would be a grt help.
>
> Hi Sajith, did you already have a look at the documentation for Solrj (http://wiki.apache.org/solr/Solrj) or any of the other clients? Overall, the wiki (http://wiki.apache.org/solr/) is a good place to get started...
>
> Hth, Andre

--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
RE: Solrj: Multivalued fields give Bad Request
Sorry, forgot the most important detail. The document I am adding contains multiple names fields:

sInputDocument.addField("names", value);
sInputDocument.addField("names", value);
sInputDocument.addField("names", value);

There is no problem when a document only contains one value in the names field.

-----Original Message-----
From: Schilperoort, René [mailto:rene.schilpero...@getronics.com]
Sent: Monday 15 December 2008 16:52
To: solr-user@lucene.apache.org
Subject: Solrj: Multivalued fields give Bad Request

Hi all,

When adding documents to Solr using Solrj I receive the following exception:

org.apache.solr.common.SolrException: Bad Request

The field is configured as follows:

<field name="names" type="string" indexed="true" stored="true" multiValued="true"/>

Any suggestions?

Regards, Rene
Re: dataimport handler with mysql: wrong field mapping
Which solr version are you using?

On Mon, Dec 15, 2008 at 6:04 PM, jokkmokk jokkm...@gmx.at wrote:

> HI, I'm desperately trying to get the dataimport handler to work, however it seems that it just ignores the field name mapping. I have the fields body and subject in the database and those are called title and content in the solr schema, so I use the following import config:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" user="root" password=""/>
>   <document>
>     <entity name="phorum_messages" query="select * from phorum_messages">
>       <field column="body" name="content"/>
>       <field column="subject" name="title"/>
>     </entity>
>   </document>
> </dataConfig>
>
> however I always get the following exception:
>
> org.apache.solr.common.SolrException: ERROR:unknown field 'body'
>         at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
>         at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
>         at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
>         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)
>
> but according to the documentation it should add a document with title and content, not body and subject?! I'd appreciate any help as I can't see anything wrong with my configuration...
TIA, Stefan -- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
Re: using BoostingTermQuery
In the solrconfig.xml (scroll all the way to the bottom; I believe the example has some commented out).

On Dec 15, 2008, at 5:45 AM, ayyanar wrote:

>> I'm no QueryParser expert, but I would probably start w/ the default query parser in Solr (LuceneQParser), and then progress a bit to the DisMax one. I'd ask specific questions based on what you see there. If you get far enough along, you may consider asking for help on the java-user list as well.
>
> Thanks - I think I've got it working now. I ended up subclassing QueryParser and overriding newTermQuery() to create a BoostingTermQuery instead of a plain ol' TermQuery. Seems to work.
>
> Kindly let me know where and how to configure the overridden query parser in solr.
>
> -Ayyanar
>
> -- View this message in context: http://www.nabble.com/using-BoostingTermQuery-tp19637123p21011626.html Sent from the Solr - User mailing list archive at Nabble.com.

--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Solrj - SolrQuery - specifying SolrCore - when the Solr Server has multiple cores
Hi - I am looking at the article here with a brief introduction to SolrJ . http://www.ibm.com/developerworks/library/j-solr-update/index.html?ca=dgr-jw17SolrS_Tact=105AGX59S_CMP=GRsitejw17#solrj . In case we have multiple SolrCores in the server application - (since 1.3) - how do I specify as part of SolrQuery as to which core needs to be used for the given query. I am trying to dig out the information from the code. Meanwhile, if someone is aware of the same - please suggest some pointers.
Re: Solrj - SolrQuery - specifying SolrCore - when the Solr Server has multiple cores
A solr core is like a separate solr server... so create a new CommonsHttpSolrServer that points at the core. You probably want to create and reuse a single HttpClient instance for the best efficiency. -Yonik On Mon, Dec 15, 2008 at 11:06 AM, Kay Kay kaykay.uni...@gmail.com wrote: Hi - I am looking at the article here with a brief introduction to SolrJ . http://www.ibm.com/developerworks/library/j-solr-update/index.html?ca=dgr-jw17SolrS_Tact=105AGX59S_CMP=GRsitejw17#solrj . In case we have multiple SolrCores in the server application - (since 1.3) - how do I specify as part of SolrQuery as to which core needs to be used for the given query. I am trying to dig out the information from the code. Meanwhile, if someone is aware of the same - please suggest some pointers.
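Yonik's suggestion might look roughly like this in SolrJ 1.3-era code (a sketch only; host, port, and core names are placeholders, and the exact constructor signatures should be checked against the SolrJ javadoc for the version in use):

```java
// Sketch: one CommonsHttpSolrServer per core, sharing a single HttpClient.
import java.net.MalformedURLException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class MultiCoreClients {
    public static void main(String[] args) throws MalformedURLException {
        // One shared, thread-safe HttpClient reused across all cores,
        // as Yonik recommends for efficiency.
        HttpClient httpClient = new HttpClient(new MultiThreadedHttpConnectionManager());

        // Each core is addressed like a separate Solr server: base URL + core name.
        SolrServer core0 = new CommonsHttpSolrServer("http://localhost:8983/solr/core0", httpClient);
        SolrServer core1 = new CommonsHttpSolrServer("http://localhost:8983/solr/core1", httpClient);

        // Queries sent through core0 or core1 now hit the respective core.
    }
}
```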
Re: Solrj: Multivalued fields give Bad Request
What do you see in the admin schema browser? /admin/schema.jsp

When you select the field "names", do you see the property Multivalued?

ryan

On Dec 15, 2008, at 10:55 AM, Schilperoort, René wrote:

> Sorry, forgot the most important detail. The document I am adding contains multiple names fields:
>
> sInputDocument.addField("names", value);
> sInputDocument.addField("names", value);
> sInputDocument.addField("names", value);
>
> There is no problem when a document only contains one value in the names field.
>
> -----Original Message-----
> From: Schilperoort, René [mailto:rene.schilpero...@getronics.com]
> Sent: Monday 15 December 2008 16:52
> To: solr-user@lucene.apache.org
> Subject: Solrj: Multivalued fields give Bad Request
>
> Hi all,
>
> When adding documents to Solr using Solrj I receive the following exception:
>
> org.apache.solr.common.SolrException: Bad Request
>
> The field is configured as follows:
>
> <field name="names" type="string" indexed="true" stored="true" multiValued="true"/>
>
> Any suggestions?
>
> Regards, Rene
CustomQueryParser
I found the following solution in the forum to use BoostingTermQuery in solr:

"I ended up subclassing QueryParser and overriding newTermQuery() to create a BoostingTermQuery instead of a plain ol' TermQuery. Seems to work."

http://www.nabble.com/RE:-using-BoostingTermQuery-p19651792.html

I have some questions on this:

1) Anyone tried this? Is it working?
2) Where to specify the query parser subclass name? SolrConfig.xml? What is the xml tag name for this?
3) Should we use QParser? I think we can directly subclass the QueryParser and do that. Am I right?
4) Kindly post the code sample to override newTermQuery() to create a BoostingTermQuery.

Thanks in advance,
Ayyanar

-- View this message in context: http://www.nabble.com/CustomQueryParser-tp21012136p21012136.html Sent from the Solr - User mailing list archive at Nabble.com.
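For question 4, a sketch of what the earlier thread describes, written against Lucene 2.4-era APIs (an untested illustration; the class name is made up, and package/class names should be verified against the Lucene version in use — BoostingTermQuery was later renamed PayloadTermQuery):

```java
// Sketch: subclass QueryParser so every term query becomes a
// payload-aware BoostingTermQuery instead of a plain TermQuery.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.BoostingTermQuery;

public class BoostingQueryParser extends QueryParser {

    public BoostingQueryParser(String defaultField, Analyzer analyzer) {
        super(defaultField, analyzer);
    }

    // QueryParser calls this factory method for every term it parses.
    protected Query newTermQuery(Term term) {
        return new BoostingTermQuery(term);
    }
}
```

To expose this inside Solr you would presumably wrap it in a QParserPlugin registered in solrconfig.xml, as the reply in the "using BoostingTermQuery" thread hints.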
Re: Sample code
http://lucene.apache.org/solr/tutorial.html On Dec 15, 2008, at 12:56 AM, Sajith Vimukthi wrote: Hi all, Can someone of you give me a sample code on a search function done with solr so that I can get an idea on how I can use it. Regards, Sajith Vimukthi Weerakoon Associate Software Engineer | ZONE24X7 | Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 | http://www.zone24x7.com
dataimport handler with mysql: wrong field mapping
HI, I'm desperately trying to get the dataimport handler to work, however it seems that it just ignores the field name mapping. I have the fields body and subject in the database and those are called title and content in the solr schema, so I use the following import config:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" user="root" password=""/>
  <document>
    <entity name="phorum_messages" query="select * from phorum_messages">
      <field column="body" name="content"/>
      <field column="subject" name="title"/>
    </entity>
  </document>
</dataConfig>

however I always get the following exception:

org.apache.solr.common.SolrException: ERROR:unknown field 'body'
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
        at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
        at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)

but according to the documentation it should add a document with title and content, not body and subject?! I'd appreciate any help as I can't see anything wrong with my configuration...

TIA, Stefan

-- View this message in context: http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExtractingRequestHandler and XmlUpdateHandler
On Dec 15, 2008, at 8:20 AM, Jacob Singh wrote:

> Hi Erik,
>
> Sorry I wasn't totally clear. Some responses inline:
>
>> If the file is visible from the Solr server, there is no need to actually send the bits through HTTP. Solr's content stream capabilities allow a file to be retrieved by Solr itself.
>
> Yeah, I know. But in my case not possible. Perhaps a simple file-receiving HTTP POST handler which simply stored the file on disk and returned a path to it is the way to go here.
>
>>> So I could send the file, and receive back a token which I would then throw into one of my fields as a reference, then use it to map Tika fields as well, like:
>>>
>>> <str name="file_mod_date">${FILETOKEN}.last_modified</str>
>>> <str name="file_body">${FILETOKEN}.content</str>
>>
>> Huh? I don't follow the file token thing. Perhaps you're thinking you'll post the file, then later update other fields on that same document. An important point here is that Solr currently does not have document update capabilities. A document can be fully replaced, but cannot have fields added to it, once indexed. It needs to be handled all in one shot to accomplish the blending of file/field indexing. Note the ExtractingRequestHandler already has the field mapping capability.
>
> Sorta... I was more thinking of a new feature wherein a Solr request handler doesn't actually put the file in the index, merely runs it through Tika and stores, in a datastore, a token linked to the Tika extraction. Then the client could make another request w/ the XMLUpdateHandler which referenced parts of the stored Tika extraction.

Hmmm, thinking out loud:

1. Override SolrContentHandler. It is responsible for mapping the Tika output to a Solr Document.
2. Capture all the content into a single buffer.
3. Add said buffer to a field that is stored only.
4. Add a second field that is indexed. This is your token.

You could, just as well, have that token be the only thing that gets returned by extract-only.
Alternately, you could implement an UpdateProcessor thingamajob that takes the output and stores it to the filesystem, and just adds the token to a document.

>> But, here's a solution that will work for you right now... let Tika extract the content and return it back to you, then turn around and post it and whatever other fields you like: http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput
>>
>> In that example, the contents aren't being indexed, just returned back to the client. And you can leverage the content stream capability with this as well, avoiding posting the actual binary file, by pointing the extracting request to a file path visible by Solr.
>
> Yeah, I saw that. This is pretty much what I was talking about above; the only disadvantage (which is a deal breaker in our case) is the extra bandwidth to move the file back and forth.
>
> Thanks for your help and quick response. I think we'll integrate the POST fields as Grant has kindly provided multi-value input now, and see what happens in the future. I realize what I'm talking about (XML and binary together) is probably not a high-priority feature.

Is the use case this: you want to assign metadata and also store the original, in binary format, too? Thus, Solr becomes a backing, searchable store? I think we could possibly add an option to serialize the ContentStream onto a field on the Document. In other words, store the original with the Document. Of course, buyer beware on the cost of doing so.
Re: Solrj - SolrQuery - specifying SolrCore - when the Solr Server has multiple cores
Thanks Yonik for the clarification. Yonik Seeley wrote: A solr core is like a separate solr server... so create a new CommonsHttpSolrServer that points at the core. You probably want to create and reuse a single HttpClient instance for the best efficiency. -Yonik On Mon, Dec 15, 2008 at 11:06 AM, Kay Kay kaykay.uni...@gmail.com wrote: Hi - I am looking at the article here with a brief introduction to SolrJ . http://www.ibm.com/developerworks/library/j-solr-update/index.html?ca=dgr-jw17SolrS_Tact=105AGX59S_CMP=GRsitejw17#solrj . In case we have multiple SolrCores in the server application - (since 1.3) - how do I specify as part of SolrQuery as to which core needs to be used for the given query. I am trying to dig out the information from the code. Meanwhile, if someone is aware of the same - please suggest some pointers.
Please help me articulate this query
Hey all, I'm having trouble articulating a query and I'm hopeful someone out there can help me out :)

My situation is this: I am indexing a series of questions that can either be asked from a main question entry page, or from a specific subject page. I have a field called "referring" which indexes the title of the specific subject page, plus the regular question, whenever that document is submitted from a specific subject page. Otherwise, every document is indexed with just the question.

Specifically, what I am trying to do is: when I am on a specific subject page (e.g. Tom Cruise) I want to search for all of the questions asked from that page, plus any question asked about Tom Cruise. Something like:

q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)

"Have you ever used a Tom Tom?" - Not returned
"Where is the best place to take a cruise?" - Not returned
"When did he have his first kid?" - Returned iff question was asked from Tom Cruise page
"Do you think that Tom Cruise will make more movies?" - Always returned

Any thoughts?

-Derek
Re: Please help me articulate this query
I think in this case you would want to index each question with the possible referrers (by title might be too imprecise; I'd go with filename or ID) and then do a search like this (assuming in this case it's by filename):

q=(referring:TomCruise.html) OR (question: Tom AND Cruise)

Which seems to be what you're thinking. I would make the referrer a string type though, so that you don't accidentally pull in documents from a different subject (for Tom Cruise this would work OK, but imagine you need to distinguish between George Washington and George Washington Carver).

--
Steve

On Dec 15, 2008, at 2:59 PM, Derek Springer wrote:

> Hey all, I'm having trouble articulating a query and I'm hopeful someone out there can help me out :)
>
> My situation is this: I am indexing a series of questions that can either be asked from a main question entry page, or from a specific subject page. I have a field called "referring" which indexes the title of the specific subject page, plus the regular question, whenever that document is submitted from a specific subject page. Otherwise, every document is indexed with just the question.
>
> Specifically, what I am trying to do is: when I am on a specific subject page (e.g. Tom Cruise) I want to search for all of the questions asked from that page, plus any question asked about Tom Cruise. Something like:
>
> q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)
>
> "Have you ever used a Tom Tom?" - Not returned
> "Where is the best place to take a cruise?" - Not returned
> "When did he have his first kid?" - Returned iff question was asked from Tom Cruise page
> "Do you think that Tom Cruise will make more movies?" - Always returned
>
> Any thoughts?
>
> -Derek
Re: Please help me articulate this query
Thanks for the tip, I appreciate it! However, does anyone know how to articulate the syntax of (This AND That) OR (Something AND Else) into a query string? i.e. q=referring:### AND question:###

On Mon, Dec 15, 2008 at 12:32 PM, Stephen Weiss swe...@stylesight.com wrote:

> I think in this case you would want to index each question with the possible referrers (by title might be too imprecise; I'd go with filename or ID) and then do a search like this (assuming in this case it's by filename):
>
> q=(referring:TomCruise.html) OR (question: Tom AND Cruise)
>
> Which seems to be what you're thinking. I would make the referrer a string type though, so that you don't accidentally pull in documents from a different subject (for Tom Cruise this would work OK, but imagine you need to distinguish between George Washington and George Washington Carver).
>
> --
> Steve
>
> On Dec 15, 2008, at 2:59 PM, Derek Springer wrote:
>
>> Hey all, I'm having trouble articulating a query and I'm hopeful someone out there can help me out :)
>>
>> My situation is this: I am indexing a series of questions that can either be asked from a main question entry page, or from a specific subject page. I have a field called "referring" which indexes the title of the specific subject page, plus the regular question, whenever that document is submitted from a specific subject page. Otherwise, every document is indexed with just the question.
>>
>> Specifically, what I am trying to do is: when I am on a specific subject page (e.g. Tom Cruise) I want to search for all of the questions asked from that page, plus any question asked about Tom Cruise. Something like:
>>
>> q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)
>>
>> "Have you ever used a Tom Tom?" - Not returned
>> "Where is the best place to take a cruise?" - Not returned
>> "When did he have his first kid?" - Returned iff question was asked from Tom Cruise page
>> "Do you think that Tom Cruise will make more movies?" - Always returned
>>
>> Any thoughts?
>>
>> -Derek
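One way to write that kind of grouping as a single q parameter, using standard Lucene query syntax (field names are taken from the thread; the phrase form assumes "referring" is analyzed so that a quoted phrase matches):

```
q=referring:"Tom Cruise" OR question:(Tom AND Cruise)
```

When sent over HTTP, the value needs URL encoding, e.g. q=referring%3A%22Tom+Cruise%22+OR+question%3A(Tom+AND+Cruise). Parentheses group sub-clauses per field, and quotes turn a multi-word value into a phrase instead of two separate terms.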
Re: Multi tokenizer
: I need to tokenize my field on whitespaces, html, punctuation, apostrophe : but if I use HTMLStripStandardTokenizerFactory it strips only html : but no apostrophes You might consider using one of the HTML tokenizers, and then use a PatternReplaceFilterFactory ... or, if you know Java, write a simple Tokenizer that uses the HTMLStripReader. In the long run, changing the HTMLStripReader to be usable as a CharFilter so it can work with any Tokenizer is probably the way we'll go -- but I don't think anyone has started working on a patch for that. Thanks... I used HTMLStripStandardTokenizerFactory and then a PatternReplaceFilterFactory and now it works
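For reference, the analyzer chain the thread converges on might look roughly like this in schema.xml. A sketch only: the field type name is hypothetical, and the PatternReplaceFilterFactory here simply deletes apostrophes after the HTML-stripping tokenizer:

```xml
<!-- Hypothetical field type: strip HTML while tokenizing, then
     remove apostrophes from each token. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="'" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```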
TextField size limit
Hi all, I have a TextField containing over 400k of text. When I try to search for a word, Solr doesn't return any result, but if I look at the single document, I can see that the word exists there. So I suppose that Solr has a TextField size limit (the field is indexed using a tokenizer and some filters). Could anyone help me understand the problem, and whether it is possible to solve? Thanks in advance, Antonio
Re: TextField size limit
Check your solrconfig.xml: <maxFieldLength>10000</maxFieldLength> That's probably the truncating factor. That's the maximum number of terms, not bytes or characters. Erik On Dec 15, 2008, at 5:00 PM, Antonio Zippo wrote: Hi all, i have a TextField containing over 400k of text when i try to search a word solr doesn't return any result but if I search for a single document, I can see that the word exists there So I suppose that solr has a textfield size limit (the field is indexed using a tokenizer and some filters) Could anyone help me to undestand the problem? and if is it possible to solve? Thanks in advance, Antonio
Slow Response time after optimize
Hi guys, I have a typical master/slave setup running with Solr 1.3.0. I did some basic scalability tests with JMeter, tweaked our environment, and determined that we can handle approximately 26 simultaneous threads and get end-to-end response times of under 200ms, even with a distribution running typically every 5 minutes. However, as soon as I issue a single optimize on the master, the response time goes up to over 500ms and does not seem to recover. As soon as I restarted, the response time was back down to 200ms. My index is approximately 5 GB in size and the queries are just basic disjunction queries such as title:iphone OR bodytext:iphone. Has anybody seen this issue before? Thanks, Sammy
Re: Standard request with functional query
Hey guys, Thanks for the response, but how would one make recency a factor in scoring documents with the standard request handler? The query (title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:"ord(dateCreated)"^0.1 seems to do something very similar to just sorting by dateCreated, rather than having dateCreated be a part of the score. Thanks, Sammy On Thu, Dec 4, 2008 at 1:35 PM, Sammy Yu temi...@gmail.com wrote: Hi guys, I have a standard query that searches across multiple text fields such as q=title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware This comes back with documents that have iphone and firmware (I know I can use the dismax handler but it seems to be really slow), which is great. Now I want to give some more weight to more recent documents (there is a dateCreated field in each document). So I've modified the query as such: (title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:"ord(dateCreated)"^0.1 URL-encoded to q=(title%3Aiphone+OR+bodytext%3Aiphone+OR+title%3Afirmware+OR+bodytext%3Afirmware)+AND+_val_%3A%22ord(dateCreated)%22^0.1 However, the results are not as one would expect. The first few documents only come back with the word iphone and appear to be sorted by date created. It seems to completely ignore the score and use the dateCreated field for the score. On a not directly related issue, it seems like if you put the weight within the double quotes: (title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:"ord(dateCreated)^0.1" the parser complains: org.apache.lucene.queryParser.ParseException: Cannot parse '(title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"': Expected ',' at position 16 in 'ord(dateCreated)^0.1' Thanks, Sammy
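One hedged reading of why Sammy's results look date-sorted: ord() yields a document's ordinal position among the sorted field values, so on a large index it produces values in the thousands, while typical tf-idf relevance scores are single digits; even scaled by ^0.1 the ordinal term dominates the additive combination. A toy illustration with invented numbers:

```python
# Invented scores: ordinary relevance vs. ord()-style date ordinals.
relevance = {"docA": 4.2, "docB": 1.1}           # docA is far more relevant
date_ordinal = {"docA": 120000, "docB": 450000}  # docB is much newer

# Additive combination, as in _val_:"ord(dateCreated)"^0.1
combined = {d: relevance[d] + 0.1 * date_ordinal[d] for d in relevance}
ranked = sorted(combined, key=combined.get, reverse=True)
print(ranked)  # the recency term swamps relevance entirely
```

Scaling the date term down to the same order of magnitude as the relevance scores, or using a bounded function of the date, keeps recency a tiebreaker rather than the dominant factor.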
Re: TextField size limit
Check your solrconfig.xml: <maxFieldLength>10000</maxFieldLength> That's probably the truncating factor. That's the maximum number of terms, not bytes or characters. Erik Thanks... I think that could be the problem. I tried to count whitespace in a single text and it's over 55,000 ... but Solr truncates to 10,000. Do you know if I can change the value to 100,000 without recreating the index? (When I modify schema.xml I need to create the index again, but what about solrconfig.xml?) Thanks, Antonio
Re: TextField size limit
On Mon, Dec 15, 2008 at 5:28 PM, Antonio Zippo reven...@yahoo.it wrote: Check your solrconfig.xml: <maxFieldLength>10000</maxFieldLength> That's probably the truncating factor. That's the maximum number of terms, not bytes or characters. Thanks... I think it could be the problem. i tried to count whitespace in a single text and it's over 55.000 ... but solr truncates to 10.000 do you know if I can change the value to 100.000 without recreate the index? (when I modify schema.xml I need to create the index again but with solrconfig.xml?) No need to re-index with this change. But you will have to re-index any documents that got cut off, of course. -Yonik
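The change under discussion is a single solrconfig.xml setting; a sketch using the 100,000 figure from the thread:

```xml
<!-- solrconfig.xml: per-field cap on the number of indexed terms.
     The 10000 default is what was truncating the ~55,000-term field;
     raising it only affects documents indexed afterwards. -->
<maxFieldLength>100000</maxFieldLength>
```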
Re: TextField size limit
No need to re-index with this change. But you will have to re-index any documents that got cut off of course. -Yonik Ok, thanks... I had hoped to reindex the documents over the existing index (with incremental updates, while Solr is running) and without deleting the index folder. But the important thing is that the problem is solved ;-) Thanks... Antonio
Some solrconfig.xml attributes being ignored
Hello, In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When I perform a search, it returns only a single snippet for each highlighted field. However, when I set the hl.snippets field manually as a search parameter, I get up to 3 highlighted snippets. This is the configuration that I am using to set the highlighting parameters:

<fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter" default="true">
  <lst name="defaults">
    <str name="hl.snippets">3</str>
    <str name="hl.fragsize">100</str>
    <str name="hl.regex.slop">0.5</str>
    <str name="hl.regex.pattern">\w[-\w ,/\n\']{50,150}</str>
  </lst>
</fragmenter>

I tried setting hl.fragmenter=regex as a parameter as well, to be sure that it was using the correct one, and the result set is the same. Any ideas what could be causing this attribute not to be read? It has me concerned that other attributes are being ignored as well. Thanks, Mark Ferguson
Re: Using Regex fragmenter to extract paragraphs
You actually don't need to escape most characters inside a character class; the escaping of the period was unnecessary. I've tried using the example regex ([-\w ,/\n\']{20,200}), and I'm _still_ getting lots of highlighted snippets that don't match the regex (starting with a period, etc.) Has anyone else had any trouble with the default regex fragmenter? If someone has used it and gotten the expected results, can you let me know, so I know that the problem is on my end? Thanks for your help, Mark On Sun, Dec 14, 2008 at 8:34 AM, Erick Erickson erickerick...@gmail.com wrote: Shouldn't you escape the question mark at the end too? On Fri, Dec 12, 2008 at 6:22 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Someone helped me with the regex and pointed out a couple of mistakes, most notably the extra quantifier in .*{400,600}. My new regex is this: \w.{400,600}[\.!?] Unfortunately, my results still aren't any better. Some results start with a word character, some don't, and none seem to end with punctuation. Any ideas what else could be wrong? Mark On Fri, Dec 12, 2008 at 2:37 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Hello, I am trying to use the regex fragmenter and am having a hard time getting the results I want. I am trying to get fragments that start on a word character and end on punctuation, but for some reason the fragments being returned to me seem to be very inflexible, despite that I've provided a large slop. Here are the relevant parameters I'm using, maybe someone can help point out where I've gone wrong:

<str name="hl.fragsize">500</str>
<str name="hl.fragmenter">regex</str>
<str name="hl.regex.slop">0.8</str>
<str name="hl.regex.pattern">[\w].*{400,600}[.!?]</str>
<str name="hl">true</str>
<str name="q">chinese</str>

This should be matching between 400-600 characters, beginning with a word character and ending with one of .!?. Here is an example of a typical result: . Check these pictures out. Nine panda cubs on display for the first time Thursday in southwest China.
They're less than a year old. They just recently stopped nursing. There are only 1,600 of these guys left in the mountain forests of central China, another 120 in span class='hl'Chinese/span breeding facilities and zoos. And they're about 20 that live outside China in zoos. They exist almost entirely on bamboo. They can live to be 30 years old. And these little guys will eventually get much bigger. They'll grow As you can see, it is starting with a period and ending on a word character! It's almost as if the fragments are just coming out as they will and the regex isn't doing anything at all, but the results are different when I use the gap fragmenter. In the above result I don't see any reason why it shouldn't have stripped out the preceding period and the last two words, there is plenty of room in the slop and in the regex pattern. Please help me figure out what I'm doing wrong... Thanks a lot, Mark Ferguson
Re: Some solrconfig.xml attributes being ignored
Try adding echoParams=all to your query to verify the params that the solr request handler is getting. -Yonik On Mon, Dec 15, 2008 at 6:10 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Hello, In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When I perform a search, it returns only a single snippet for each highlighted field. However, when I set the hl.snippets field manually as a search parameter, I get up to 3 highlighted snippets. This is the configuration that I am using to set the highlighted parameters: fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter default=true lst name=defaults str name=hl.snippets3/str str name=hl.fragsize100/str str name=hl.regex.slop0.5/str str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str /lst /fragmenter I tried setting hl.fragmenter=regex as a parameter as well, to be sure that it was using the correct one, and the result set is the same. Any ideas what could be causing this attribute not to be read? It has me concerned that other attributes are being ignored as well. Thanks, Mark Ferguson
Re: Some solrconfig.xml attributes being ignored
Thanks for this tip, it's very helpful. Indeed, it looks like none of the highlighting parameters are being included. It's using the correct request handler and hl is set to true, but none of the highlighting parameters from solrconfig.xml are in the parameter list. Here is my query: http://localhost:8080/solr1/select?rows=50&hl=true&fl=url,urlmd5,page_title,score&echoParams=all&q=java Here are the settings for the request handler and the highlighter:

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <float name="tie">0.01</float>
    <str name="qf">body_text^1.0 page_title^1.6 meta_desc^1.3</str>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">body_text page_title meta_desc</str>
    <str name="f.page_title.hl.fragsize">0</str>
    <str name="f.meta_desc.hl.fragsize">0</str>
    <str name="hl.fragmenter">regex</str>
  </lst>
</requestHandler>

<highlighting>
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter" default="true">
    <lst name="defaults">
      <str name="hl.snippets">3</str>
      <str name="hl.fragsize">100</str>
      <str name="hl.regex.slop">0.5</str>
      <str name="hl.regex.pattern">\w[-\w ,/\n\']{50,150}</str>
    </lst>
  </fragmenter>
</highlighting>

And here is the param list returned to me:

<lst name="params">
  <str name="echoParams">all</str>
  <str name="tie">0.01</str>
  <str name="hl.fragmenter">regex</str>
  <str name="f.page_title.hl.fragsize">0</str>
  <str name="qf">body_text^1.0 page_title^1.6 meta_desc^1.3</str>
  <str name="f.meta_desc.hl.fragsize">0</str>
  <str name="q.alt">*:*</str>
  <str name="hl.fl">page_title,body_text</str>
  <str name="defType">dismax</str>
  <str name="echoParams">all</str>
  <str name="fl">url,urlmd5,page_title,score</str>
  <str name="q">java</str>
  <str name="hl">true</str>
  <str name="rows">50</str>
</lst>

So it seems like everything is working except for the highlighter. I should mention that when I enter a bogus fragmenter as a parameter (e.g. hl.fragmenter=bogus), it returns a 400 error that the fragmenter cannot be found, so the config file _is_ finding the regex fragmenter. It just doesn't seem to actually be including its parameters...
Any ideas are appreciated, thanks again for the help. Mark On Mon, Dec 15, 2008 at 4:23 PM, Yonik Seeley ysee...@gmail.com wrote: Try adding echoParams=all to your query to verify the params that the solr request handler is getting. -Yonik On Mon, Dec 15, 2008 at 6:10 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Hello, In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When I perform a search, it returns only a single snippet for each highlighted field. However, when I set the hl.snippets field manually as a search parameter, I get up to 3 highlighted snippets. This is the configuration that I am using to set the highlighted parameters: fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter default=true lst name=defaults str name=hl.snippets3/str str name=hl.fragsize100/str str name=hl.regex.slop0.5/str str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str /lst /fragmenter I tried setting hl.fragmenter=regex as a parameter as well, to be sure that it was using the correct one, and the result set is the same. Any ideas what could be causing this attribute not to be read? It has me concerned that other attributes are being ignored as well. Thanks, Mark Ferguson
Re: SolrConfig.xml Replication
It does appear to be working for us now. The files replicated out appropriately which is a huge help. Thanks to all! -Jeff On 12/13/08 9:42 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Jeff, SOLR-821 has a patch now. It'd be nice to get some feedback if you manage to try it out. On Thu, Dec 11, 2008 at 8:33 PM, Jeff Newburn jnewb...@zappos.com wrote: Thank you for the quick response. I will keep an eye on that to see how it progresses. On 12/10/08 8:03 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: This is a known issue and I was planning to take it up soon. https://issues.apache.org/jira/browse/SOLR-821 On Thu, Dec 11, 2008 at 5:30 AM, Jeff Newburn jnewb...@zappos.com wrote: I am curious as to whether there is a solution to be able to replicate solrconfig.xml with the 1.4 replication. The obvious problem is that the master would replicate the solrconfig turning all slaves into masters with its config. I have also tried on a whim to configure the master and slave on the master so that the slave points to the same server but that seems to break the replication completely. Please let me know if anybody has any ideas -Jeff -- Regards, Shalin Shekhar Mangar.
Re: Dismax Minimum Match/Stopwords Bug
Would this mean that, for example, if we wanted to search productId (long) we'd need to make a field type that had stopwords in it rather than simply using (long)? Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833 On Dec 12, 2008, at 11:56 PM, Chris Hostetter wrote: : I have discovered some weirdness with our Minimum Match functionality. : Essentially it comes up with absolutely no results on certain queries. : Basically, searches with 2 words and 1 being "the" don't have a return : result. From what we can gather the minimum match criteria is making it : such that if there are 2 words then both are required. Unfortunately, you haven't mentioned what qf you're using, and you only listed one field type, which includes stopwords -- but I suspect your qf contains at least one field that *doesn't* remove stopwords. This is in fact an unfortunate aspect of the way dismax works -- each chunk of text recognized by the query parser is passed to each analyzer for each field. Any chunk that produces a query for a field becomes a DisjunctionMaxQuery, and is included in the mm count -- even if that chunk is a stopword in every other field (and produces no query). So you have to either be consistent with your stopwords across all fields, or make your mm really small. Searching for dismax stopwords turns this up... http://www.nabble.com/Re%3A-DisMax-request-handler-doesn%27t-work-with-stopwords--p11016770.html ...if I'm wrong about your situation (some fields in the qf with stopwords and some fields without) then please post all of the params you are using (not just mm) and the full parsedquery_tostring from when debugQuery=true is turned on. -Hoss
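Hoss's explanation can be modeled in a few lines: each query chunk is analyzed against every qf field, and the chunk counts toward mm if any field produces a term for it. A toy sketch (not Solr code; names invented for illustration):

```python
STOPWORDS = {"the"}

def mm_chunks(chunks, field_strips_stopwords):
    """Return the chunks that become DisjunctionMaxQuery clauses,
    i.e. chunks that survive analysis in at least one qf field."""
    surviving = []
    for chunk in chunks:
        if any(not (strips and chunk in STOPWORDS)
               for strips in field_strips_stopwords):
            surviving.append(chunk)
    return surviving

# Mixed qf (one field keeps stopwords): "the" still becomes a clause,
# so a strict mm requirement can silently exclude every document.
mixed = mm_chunks(["the", "office"], [True, False])
# Consistent stopwords across all qf fields: "the" drops out of the count.
consistent = mm_chunks(["the", "office"], [True, True])
```

This is why the two fixes in the thread are "be consistent with your stopwords across all fields" or "make your mm really small".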
Parent Child Entity - DataImport
I have a parent entity that grabs a list of records of a certain type from one table, and a sub-entity that queries another table to retrieve the actual data. For various reasons I cannot join the tables. The 2nd SQL query converts the rows into an XML document to be processed by a custom transformer (done due to the complex nature of the second table). Full-import works fine but delta-import is not adding any new records. Do I have to specify a deltaQuery for the sub-entity? What else might be going on?

<document name="doc">
  <entity name="table1" pk="id"
          query="SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15)"
          deltaQuery="SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date &gt; '${dataimporter.last_index_time}'">
    <field column="MY_GUID" name="myGuid"/>
    <entity name="table2" pk="ID"
            query="select dbms_xmlgen.getxml(' select Name, Title, Description from metaDataTable where MY_GUID = ${table1.MY_GUID_ID} ') mdrXmlClob from dual"
            transformer="MD.Solr.Utils.transformers.MDTransformer">
      <field column="Name" name="mdName"/>
      <field column="Title" name="mdTitle"/>
      <field column="Description" name="mdDescription"/>
    </entity>
  </entity>
</document>

-- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Some solrconfig.xml attributes being ignored
It seems like maybe the fragmenter parameters just don't get displayed with echoParams=all set. It may only display as far as the request handler's parameters. The reason I think this is because I tried increasing hl.fragsize to 1000 and the results were returned correctly (much larger snippets), so I know it was read correctly. I moved hl.snippets into the requestHandler config instead of the fragmenter, and this seems to have solved the problem. However, I'm uneasy with this solution because I don't know why it wasn't being read correctly when setting it inside the fragmenter. Mark On Mon, Dec 15, 2008 at 5:08 PM, Mark Ferguson mark.a.fergu...@gmail.comwrote: Thanks for this tip, it's very helpful. Indeed, it looks like none of the highlighting parameters are being included. It's using the correct request handler and hl is set to true, but none of the highlighting parameters from solrconfig.xml are in the parameter list. Here is my query: http://localhost:8080/solr1/select?rows=50hl=truefl=url,urlmd5,page_title,scoreechoParams=allq=java Here are the settings for the request handler and the highlighter: requestHandler name=dismax class=solr.SearchHandler default=true lst name=defaults str name=defTypedismax/str float name=tie0.01/float str name=qfbody_text^1.0 page_title^1.6 meta_desc^1.3/str str name=q.alt*:*/str str name=hl.flbody_text page_title meta_desc/str str name=f.page_title.hl.fragsize0/str str name=f.meta_desc.hl.fragsize0/str str name=hl.fragmenterregex/str /lst /requestHandler highlighting fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter default=true lst name=defaults str name=hl.snippets3/str str name=hl.fragsize100/str str name=hl.regex.slop0.5/str str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str /lst /fragmenter /highlighting And here is the param list returned to me: lst name=params str name=echoParamsall/str str name=tie0.01/str str name=hl.fragmenterregex/str str name=f.page_title.hl.fragsize0/str str name=qfbody_text^1.0 
page_title^1.6 meta_desc^1.3/str str name=f.meta_desc.hl.fragsize0/str str name=q.alt*:*/str str name=hl.flpage_title,body_text/str str name=defTypedismax/str str name=echoParamsall/str str name=flurl,urlmd5,page_title,score/str str name=qjava/str str name=hltrue/str str name=rows50/str /lst So it seems like everything is working except for the highlighter. I should mention that when I enter a bogus fragmenter as a parameter (e.g. hl.fragmenter=bogus), it returns a 400 error that the fragmenter cannot be found, so the config file _is_ finding the regex fragmenter. It just doesn't seem to actually be including its parameters... Any ideas are appreciated, thanks again for the help. Mark On Mon, Dec 15, 2008 at 4:23 PM, Yonik Seeley ysee...@gmail.com wrote: Try adding echoParams=all to your query to verify the params that the solr request handler is getting. -Yonik On Mon, Dec 15, 2008 at 6:10 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Hello, In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When I perform a search, it returns only a single snippet for each highlighted field. However, when I set the hl.snippets field manually as a search parameter, I get up to 3 highlighted snippets. This is the configuration that I am using to set the highlighted parameters: fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter default=true lst name=defaults str name=hl.snippets3/str str name=hl.fragsize100/str str name=hl.regex.slop0.5/str str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str /lst /fragmenter I tried setting hl.fragmenter=regex as a parameter as well, to be sure that it was using the correct one, and the result set is the same. Any ideas what could be causing this attribute not to be read? It has me concerned that other attributes are being ignored as well. Thanks, Mark Ferguson
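The workaround Mark lands on, moving hl.snippets from the fragmenter's defaults into the request handler's defaults, would look roughly like this (a sketch of the relevant lines only, based on the config posted in the thread):

```xml
<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- moved here from the fragmenter's own defaults -->
    <str name="hl.snippets">3</str>
    <str name="hl.fragmenter">regex</str>
    <!-- ... other defaults unchanged ... -->
  </lst>
</requestHandler>
```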
Getting Field Collapsing working
Hi everybody, So I have applied Ivan's latest patch to a clean 1.3. I built it using 'ant compile' and 'ant dist', and got the solr .war file from the build. Moved that into the Tomcat directory. Modified my solrconfig.xml to include the following:

<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent"/>

<arr name="components">
  <str>query</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>debug</str>
  <str>collapse</str>
</arr>

<arr name="first-components">
  <str>myFirstComponentName</str>
  <str>collapse</str>
</arr>

Thinking that everything should work correctly, I did a search with the following: http://localhost:8080/solr/select/?q=mika&version=2.2&start=0&rows=10&indent=on&collapse=true&collapse.field=type I see the query parameters captured in the responseHeader section, but I don't see a collapse section. Does anybody have any ideas? Any help would be greatly appreciated. Thank you, -John
Re: Parent Child Entity - DataImport
I do not observe anything wrong. You can also mention the 'deltaImportQuery' and try it, something like:

<entity name="table1" pk="id"
        query="SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15)"
        deltaImportQuery="SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) AND id=${dataimporter.delta.ID}"
        deltaQuery="SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date &gt; '${dataimporter.last_index_time}'">

On Tue, Dec 16, 2008 at 5:54 AM, sbutalia sbuta...@gmail.com wrote: I have a parent entity that grabs a list of records of a certain type from 1 table... and a sub-entity that queries another table to retrieve the actual data... for various reasons I cannot join the tables... the 2nd sql query converts the rows into an xml to be processed by a custom transformer (done due to the complex nature of the second table) Full-import works fine but delta-import is not adding any new records... Do I have to specify a deltaQuery for the sub-entity? What else might be goin on? document name=doc entity name=table1 pk=id query= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) deltaQuery= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date '${dataimporter.last_index_time}' field column=MY_GUID name=myGuid/ entity name=table2 pk=ID query=select dbms_xmlgen.getxml(' select Name, Title, Description from metaDataTable where MY_GUID = ${table1.MY_GUID_ID} ') mdrXmlClob from dual transformer=MD.Solr.Utils.transformers.MDTransformer field column=Name name=mdName/ field column=Title name=mdTitle/ field column=Description name=mdDescription/ /entity /entity /document -- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: SolrConfig.xml Replication
Jeff, Thanks. It would be nice if you could just review the config syntax and see if all possible use cases are covered. Is there any scope for improvement? On Tue, Dec 16, 2008 at 5:45 AM, Jeff Newburn jnewb...@zappos.com wrote: It does appear to be working for us now. The files replicated out appropriately which is a huge help. Thanks to all! -Jeff On 12/13/08 9:42 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Jeff, SOLR-821 has a patch now. It'd be nice to get some feedback if you manage to try it out. On Thu, Dec 11, 2008 at 8:33 PM, Jeff Newburn jnewb...@zappos.com wrote: Thank you for the quick response. I will keep an eye on that to see how it progresses. On 12/10/08 8:03 PM, Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com wrote: This is a known issue and I was planning to take it up soon. https://issues.apache.org/jira/browse/SOLR-821 On Thu, Dec 11, 2008 at 5:30 AM, Jeff Newburn jnewb...@zappos.com wrote: I am curious as to whether there is a solution to be able to replicate solrconfig.xml with the 1.4 replication. The obvious problem is that the master would replicate the solrconfig turning all slaves into masters with its config. I have also tried on a whim to configure the master and slave on the master so that the slave points to the same server but that seems to break the replication completely. Please let me know if anybody has any ideas -Jeff -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
Re: Parent Child Entity - DataImport
I've had a chance to play with this more and noticed the query does run fine, but it only updates the records that are already indexed; it doesn't add new ones. The only option that I've found so far is to do a full-import with the clean=false attribute and created_date > last_indexed_date... Is there a better way? Thanks Noble Paul നോബിള് नोब्ळ् wrote: I do not observe anything wrong. you can also mention the 'deltaImportQuery' and try it someting like entity name=table1 pk=id query= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) deltaImportQuery=SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) AND id=${dataimporter.delta.ID} deltaQuery=SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date '${dataimporter.last_index_time}' On Tue, Dec 16, 2008 at 5:54 AM, sbutalia sbuta...@gmail.com wrote: I have a parent entity that grabs a list of records of a certain type from 1 table... and a sub-entity that queries another table to retrieve the actual data... for various reasons I cannot join the tables... the 2nd sql query converts the rows into an xml to be processed by a custom transformer (done due to the complex nature of the second table) Full-import works fine but delta-import is not adding any new records... Do I have to specify a deltaQuery for the sub-entity? What else might be goin on?
document name=doc entity name=table1 pk=id query= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) deltaQuery= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date '${dataimporter.last_index_time}' field column=MY_GUID name=myGuid/ entity name=table2 pk=ID query=select dbms_xmlgen.getxml(' select Name, Title, Description from metaDataTable where MY_GUID = ${table1.MY_GUID_ID} ') mdrXmlClob from dual transformer=MD.Solr.Utils.transformers.MDTransformer field column=Name name=mdName/ field column=Title name=mdTitle/ field column=Description name=mdDescription/ /entity /entity /document -- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21027045.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Parent Child Entity - DataImport
Are the queries being fired wrong/different when you tried full-import? On Tue, Dec 16, 2008 at 9:57 AM, sbutalia sbuta...@gmail.com wrote: I'ev had a chance to play with this more and noticed the query does run fine but it only updates the records that are already indexed it doesn't add new ones. The only option that i'ev found so far is to do a full-import with the clean=false attribute and created_date last_indexed_date... Is there a better way? Thanks Noble Paul നോബിള് नोब्ळ् wrote: I do not observe anything wrong. you can also mention the 'deltaImportQuery' and try it someting like entity name=table1 pk=id query= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) deltaImportQuery=SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) AND id=${dataimporter.delta.ID} deltaQuery=SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date '${dataimporter.last_index_time}' On Tue, Dec 16, 2008 at 5:54 AM, sbutalia sbuta...@gmail.com wrote: I have a parent entity that grabs a list of records of a certain type from 1 table... and a sub-entity that queries another table to retrieve the actual data... for various reasons I cannot join the tables... the 2nd sql query converts the rows into an xml to be processed by a custom transformer (done due to the complex nature of the second table) Full-import works fine but delta-import is not adding any new records... Do I have to specify a deltaQuery for the sub-entity? What else might be goin on? 
document name=doc entity name=table1 pk=id query= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) deltaQuery= SELECT ID,MY_GUID FROM activityLog WHERE type in (11, 15) and created_date '${dataimporter.last_index_time}' field column=MY_GUID name=myGuid/ entity name=table2 pk=ID query=select dbms_xmlgen.getxml(' select Name, Title, Description from metaDataTable where MY_GUID = ${table1.MY_GUID_ID} ') mdrXmlClob from dual transformer=MD.Solr.Utils.transformers.MDTransformer field column=Name name=mdName/ field column=Title name=mdTitle/ field column=Description name=mdDescription/ /entity /entity /document -- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- View this message in context: http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21027045.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: Please help me articulate this query
Derek, q=+referring:XXX +question:YYY (of course, you'll have to URL-encode that query string) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Derek Springer de...@mahalo.com To: solr-user@lucene.apache.org Sent: Monday, December 15, 2008 3:40:55 PM Subject: Re: Please help me articulate this query Thanks for the tip, I appreciate it! However, does anyone know how to articulate the syntax of (This AND That) OR (Something AND Else) into a query string? i.e. q=referring:### AND question:### On Mon, Dec 15, 2008 at 12:32 PM, Stephen Weiss wrote: I think in this case you would want to index each question with the possible referrers ( by title might be too imprecise, I'd go with filename or ID) and then do a search like this (assuming in this case it's by filename) q=(referring:TomCruise.html) OR (question: Tom AND Cruise) Which seems to be what you're thinking. I would make the referrer a type string though so that you don't accidentally pull in documents from a different subject (Tom Cruise this would work ok, but imagine you need to distinguish between George Washington and George Washington Carver). -- Steve On Dec 15, 2008, at 2:59 PM, Derek Springer wrote: Hey all, I'm having trouble articulating a query and I'm hopeful someone out there can help me out :) My situation is this: I am indexing a series of questions that can either be asked from a main question entry page, or a specific subject page. I have a field called referring which indexes the title of the specific subject page, plus the regular question whenever that document is submitted from a specific specific subject page. Otherwise, every document is indexed with just the question. Specifically, what I am trying to do is when I am on the page specific subject page (e.g. Tom Cruise) I want to search for all of the questions asked from that page, plus any question asked about Tom Cruise.
Something like: q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)
Have you ever used a Tom Tom? - Not returned
Where is the best place to take a cruise? - Not returned
When did he have his first kid? - Returned iff question was asked from the Tom Cruise page
Do you think that Tom Cruise will make more movies? - Always returned
Any thoughts? -Derek
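Otis' advice (build the `+field:value` query, then URL-encode it before putting it in the request) can be sketched in Java. This is an illustrative client-side snippet, not code from the thread; the class and method names are made up, and only the stdlib `URLEncoder` is assumed:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryBuilder {
    // Build a Solr select URL from a raw Lucene query string.
    // URL-encoding is required because '+', ':', '(' and spaces
    // all have special meaning in a query string.
    public static String buildUrl(String solrBase, String rawQuery)
            throws UnsupportedEncodingException {
        return solrBase + "/select?q=" + URLEncoder.encode(rawQuery, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Otis' form: both clauses required
        System.out.println(buildUrl("http://localhost:8983/solr",
                "+referring:XXX +question:YYY"));
        // Derek's form: either clause may match
        System.out.println(buildUrl("http://localhost:8983/solr",
                "(referring:Tom AND Cruise) OR (question:Tom AND Cruise)"));
    }
}
```

Note that an unencoded `+` in a URL is decoded as a space on the Solr side, which silently turns a required clause into an optional one; that is the usual symptom of skipping the encoding step.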
Details on logging in Solr
Hi, I was trying to do a performance test on the Solr web application. When I run the performance tests, a lot of logging happens, due to which I am getting log files in GBs. Is there any clean way of deactivating logging, or of changing the log level to, say, error? Is there any property file for the same? Please give your inputs. Regards, Rinesh. -- View this message in context: http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027267.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Details on logging in Solr
Solr 1.3 uses java logging. Most app containers (Tomcat, Resin, etc.) give you a way to configure that. Also check: http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html#1.8 You can make runtime changes from the /admin/ logging tab. However, these changes are not persisted when the app restarts. On Dec 15, 2008, at 11:52 PM, Rinesh1 wrote: Hi, I was trying to do a performance test on Solr web application. If I run the performance tests, lot of logging is happening due to which I am getting log files in GBs Is there any clean way of deactivating logging or changing the log level to say error .. Is there any property file for the same. Please give your inputs for the same. Regards, Rinesh. -- View this message in context: http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027267.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Details on logging in Solr
Hi Ryan, Thanks for the inputs. These are the steps I followed to solve this issue.

1. Make a logging property file, say solrLogging.properties. We can copy the default logging property file available in the JAVA_HOME/jre/lib folder. The default java logging file looks like the following:

# Default Logging Configuration File
#
# You can use a different file by specifying a filename
# with the java.util.logging.config.file system property.
# For example java -Djava.util.logging.config.file=myfile

# Global properties

# handlers specifies a comma separated list of log Handler
# classes. These handlers will be installed during VM startup.
# Note that these classes must be on the system classpath.
# By default we only configure a ConsoleHandler, which will only
# show messages at the INFO and above levels.
handlers= java.util.logging.ConsoleHandler

# To also add the FileHandler, use the following line instead.
#handlers= java.util.logging.FileHandler, java.util.logging.ConsoleHandler

# Default global logging level.
# This specifies which kinds of events are logged across
# all loggers. For any given facility this global level
# can be overriden by a facility specific level
# Note that the ConsoleHandler also has a separate level
# setting to limit messages printed to the console.
.level= INFO

# Handler specific properties.
# Describes specific configuration info for Handlers.

# default file output is in user's home directory.
java.util.logging.FileHandler.pattern = %h/java%u.log
java.util.logging.FileHandler.limit = 5
java.util.logging.FileHandler.count = 1
java.util.logging.FileHandler.formatter = java.util.logging.XMLFormatter

# Limit the message that are printed on the console to INFO and above.
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

# Facility specific properties.
# Provides extra control for each logger.
# For example, set the com.xyz.foo logger to only log SEVERE
# messages:
com.xyz.foo.level = SEVERE

To prevent INFO level messages, change

java.util.logging.ConsoleHandler.level = INFO

to

java.util.logging.ConsoleHandler.level = SEVERE

2. While starting the server (for example JBoss), add the following line to run.bat or run.sh:

set JAVA_OPTS=%JAVA_OPTS% -Djava.util.logging.config.file=Y:\solrLog.properties

This will solve the issue. Regards, Rinesh.

ryantxu wrote: Solr 1.3 uses java logging. Most app containers (Tomcat, Resin, etc.) give you a way to configure that. Also check: http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html#1.8 You can make runtime changes from the /admin/ logging tab. However, these changes are not persisted when the app restarts. On Dec 15, 2008, at 11:52 PM, Rinesh1 wrote: Hi, I was trying to do a performance test on Solr web application. If I run the performance tests, lot of logging is happening due to which I am getting log files in GBs Is there any clean way of deactivating logging or changing the log level to say error .. Is there any property file for the same. Please give your inputs for the same. Regards, Rinesh. -- View this message in context: http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027267.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027540.html Sent from the Solr - User mailing list archive at Nabble.com.
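Since Solr 1.3 uses java.util.logging, the same effect as the properties-file change can also be achieved programmatically at startup. This is a generic java.util.logging sketch (the class name and the example logger name are illustrative, not from Solr's code):

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietLogging {
    // Raise the root logger's threshold so only SEVERE records get through.
    // This mirrors the solrLogging.properties change, but at runtime, and is
    // lost on restart (same caveat as the /admin/ logging tab).
    public static void quiet() {
        Logger root = Logger.getLogger("");      // "" is the root logger
        root.setLevel(Level.SEVERE);             // drop INFO/WARNING records
        for (Handler h : root.getHandlers()) {
            h.setLevel(Level.SEVERE);            // handlers filter independently
        }
    }

    public static void main(String[] args) {
        quiet();
        Logger log = Logger.getLogger("org.apache.solr.core");
        log.info("this is suppressed");
        log.severe("this still appears");
    }
}
```

Child loggers with no explicit level inherit the root level, so this silences all of Solr's INFO chatter in one place without touching per-package settings.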
RE: Solrj: Multivalued fields give Bad Request
Ryan, It turned out that another multivalued field was causing my problem. This field was no longer configured in my schema. My dynamic catch-all field of type ignored was not multivalued; adding multiValued="true" to this field solved my problem. Regards, Rene -Original Message- From: Ryan McKinley [mailto:ryan...@gmail.com] Sent: maandag 15 december 2008 17:28 To: solr-user@lucene.apache.org Subject: Re: Solrj: Multivalued fields give Bad Request What do you see in the admin schema browser? /admin/schema.jsp When you select the field names, do you see the property Multivalued? ryan On Dec 15, 2008, at 10:55 AM, Schilperoort, René wrote: Sorry, Forgot the most important detail. The document I am adding contains multiple names fields: sInputDocument.addField("names", value); sInputDocument.addField("names", value); sInputDocument.addField("names", value); There is no problem when a document only contains one value in the names field. -Original Message- From: Schilperoort, René [mailto:rene.schilpero...@getronics.com] Sent: maandag 15 december 2008 16:52 To: solr-user@lucene.apache.org Subject: Solrj: Multivalued fields give Bad Request Hi all, When adding documents to Solr using SolrJ I receive the following Exception. org.apache.solr.common.SolrException: Bad Request The field is configured as follows: <field name="names" type="string" indexed="true" stored="true" multiValued="true"/> Any suggestions? Regards, Rene
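For readers hitting the same Bad Request: Rene's fix corresponds to a schema.xml change along these lines. The exact dynamicField pattern in his schema is unknown; this sketch mirrors the "ignored" type from Solr's example schema:

```xml
<!-- catch-all that silently absorbs fields not otherwise declared;
     multiValued="true" is what allows repeated field names per document -->
<fieldtype name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>
<dynamicField name="*" type="ignored" multiValued="true"/>
```

Because Solr routes any field name without an explicit <field> entry through the matching dynamicField, a single-valued catch-all rejects documents that repeat a field name, even when the intended target field is itself multivalued but missing from the schema.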