Re: ExtractingRequestHandler and XmlUpdateHandler

2008-12-15 Thread Chris Hostetter

: If I can find the bandwidth, I'd like to make something which allows
: file uploads via the XMLUpdateHandler as well... Do you have any ideas

the XmlUpdateRequestHandler already supports file uploads ... all request 
handlers do, using the ContentStream abstraction...

http://wiki.apache.org/solr/ContentStream
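For example, with remote streaming enabled (enableRemoteStreaming="true" on the 
requestParsers element in solrconfig.xml), any handler can be pointed at a local 
file instead of receiving the bytes in the POST body. A sketch, using the example 
server's port and a hypothetical path:

```
curl 'http://localhost:8983/solr/update?stream.file=/tmp/docs.xml&commit=true'
```

The stream.url and stream.body parameters work the same way for remote URLs and 
inline content.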


-Hoss



Re: Dynamic Boosting at query time with boost value as another field value

2008-12-15 Thread Chris Hostetter

: ohk.. that means I can't use colon in the fieldname ever in such a scenario
: ?

In most internals, the Lucene/Solr code base allows *any* character in the 
field name, so you *can* use colons in field names, but many of the 
surface features (like the query parser) treat colons as special 
characters, so in *some* situations colons don't work in field names.


-Hoss



Re: [RESULTS] Community Logo Preferences

2008-12-15 Thread Mark Lindeman

Just my thoughts on the matter:

the designer of the runner-up logo and the 3rd-place logo is also 
responsible for 5 other logos that made it onto the list. They are 
basically different versions of the same concept. If you add up the 
scores for logos 2, 3, 6, 8, 11, 20 and 23, you get a combined score of 86 
(23 votes)!


So one can argue the community likes this concept the most.

Mark Lindeman


Ryan McKinley schreef op 11/28/2008 10:28 PM:

Check the results from the poll:
http://people.apache.org/~ryan/solr-logo-results.html

The obvious winner is:
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png 



But since things are never simple, given the similarity of this logo 
to the Solaris logo:

http://toastytech.com/guis/sol10logo.png

SO... we will check with the Apache PRC for guidance before making any 
final decisions.  With their feedback, we *may* pick one of the 'runner 
up' logos.


Stay tuned!
ryan





Re: ExtractingRequestHandler and XmlUpdateHandler

2008-12-15 Thread Erik Hatcher


On Dec 15, 2008, at 3:13 AM, Chris Hostetter wrote:



: If I can find the bandwidth, I'd like to make something which allows
: file uploads via the XMLUpdateHandler as well... Do you have any ideas


the XmlUpdateRequestHandler already supports file uploads ... all request
handlers do, using the ContentStream abstraction...

http://wiki.apache.org/solr/ContentStream


But it doesn't do what Jacob is asking for... he wants (if I'm not 
mistaken) the ability to send a binary file along with Solr XML, and 
merge the extraction from the file (via Tika) with the fields 
specified in the XML.


Currently this is not possible, as far as I know.  Maybe this sort of 
thing could be coded as part of an update processor chain?  Somehow 
DIH and Tika need to tie together eventually too, eh?


Erik



Re: ExtractingRequestHandler and XmlUpdateHandler

2008-12-15 Thread Jacob Singh
Hi Erik,

This is indeed what I was talking about... It could even be handled
via some type of transient file storage system.  This might even be
better, to avoid the risks associated with uploading a huge file across
a network, and might (I have no idea) be easier to implement.

So I could send the file, and receive back a token which I would then
throw into one of my fields as a reference, then use it to map Tika
fields as well, like:

<str name="file_mod_date">${FILETOKEN}.last_modified</str>

<str name="file_body">${FILETOKEN}.content</str>

Best,
Jacob


On Mon, Dec 15, 2008 at 2:29 PM, Erik Hatcher
e...@ehatchersolutions.com wrote:

 On Dec 15, 2008, at 3:13 AM, Chris Hostetter wrote:


 : If I can find the bandwidth, I'd like to make something which allows
 : file uploads via the XMLUpdateHandler as well... Do you have any ideas

 the XmlUpdateRequestHandler already supports file uploads ... all request
 handlers do using the ContentStream abstraction...

http://wiki.apache.org/solr/ContentStream

 But it doesn't do what Jacob is asking for... he wants (if I'm not mistaken)
 the ability to send a binary file along with Solr XML, and merge the
 extraction from the file (via Tika) with the fields specified in the XML.

 Currently this is not possible, as far as I know.  Maybe this sort of thing
 could be coded to part of an update processor chain?  Somehow DIH and the
 Tika need to tie together eventually too, eh?

Erik





-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: dataimport handler with mysql: wrong field mapping

2008-12-15 Thread Luca Molteni
Have you tried using the

<dynamicField name="*" type="string" indexed="true"/>

option in schema.xml? After indexing, take a look at the fields
DIH has generated.

Bye,

L.M.



2008/12/15 jokkmokk jokkm...@gmx.at:

 HI,

 I'm desperately trying to get the dataimport handler to work, however it
 seems that it just ignores the field name mapping.
 I have the fields body and subject in the database and those are called
 title and content in the solr schema, so I use the following import
 config:

 <dataConfig>

 <dataSource
   type="JdbcDataSource"
   driver="com.mysql.jdbc.Driver"
   url="jdbc:mysql://localhost/mydb"
   user="root"
   password=""/>

 <document>
   <entity name="phorum_messages" query="select * from phorum_messages">
     <field column="body" name="content"/>
     <field column="subject" name="title"/>
   </entity>
 </document>

 </dataConfig>

 however I always get the following exception:

 org.apache.solr.common.SolrException: ERROR:unknown field 'body'
at
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at
 org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at
 org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)


 but according to the documentation it should add a document with "title" and
 "content", not "body" and "subject"?!

 I'd appreciate any help as I can't see anything wrong with my
 configuration...

 TIA,

 Stefan
 --
 View this message in context: 
 http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: dataimport handler with mysql: wrong field mapping

2008-12-15 Thread jokkmokk

sorry, I'm using the 1.3.0 release. I've now worked around that issue by
using aliases in the sql statement so that no mapping is needed. This way it
works perfectly.
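For the record, the alias workaround looks something like this (table and column 
names taken from the config earlier in the thread):

```sql
SELECT body AS content, subject AS title FROM phorum_messages
```

With the columns already named content and title, no field-mapping elements are 
needed in the data-config at all.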

best regards

Stefan


Shalin Shekhar Mangar wrote:
 
 Which solr version are you using?
 
-- 
View this message in context: 
http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013639.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: using BoostingTermQuery

2008-12-15 Thread ayyanar

 I'm no QueryParser expert, but I would probably start w/ the default 
 query parser in Solr (LuceneQParser), and then progress a bit to the 
 DisMax one.  I'd ask specific questions based on what you see there. 
 If you get far enough along, you may consider asking for help on the 
 java-user list as well. 

Thanks - I think I've got it working now.  I ended up subclassing
QueryParser and overriding newTermQuery() to create a BoostingTermQuery
instead of a plain ol' TermQuery.  Seems to work. 

Kindly let me know where and how to configure the overridden query parser in
Solr.


-Ayyanar
-- 
View this message in context: 
http://www.nabble.com/using-BoostingTermQuery-tp19637123p21011626.html
Sent from the Solr - User mailing list archive at Nabble.com.



Feature Request: Return count for documents which are possible to select

2008-12-15 Thread Neil Ireson

Hi all,

Whilst Solr is a great resource (a big thank you to the developers) it 
presents me with a couple of issues.


The need for hierarchical facets I would say is a fairly crucial missing 
piece but has already been pointed out 
(http://issues.apache.org/jira/browse/SOLR-64).


The other issue relates to providing (count) feedback for disjoint 
selections. When a facet value is selected, this constrains the documents 
and Solr returns the counts for all the other facet values. Thus the 
user can see all the possible valid selections (i.e. those having a count > 0) 
and the number of documents which will be returned if that value is 
selected. However, one of the valid selections is to select another value 
in the same facet, creating a disjoint selection and increasing the number 
of returned documents. There is currently no way for the user to 
know which values are valid to select, as the count only relates to 
currently selected documents and not documents which are also still 
possible to select.


I hope this is clear; it's not the easiest issue to explain (or perhaps 
I just do it badly). Anyway, other faceted browsers, such as the Simile 
Project's Exhibit, do return counts showing the effect of disjoint 
selections, which is more useful for the user.



N



PS I'm unsure whether this should be posted to the developer's list so I 
posted here first.




Re: ExtractingRequestHandler and XmlUpdateHandler

2008-12-15 Thread Erik Hatcher

Jacob,

Hmmm... seems the wires are still crossed and confusing.


On Dec 15, 2008, at 6:34 AM, Jacob Singh wrote:

This is indeed what I was talking about... It could even be handled
via some type of transient file storage system.  this might even be
better to avoid the risks associated with uploading a huge file across
a network and might (have no idea) be easier to implement.


If the file is visible from the Solr server, there is no need to 
actually send the bits through HTTP.  Solr's content stream 
capabilities allow a file to be retrieved by Solr itself.



So I could send the file, and receive back a token which I would then
throw into one of my fields as a reference, then use it to map Tika
fields as well, like:

<str name="file_mod_date">${FILETOKEN}.last_modified</str>

<str name="file_body">${FILETOKEN}.content</str>


Huh?  I don't follow the file token thing.  Perhaps you're thinking 
you'll post the file, then later update other fields on that same 
document.  An important point here is that Solr currently does not 
have document update capabilities.  A document can be fully replaced, 
but cannot have fields added to it once indexed.  It needs to be 
handled all in one shot to accomplish the blending of file/field 
indexing.  Note the ExtractingRequestHandler already has the field 
mapping capability.


But, here's a solution that will work for you right now... let Tika 
extract the content and return it back to you, then turn around and post 
it and whatever other fields you like:


  http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput

In that example, the contents aren't being indexed, just returned back 
to the client.  And you can leverage the content stream capability 
with this as well, avoiding posting the actual binary file, by pointing 
the extracting request at a file path visible to Solr.


Erik



Re: ExtractingRequestHandler and XmlUpdateHandler

2008-12-15 Thread Jacob Singh
Hi Erik,

Sorry I wasn't totally clear.  Some responses inline:
 If the file is visible from the Solr server, there is no need to actually
 send the bits through HTTP.  Solr's content stream capabilities allow a file
 to be retrieved by Solr itself.


Yeah, I know.  But in my case that's not possible.  Perhaps a simple
file-receiving HTTP POST handler which simply stored the file on disk and
returned a path to it is the way to go here.

 So I could send the file, and receive back a token which I would then
 throw into one of my fields as a reference.  Then using it to map tika
 fields as well. like:

 <str name="file_mod_date">${FILETOKEN}.last_modified</str>

 <str name="file_body">${FILETOKEN}.content</str>

 Huh?  I don't follow the file token thing.  Perhaps you're thinking
 you'll post the file, then later update other fields on that same document.
 An important point here is that Solr currently does not have document
 update capabilities.  A document can be fully replaced, but cannot have
 fields added to it once indexed.  It needs to be handled all in one shot to
 accomplish the blending of file/field indexing.  Note the
 ExtractingRequestHandler already has the field mapping capability.


Sorta... I was more thinking of a new feature wherein a Solr request
handler doesn't actually put the file in the index, but merely runs it
through Tika and stores the result in a datastore which links a token with
the Tika extraction.  Then the client could make another request w/ the
XMLUpdateHandler which referenced parts of the stored Tika extraction.

 But, here's a solution that will work for you right now... let Tika extract
 the content and return back to you, then turn around and post it and
 whatever other fields you like:

  http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput

 In that example, the contents aren't being indexed, just returned back to
 the client.  And you can leverage the content stream capability with this as
 well avoiding posting the actual binary file, pointing the extracting
 request to a file path visible by Solr.


Yeah, I saw that.  This is pretty much what I was talking about above,
the only disadvantage (which is a deal breaker in our case) is the
extra bandwidth to move the file back and forth.

Thanks for your help and quick response.

I think we'll integrate the POST fields as Grant has kindly provided
multi-value input now, and see what happens in the future.  I realize
what I'm talking about (XML and binary together) is probably not a
high priority feature.

Best
Jacob
Erik





-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: Sample code for some examples for using solr in applications

2008-12-15 Thread Grant Ingersoll

See also http://wiki.apache.org/solr/SolrResources


On Dec 15, 2008, at 2:57 AM, Andre Hagenbruch wrote:



Sajith Vimukthi schrieb:

 I need some sample code of some examples done using solr. I need to get an
 idea on how I can use solr in my application. Please be kind enough to reply
 me asap. It would be a grt help.

Hi Sajith,

did you already have a look at the documentation for Solrj
(http://wiki.apache.org/solr/Solrj) or any of the other clients?
Overall, the wiki (http://wiki.apache.org/solr/) is a good place to get
started...

Hth,

Andre


--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












RE: Solrj: Multivalued fields give Bad Request

2008-12-15 Thread Schilperoort , René
Sorry,

Forgot the most important detail.
The document I am adding contains multiple "names" fields:
sInputDocument.addField("names", value);
sInputDocument.addField("names", value);
sInputDocument.addField("names", value);

There is no problem when a document only contains one value in the "names" field.


-Original Message-
From: Schilperoort, René [mailto:rene.schilpero...@getronics.com] 
Sent: maandag 15 december 2008 16:52
To: solr-user@lucene.apache.org
Subject: Solrj: Multivalued fields give Bad Request

Hi all,

When adding documents to Solr using Solrj I receive the following Exception:
org.apache.solr.common.SolrException: Bad Request

The field is configured as follows:
<field name="names" type="string" indexed="true" stored="true" 
multiValued="true"/>

Any suggestions?

Regards, Rene


Re: dataimport handler with mysql: wrong field mapping

2008-12-15 Thread Shalin Shekhar Mangar
Which solr version are you using?

On Mon, Dec 15, 2008 at 6:04 PM, jokkmokk jokkm...@gmx.at wrote:


 HI,

 I'm desperately trying to get the dataimport handler to work, however it
 seems that it just ignores the field name mapping.
 I have the fields body and subject in the database and those are called
 title and content in the solr schema, so I use the following import
 config:

 <dataConfig>

 <dataSource
   type="JdbcDataSource"
   driver="com.mysql.jdbc.Driver"
   url="jdbc:mysql://localhost/mydb"
   user="root"
   password=""/>

 <document>
   <entity name="phorum_messages" query="select * from phorum_messages">
     <field column="body" name="content"/>
     <field column="subject" name="title"/>
   </entity>
 </document>

 </dataConfig>

 however I always get the following exception:

 org.apache.solr.common.SolrException: ERROR:unknown field 'body'
at
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
at

 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at
 org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at

 org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
at

 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at

 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
at

 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at

 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)


 but according to the documentation it should add a document with "title" and
 "content", not "body" and "subject"?!

 I'd appreciate any help as I can't see anything wrong with my
 configuration...

 TIA,

 Stefan
 --
 View this message in context:
 http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: using BoostingTermQuery

2008-12-15 Thread Grant Ingersoll
In the solrconfig.xml (scroll all the way to the bottom, and I believe  
the example has some commented out)
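For reference, a custom parser is registered near the bottom of solrconfig.xml with 
a queryParser element; the name and class below are placeholders for your own 
QParserPlugin subclass:

```xml
<!-- hypothetical plugin wrapping a BoostingTermQuery-aware parser -->
<queryParser name="boostingParser" class="com.example.BoostingQParserPlugin"/>
```

It can then be selected per request with defType=boostingParser, or inline as 
{!boostingParser} in the query string.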


On Dec 15, 2008, at 5:45 AM, ayyanar wrote:




I'm no QueryParser expert, but I would probably start w/ the default
query parser in Solr (LuceneQParser), and then progress a bit to the
DisMax one.  I'd ask specific questions based on what you see there.
If you get far enough along, you may consider asking for help on the
java-user list as well.



Thanks - I think I've got it working now.  I ended up subclassing
QueryParser and overriding newTermQuery() to create a  
BoostingTermQuery

instead of a plain ol' TermQuery.  Seems to work.

Kindly let me know where and how to configure the overridden query  
parser in

solr


-Ayyanar
--
View this message in context: 
http://www.nabble.com/using-BoostingTermQuery-tp19637123p21011626.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Solrj - SolrQuery - specifying SolrCore - when the Solr Server has multiple cores

2008-12-15 Thread Kay Kay

Hi -
 I am looking at the article here with a brief introduction to SolrJ:
http://www.ibm.com/developerworks/library/j-solr-update/index.html?ca=dgr-jw17SolrS_Tact=105AGX59S_CMP=GRsitejw17#solrj


 In case we have multiple SolrCores in the server application (since 
1.3), how do I specify, as part of SolrQuery, which core needs to 
be used for the given query? I am trying to dig out the information from 
the code. Meanwhile, if someone is aware of the same, please suggest 
some pointers.






Re: Solrj - SolrQuery - specifying SolrCore - when the Solr Server has multiple cores

2008-12-15 Thread Yonik Seeley
A solr core is like a separate solr server... so create a new
CommonsHttpSolrServer that points at the core.
You probably want to create and reuse a single HttpClient instance for
the best efficiency.

-Yonik
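
In SolrJ terms, that works out to roughly the following sketch (the core names are 
assumptions, and constructor overloads vary a bit between SolrJ versions, so treat 
this as a sketch rather than compile-ready code):

```
// one HttpClient shared across cores, one SolrServer per core
HttpClient client = new HttpClient();
SolrServer core0 = new CommonsHttpSolrServer("http://localhost:8983/solr/core0", client);
SolrServer core1 = new CommonsHttpSolrServer("http://localhost:8983/solr/core1", client);
QueryResponse rsp = core0.query(new SolrQuery("*:*"));
```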

On Mon, Dec 15, 2008 at 11:06 AM, Kay Kay kaykay.uni...@gmail.com wrote:
 Hi -
  I am looking at the  article here with a brief introduction to SolrJ .
 http://www.ibm.com/developerworks/library/j-solr-update/index.html?ca=dgr-jw17SolrS_Tact=105AGX59S_CMP=GRsitejw17#solrj
 .

  In case we have multiple SolrCores in the server application - (since 1.3)
 - how do I specify as part of SolrQuery as to which core needs to be used
 for the given query. I am trying to dig out the information from the code.
 Meanwhile, if someone is aware of the same - please suggest some pointers.


Re: Solrj: Multivalued fields give Bad Request

2008-12-15 Thread Ryan McKinley

What do you see in the admin schema browser?
/admin/schema.jsp

When you select the field "names", do you see the property 
"Multivalued"?


ryan


On Dec 15, 2008, at 10:55 AM, Schilperoort, René wrote:


Sorry,

Forgot the most important detail.
The document I am adding contains multiple "names" fields:
sInputDocument.addField("names", value);
sInputDocument.addField("names", value);
sInputDocument.addField("names", value);

There is no problem when a document only contains one value in the 
"names" field.



-Original Message-
From: Schilperoort, René [mailto:rene.schilpero...@getronics.com]
Sent: maandag 15 december 2008 16:52
To: solr-user@lucene.apache.org
Subject: Solrj: Multivalued fields give Bad Request

Hi all,

When adding documents to Solr using Solrj I receive the following 
Exception:

org.apache.solr.common.SolrException: Bad Request

The field is configured as follows:
<field name="names" type="string" indexed="true" stored="true" 
multiValued="true"/>


Any suggestions?

Regards, Rene




CustomQueryParser

2008-12-15 Thread ayyanar

I found the following solution in the forum to use BoostingTermQuery in Solr:

I ended up subclassing QueryParser and overriding newTermQuery() to create
a BoostingTermQuery instead of a plain ol' TermQuery.  Seems to work. 

http://www.nabble.com/RE:-using-BoostingTermQuery-p19651792.html

I have some questions on this:

1) Has anyone tried this? Is it working?
2) Where do I specify the query parser subclass name? solrconfig.xml? What is
the XML tag name for this?
3) Should we use QParser? I think we can directly subclass the QueryParser
and do that. Am I right?
4) Kindly post a code sample that overrides newTermQuery() to create a
BoostingTermQuery.

Thanks in advance
Ayyanar


-- 
View this message in context: 
http://www.nabble.com/CustomQueryParser-tp21012136p21012136.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Sample code

2008-12-15 Thread Grant Ingersoll

http://lucene.apache.org/solr/tutorial.html

On Dec 15, 2008, at 12:56 AM, Sajith Vimukthi wrote:


Hi all,

Can someone give me some sample code for a search function done 
with Solr, 

so that I can get an idea of how I can use it.



Regards,

Sajith Vimukthi Weerakoon

Associate Software Engineer | ZONE24X7

| Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 |

http://www.zone24x7.com








dataimport handler with mysql: wrong field mapping

2008-12-15 Thread jokkmokk

HI,

I'm desperately trying to get the dataimport handler to work, however it
seems that it just ignores the field name mapping.
I have the fields body and subject in the database and those are called
title and content in the solr schema, so I use the following import
config:

<dataConfig>

<dataSource
  type="JdbcDataSource"
  driver="com.mysql.jdbc.Driver"
  url="jdbc:mysql://localhost/mydb"
  user="root"
  password=""/>

<document>
  <entity name="phorum_messages" query="select * from phorum_messages">
    <field column="body" name="content"/>
    <field column="subject" name="title"/>
  </entity>
</document>

</dataConfig>

however I always get the following exception:

org.apache.solr.common.SolrException: ERROR:unknown field 'body'
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:274)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
at
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:279)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:137)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:326)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:367)


but according to the documentation it should add a document with "title" and
"content", not "body" and "subject"?!

I'd appreciate any help as I can't see anything wrong with my
configuration...

TIA,

Stefan
-- 
View this message in context: 
http://www.nabble.com/dataimport-handler-with-mysql%3A-wrong-field-mapping-tp21013109p21013109.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExtractingRequestHandler and XmlUpdateHandler

2008-12-15 Thread Grant Ingersoll


On Dec 15, 2008, at 8:20 AM, Jacob Singh wrote:


Hi Erik,

Sorry I wasn't totally clear.  Some responses inline:
If the file is visible from the Solr server, there is no need to 
actually send the bits through HTTP.  Solr's content stream capabilities 
allow a file to be retrieved by Solr itself.



Yeah, I know.  But in my case not possible.   Perhaps a simple file
receiving HTTP POST handler which simply stored the file on disk and
returned a path to it is the way to go here.

So I could send the file, and receive back a token which I would  
then
throw into one of my fields as a reference.  Then using it to map  
tika

fields as well. like:

<str name="file_mod_date">${FILETOKEN}.last_modified</str>

<str name="file_body">${FILETOKEN}.content</str>


Huh?  I don't follow the file token thing.  Perhaps you're 
thinking
you'll post the file, then later update other fields on that same  
document.

An important point here is that Solr currently does not have document
update capabilities.  A document can be fully replaced, but cannot  
have
fields added to it, once indexed.  It needs to be handled all in  
one shot to

accomplish the blending of file/field indexing.  Note the
ExtractingRequestHandler already has the field mapping capability.



Sorta... I was more thinking of a new feature wherein a Solr Request
handler doesn't actually put the file in the index, merely runs it
through tika and stores a datastore which links a token with the
tika extraction.  Then the client could make another request w/ the
XMLUpdateHandler which referenced parts of the stored tika extraction.



Hmmm, thinking out loud

Override SolrContentHandler.  It is responsible for mapping the Tika  
output to a Solr Document.

Capture all the content into a single buffer.
Add said buffer to a field that is stored only.
Add a second field that is indexed.  This is your token.  You could, 
just as well, have that token be the only thing that gets returned by 
extract-only.


Alternately, you could implement an UpdateProcessor thingamajob that  
takes the output and stores it to the filesystem and just adds the  
token to a document.






But, here's a solution that will work for you right now... let Tika  
extract

the content and return back to you, then turn around and post it and
whatever other fields you like:

http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput

In that example, the contents aren't being indexed, just returned  
back to
the client.  And you can leverage the content stream capability  
with this as

well avoiding posting the actual binary file, pointing the extracting
request to a file path visible by Solr.



Yeah, I saw that.  This is pretty much what I was talking about above,
the only disadvantage (which is a deal breaker in our case) is the
extra bandwidth to move the file back and forth.

Thanks for your help and quick response.

I think we'll integrate the POST fields as Grant has kindly provided
multi-value input now, and see what happens in the future.  I realize
what I'm talking about (XML and binary together) is probably not a
high priority feature.



Is the use case this:

1. You want to assign metadata and also store the original and have it  
stored in binary format, too?  Thus, Solr becomes a backing,  
searchable store?


I think we could possibly add an option to serialize the ContentStream  
onto a Field on the Document.  In other words, store the original with  
the Document.  Of course, buyer beware on the cost of doing so.




Re: Solrj - SolrQuery - specifying SolrCore - when the Solr Server has multiple cores

2008-12-15 Thread Kay Kay

Thanks Yonik for the clarification.

Yonik Seeley wrote:

A solr core is like a separate solr server... so create a new
CommonsHttpSolrServer that points at the core.
You probably want to create and reuse a single HttpClient instance for
the best efficiency.

-Yonik

On Mon, Dec 15, 2008 at 11:06 AM, Kay Kay kaykay.uni...@gmail.com wrote:
  

Hi -
 I am looking at the  article here with a brief introduction to SolrJ .
http://www.ibm.com/developerworks/library/j-solr-update/index.html?ca=dgr-jw17SolrS_Tact=105AGX59S_CMP=GRsitejw17#solrj
.

 In case we have multiple SolrCores in the server application - (since 1.3)
- how do I specify as part of SolrQuery as to which core needs to be used
for the given query. I am trying to dig out the information from the code.
Meanwhile, if someone is aware of the same - please suggest some pointers.



  




Please help me articulate this query

2008-12-15 Thread Derek Springer
Hey all,
I'm having trouble articulating a query and I'm hopeful someone out there
can help me out :)

My situation is this: I am indexing a series of questions that can either be
asked from a main question entry page, or a specific subject page. I have a
field called "referring" which indexes the title of the specific subject
page, plus the regular question, whenever that document is submitted from a
specific subject page. Otherwise, every document is indexed with
just the question.

Specifically, what I am trying to do is: when I am on a specific
subject page (e.g. Tom Cruise) I want to search for all of the questions
asked from that page, plus any question asked about Tom Cruise. Something
like:
q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)

"Have you ever used a Tom Tom?" - Not returned
"Where is the best place to take a cruise?" - Not returned
"When did he have his first kid?" - Returned iff the question was asked from the Tom
Cruise page
"Do you think that Tom Cruise will make more movies?" - Always returned

Any thoughts?

-Derek


Re: Please help me articulate this query

2008-12-15 Thread Stephen Weiss
I think in this case you would want to index each question with the 
possible referrers (by title might be too imprecise, I'd go with 
filename or ID) and then do a search like this (assuming in this case 
it's by filename):


q=(referring:TomCruise.html) OR (question: Tom AND Cruise)

Which seems to be what you're thinking.

I would make the referrer a "string" type though, so that you don't 
accidentally pull in documents from a different subject (for Tom Cruise 
this would work OK, but imagine you need to distinguish between George 
Washington and George Washington Carver).


--
Steve


On Dec 15, 2008, at 2:59 PM, Derek Springer wrote:


Hey all,
I'm having trouble articulating a query and I'm hopeful someone out  
there

can help me out :)

My situation is this: I am indexing a series of questions that can  
either be
asked from a main question entry page, or a specific subject page. I  
have a
field called referring which indexes the title of the specific  
subject
page, plus the regular question whenever that document is submitted  
from a
specific subject page. Otherwise, every document is indexed  
with

just the question.

Specifically, what I am trying to do is when I am on the page specific
subject page (e.g. Tom Cruise) I want to search for all of the  
questions
asked from that page, plus any question asked about Tom Cruise.  
Something

like:
q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)

Have you ever used a Tom Tom? - Not returned
Where is the best place to take a cruise? - Not returned
When did he have his first kid? - Returned iff question was asked  
from Tom

Cruise page
Do you think that Tom Cruise will make more movies? - Always  
returned


Any thoughts?

-Derek




Re: Please help me articulate this query

2008-12-15 Thread Derek Springer
Thanks for the tip, I appreciate it!

However, does anyone know how to articulate the syntax of (This AND That)
OR (Something AND Else) into a query string?

i.e. q=referring:### AND question:###

On Mon, Dec 15, 2008 at 12:32 PM, Stephen Weiss swe...@stylesight.comwrote:

 I think in this case you would want to index each question with the
 possible referrers ( by title might be too imprecise, I'd go with filename
 or ID) and then do a search like this (assuming in this case it's by
 filename)

 q=(referring:TomCruise.html) OR (question: Tom AND Cruise)

 Which seems to be what you're thinking.

 I would make the referrer a type string though so that you don't
 accidentally pull in documents from a different subject (Tom Cruise this
 would work ok, but imagine you need to distinguish between George Washington
 and George Washington Carver).

 --
 Steve



 On Dec 15, 2008, at 2:59 PM, Derek Springer wrote:

  Hey all,
 I'm having trouble articulating a query and I'm hopeful someone out there
 can help me out :)

 My situation is this: I am indexing a series of questions that can either
 be
 asked from a main question entry page, or a specific subject page. I have
 a
 field called referring which indexes the title of the specific subject
 page, plus the regular question whenever that document is submitted from a
 specific subject page. Otherwise, every document is indexed with
 just the question.

 Specifically, what I am trying to do is when I am on the page specific
 subject page (e.g. Tom Cruise) I want to search for all of the questions
 asked from that page, plus any question asked about Tom Cruise. Something
 like:
 q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)

 Have you ever used a Tom Tom? - Not returned
 Where is the best place to take a cruise? - Not returned
  When did he have his first kid? - Returned iff question was asked from
 Tom
 Cruise page
 Do you think that Tom Cruise will make more movies? - Always returned

 Any thoughts?

 -Derek
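
[Archive note: the grouped form being asked about can be written directly in a
single q parameter. This is a sketch based on the field names in this thread,
not a tested query:]

```text
q=(referring:(Tom AND Cruise)) OR (question:(Tom AND Cruise))

URL-encoded:
q=%28referring%3A%28Tom+AND+Cruise%29%29+OR+%28question%3A%28Tom+AND+Cruise%29%29
```

If referring is a string-type field, as Stephen suggests, the first clause
would instead be an exact value, e.g. referring:TomCruise.html.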





Re: Multi tokenizer

2008-12-15 Thread Antonio Zippo

: I need to tokenize my field on whitespaces, html, punctuation, apostrophe

: but if I use HTMLStripStandardTokenizerFactory it strips only html 
: but no apostrophes

 you might consider using one of the HTML Tokenizers, and then use a 
 PatternReplaceFilterFactory ... or if you know Java write a 
 simple Tokenizer that uses the HTMLStripReader.
 
  in the long run, changing the HTMLStripReader to be usable as a 
  CharFilter so it can work with any Tokenizer is probably the way we'll 
 go -- but i don't think anyone has started working on a patch for that.

thanks... I used HTMLStripStandardTokenizerFactory and then a 
PatternReplaceFilterFactory

now it works
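
[Archive note: a minimal fieldType combining these pieces might look like the
following. This is a sketch, not Antonio's exact schema; the pattern shown is
illustrative:]

```xml
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- strip HTML markup, then tokenize with StandardTokenizer rules -->
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <!-- drop apostrophes from the resulting tokens (illustrative pattern) -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="'" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```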



  

TextField size limit

2008-12-15 Thread Antonio Zippo
Hi all,

i have a TextField containing over 400k of text

when i try to search a word solr doesn't return any result but if I search 
for a single document, I can see that the word exists there

So I suppose that solr has a textfield size limit (the field is indexed 
using a tokenizer and some filters)

Could anyone help me understand the problem, and whether it is possible to solve it?

Thanks in advance,
  Antonio


  

Re: TextField size limit

2008-12-15 Thread Erik Hatcher

Check your solrconfig.xml:

 <maxFieldLength>10000</maxFieldLength>

That's probably the truncating factor.  That's the maximum number of  
terms, not bytes or characters.


Erik


On Dec 15, 2008, at 5:00 PM, Antonio Zippo wrote:


Hi all,

i have a TextField containing over 400k of text

when i try to search a word solr doesn't return any result but  
if I search for a single document, I can see that the word exists  
there


So I suppose that solr has a textfield size limit (the field is  
indexed using a tokenizer and some filters)


Could anyone help me to undestand the problem? and if is it possible  
to solve?


Thanks in advance,
 Antonio






Slow Response time after optimize

2008-12-15 Thread Sammy Yu
Hi guys,
   I have a typical master/slave setup running with Solr 1.3.0.  I did
some basic scalability test with JMeter and tweaked our environment
and determined that we can handle approximately 26 simultaneous
threads and get end-to-end response times of under 200ms even with
typically every 5 minute distribution.   However, as soon as I issue a
single optimize on the master, the response time goes up to over 500ms
and does not seem to recover.   As soon as I restarted the response
time is back down to 200ms.  My index is approximately 5 GB in size
and the queries are just basic constructed disjunction queries such as
title:iphone OR bodytext:iphone.  Has anybody seen this issue before?

Thanks,
Sammy
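
[Archive note: one likely factor is that the searcher opened after an optimize
starts cold; the optimize rewrites every index file, so OS file caches are
invalidated too. A hedged sketch of newSearcher warming in solrconfig.xml,
with illustrative query values, not Sammy's actual configuration:]

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- representative queries run before the new searcher is exposed -->
    <lst>
      <str name="q">title:iphone OR bodytext:iphone</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```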


Re: Standard request with functional query

2008-12-15 Thread Sammy Yu
Hey guys,
Thanks for the response, but how would I make recency a factor in
scoring documents with the standard request handler?
The query (title:iphone OR bodytext:iphone OR title:firmware OR
bodytext:firmware) AND _val_:"ord(dateCreated)"^0.1
seems to do something very similar to just sorting by dateCreated
rather than having dateCreated be part of the score.

Thanks,
Sammy

On Thu, Dec 4, 2008 at 1:35 PM, Sammy Yu temi...@gmail.com wrote:
 Hi guys,
I have a standard query that searches across multiple text fields such as
 q=title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware

 This comes back with documents that have iphone and firmware (I know I
 can use dismax handler but it seems to be really slow), which is
 great.  Now I want to give some more weight to more recent documents
 (there is a dateCreated field in each document).

 So I've modified the query as such:
  (title:iphone OR bodytext:iphone OR title:firmware OR
  bodytext:firmware) AND _val_:"ord(dateCreated)"^0.1
 URLencoded to 
 q=(title%3Aiphone+OR+bodytext%3Aiphone+OR+title%3Afirmware+OR+bodytext%3Afirmware)+AND+_val_%3Aord(dateCreated)^0.1

 However, the results are not as one would expects.  The first few
 documents only come back with the word iphone and appears to be sorted
 by date created.  It seems to completely ignore the score and use the
 dateCreated field for the score.

 On a not directly related issue it seems like if you put the weight
 within the double quotes:
  (title:iphone OR bodytext:iphone OR title:firmware OR
  bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"

 the parser complains:
 org.apache.lucene.queryParser.ParseException: Cannot parse
 '(title:iphone OR bodytext:iphone OR title:firmware OR
  bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"': Expected ',' at
 position 16 in 'ord(dateCreated)^0.1'

 Thanks,
 Sammy
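
[Archive note: ord(dateCreated) returns the document's raw ordinal, typically a
large integer, so even weighted by 0.1 it can swamp the relevancy score. The
Solr wiki's usual suggestion is to wrap it in recip() so recency contributes a
bounded boost. A sketch; the constants are illustrative, not tuned:]

```text
q=(title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware)
  AND _val_:"recip(rord(dateCreated),1,1000,1000)"
```

Here rord() gives the newest document the smallest ordinal, and
recip(x,m,a,b) = a/(m*x+b) maps that into a bounded value that decays toward
zero for older documents.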



Re: TextField size limit

2008-12-15 Thread Antonio Zippo
 
 Check your solrconfig.xml:
 
  <maxFieldLength>10000</maxFieldLength>
 
 That's probably the truncating factor.  That's the maximum number of terms, 
 not bytes or characters.
 
 Erik
 


Thanks... I think it could be the problem.
i tried to count whitespace in a single text and it's over 55.000 ... but solr 
truncates to 10.000

do you know if I can change the value to 100.000 without recreate the index? 
(when I modify schema.xml I need to create the index again but with 
solrconfig.xml?)

Thanks,
  Antonio



  

Re: TextField size limit

2008-12-15 Thread Yonik Seeley
On Mon, Dec 15, 2008 at 5:28 PM, Antonio Zippo reven...@yahoo.it wrote:

 Check your solrconfig.xml:

  <maxFieldLength>10000</maxFieldLength>

 That's probably the truncating factor.  That's the maximum number of terms, 
 not bytes or characters.

 Thanks... I think it could be the problem.
 i tried to count whitespace in a single text and it's over 55.000 ... but 
 solr truncates to 10.000

 do you know if I can change the value to 100.000 without recreate the index? 
 (when I modify schema.xml I need to create the index again but with 
 solrconfig.xml?)

No need to re-index with this change.
But you will have to re-index any documents that got cut off of course.

-Yonik


Re: TextField size limit

2008-12-15 Thread Antonio Zippo

 No need to re-index with this change.
 But you will have to re-index any documents that got cut off of course.
 
 -Yonik
 

Ok, thanks...
I hoped to re-index the documents over the existing index (with incremental 
updates while Solr is running) and without deleting the index folder.

But the important thing is that the problem is solved ;-)

Thanks...
  Antonio



  

Some solrconfig.xml attributes being ignored

2008-12-15 Thread Mark Ferguson
Hello,

In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When
I perform a search, it returns only a single snippet for each highlighted
field. However, when I set the hl.snippets field manually as a search
parameter, I get up to 3 highlighted snippets. This is the configuration
that I am using to set the highlighted parameters:

<fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter"
            default="true">
  <lst name="defaults">
    <str name="hl.snippets">3</str>
    <str name="hl.fragsize">100</str>
    <str name="hl.regex.slop">0.5</str>
    <str name="hl.regex.pattern">\w[-\w ,/\n\']{50,150}</str>
  </lst>
</fragmenter>

I tried setting hl.fragmenter=regex as a parameter as well, to be sure that
it was using the correct one, and the result set is the same. Any ideas what
could be causing this attribute not to be read? It has me concerned that
other attributes are being ignored as well.

Thanks,

Mark Ferguson


Re: Using Regex fragmenter to extract paragraphs

2008-12-15 Thread Mark Ferguson
You actually don't need to escape most characters inside a character class,
the escaping of the period was unnecessary.

I've tried using the example regex ([-\w ,/\n\']{20,200}), and I'm _still_
getting lots of highlighted snippets that don't match the regex (starting
with a period, etc.) Has anyone else has any trouble with the default regex
fragmenter? If someone has used it and gotten the expected results, can you
let me know, so I know that the problem is on my end?

Thanks for your help,

Mark


On Sun, Dec 14, 2008 at 8:34 AM, Erick Erickson erickerick...@gmail.comwrote:

 Shouldn't you escape the question mark at the end too?

 On Fri, Dec 12, 2008 at 6:22 PM, Mark Ferguson mark.a.fergu...@gmail.com
 wrote:

  Someone helped me with the regex and pointed out a couple mistakes, most
  notably the extra quantifier in .*{400,600}. My new regex is this:
 
  \w.{400,600}[\.!?]
 
  Unfortunately, my results still aren't any better. Some results start
 with
  a
  word character, some don't, and none seem to end with punctuation. Any
  ideas
  would else could be wrong?
 
  Mark
 
 
 
  On Fri, Dec 12, 2008 at 2:37 PM, Mark Ferguson 
 mark.a.fergu...@gmail.com
  wrote:
 
   Hello,
  
   I am trying to use the regex fragmenter and am having a hard time
 getting
   the results I want. I am trying to get fragments that start on a word
   character and end on punctuation, but for some reason the fragments
 being
   returned to me seem to be very inflexible, despite that I've provided a
   large slop. Here are the relevant parameters I'm using, maybe someone
 can
   help point out where I've gone wrong:
  
    <str name="hl.fragsize">500</str>
    <str name="hl.fragmenter">regex</str>
    <str name="hl.regex.slop">0.8</str>
    <str name="hl.regex.pattern">[\w].*{400,600}[.!?]</str>
    <str name="hl">true</str>
    <str name="q">chinese</str>
  
   This should be matching between 400-600 characters, beginning with a
 word
   character and ending with one of .!?. Here is an example of a typical
   result:
  
   . Check these pictures out. Nine panda cubs on display for the first
 time
   Thursday in southwest China. They're less than a year old. They just
   recently stopped nursing. There are only 1,600 of these guys left in
 the
   mountain forests of central China, another 120 in span
   class='hl'Chinese/span breeding facilities and zoos. And they're
 about
  20
   that live outside China in zoos. They exist almost entirely on bamboo.
  They
   can live to be 30 years old. And these little guys will eventually get
  much
   bigger. They'll grow
  
   As you can see, it is starting with a period and ending on a word
   character! It's almost as if the fragments are just coming out as they
  will
   and the regex isn't doing anything at all, but the results are
 different
   when I use the gap fragmenter. In the above result I don't see any
 reason
   why it shouldn't have stripped out the preceding period and the last
 two
   words, there is plenty of room in the slop and in the regex pattern.
  Please
   help me figure out what I'm doing wrong...
  
   Thanks a lot,
  
   Mark Ferguson
  
 



Re: Some solrconfig.xml attributes being ignored

2008-12-15 Thread Yonik Seeley
Try adding echoParams=all to your query to verify the params that the
solr request handler is getting.

-Yonik

On Mon, Dec 15, 2008 at 6:10 PM, Mark Ferguson
mark.a.fergu...@gmail.com wrote:
 Hello,

 In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When
 I perform a search, it returns only a single snippet for each highlighted
 field. However, when I set the hl.snippets field manually as a search
 parameter, I get up to 3 highlighted snippets. This is the configuration
 that I am using to set the highlighted parameters:

 fragmenter name=regex class=org.apache.solr.highlight.RegexFragmenter
 default=true
lst name=defaults
  str name=hl.snippets3/str
  str name=hl.fragsize100/str
  str name=hl.regex.slop0.5/str
  str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str
/lst
 /fragmenter

 I tried setting hl.fragmenter=regex as a parameter as well, to be sure that
 it was using the correct one, and the result set is the same. Any ideas what
 could be causing this attribute not to be read? It has me concerned that
 other attributes are being ignored as well.

 Thanks,

 Mark Ferguson



Re: Some solrconfig.xml attributes being ignored

2008-12-15 Thread Mark Ferguson
Thanks for this tip, it's very helpful. Indeed, it looks like none of the
highlighting parameters are being included. It's using the correct request
handler and hl is set to true, but none of the highlighting parameters from
solrconfig.xml are in the parameter list.

Here is my query:

http://localhost:8080/solr1/select?rows=50&hl=true&fl=url,urlmd5,page_title,score&echoParams=all&q=java

Here are the settings for the request handler and the highlighter:

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <float name="tie">0.01</float>
    <str name="qf">body_text^1.0 page_title^1.6 meta_desc^1.3</str>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">body_text page_title meta_desc</str>
    <str name="f.page_title.hl.fragsize">0</str>
    <str name="f.meta_desc.hl.fragsize">0</str>
    <str name="hl.fragmenter">regex</str>
  </lst>
</requestHandler>

<highlighting>
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter"
              default="true">
    <lst name="defaults">
      <str name="hl.snippets">3</str>
      <str name="hl.fragsize">100</str>
      <str name="hl.regex.slop">0.5</str>
      <str name="hl.regex.pattern">\w[-\w ,/\n\']{50,150}</str>
    </lst>
  </fragmenter>
</highlighting>

And here is the param list returned to me:

<lst name="params">
  <str name="echoParams">all</str>
  <str name="tie">0.01</str>
  <str name="hl.fragmenter">regex</str>
  <str name="f.page_title.hl.fragsize">0</str>
  <str name="qf">body_text^1.0 page_title^1.6 meta_desc^1.3</str>
  <str name="f.meta_desc.hl.fragsize">0</str>
  <str name="q.alt">*:*</str>
  <str name="hl.fl">page_title,body_text</str>
  <str name="defType">dismax</str>
  <str name="echoParams">all</str>
  <str name="fl">url,urlmd5,page_title,score</str>
  <str name="q">java</str>
  <str name="hl">true</str>
  <str name="rows">50</str>
</lst>

So it seems like everything is working except for the highlighter. I should
mention that when I enter a bogus fragmenter as a parameter (e.g.
hl.fragmenter=bogus), it returns a 400 error that the fragmenter cannot be
found, so the config file _is_ finding the regex fragmenter. It just doesn't
seem to actually be including its parameters... Any ideas are appreciated,
thanks again for the help.

Mark


On Mon, Dec 15, 2008 at 4:23 PM, Yonik Seeley ysee...@gmail.com wrote:

 Try adding echoParams=all to your query to verify the params that the
 solr request handler is getting.

 -Yonik

 On Mon, Dec 15, 2008 at 6:10 PM, Mark Ferguson
 mark.a.fergu...@gmail.com wrote:
  Hello,
 
  In my solrconfig.xml file I am setting the attribute hl.snippets to 3.
 When
  I perform a search, it returns only a single snippet for each highlighted
  field. However, when I set the hl.snippets field manually as a search
  parameter, I get up to 3 highlighted snippets. This is the configuration
  that I am using to set the highlighted parameters:
 
  fragmenter name=regex
 class=org.apache.solr.highlight.RegexFragmenter
  default=true
 lst name=defaults
   str name=hl.snippets3/str
   str name=hl.fragsize100/str
   str name=hl.regex.slop0.5/str
   str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str
 /lst
  /fragmenter
 
  I tried setting hl.fragmenter=regex as a parameter as well, to be sure
 that
  it was using the correct one, and the result set is the same. Any ideas
 what
  could be causing this attribute not to be read? It has me concerned that
  other attributes are being ignored as well.
 
  Thanks,
 
  Mark Ferguson
 



Re: SolrConfig.xml Replication

2008-12-15 Thread Jeff Newburn
It does appear to be working for us now.  The files replicated out
appropriately which is a huge help.  Thanks to all!


-Jeff


On 12/13/08 9:42 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

 Jeff, SOLR-821 has a patch now. It'd be nice to get some feedback if you
 manage to try it out.

 On Thu, Dec 11, 2008 at 8:33 PM, Jeff Newburn jnewb...@zappos.com wrote:

  Thank you for the quick response.  I will keep an eye on that to see how
  it progresses.


  On 12/10/08 8:03 PM, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com
  wrote:

   This is a known issue and I was planning to take it up soon.
   https://issues.apache.org/jira/browse/SOLR-821


   On Thu, Dec 11, 2008 at 5:30 AM, Jeff Newburn jnewb...@zappos.com
   wrote:
   I am curious as to whether there is a solution to be able to replicate
   solrconfig.xml with the 1.4 replication.  The obvious problem is that
   the master would replicate the solrconfig turning all slaves into
   masters with its config.  I have also tried on a whim to configure the
   master and slave on the master so that the slave points to the same
   server but that seems to break the replication completely.  Please let
   me know if anybody has any ideas

   -Jeff


--
Regards,
Shalin Shekhar Mangar.




Re: Dismax Minimum Match/Stopwords Bug

2008-12-15 Thread Matthew Runo
Would this mean that, for example, if we wanted to search productId  
(long) we'd need to make a field type that had stopwords in it rather  
than simply using (long)?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Dec 12, 2008, at 11:56 PM, Chris Hostetter wrote:



: I have discovered some weirdness with our Minimum Match  
functionality.
: Essentially it comes up with absolutely no results on certain  
queries.
: Basically, searches with 2 words and 1 being "the" don't have a  
return
: result.  From what we can gather the minimum match criteria is  
making it
: such that if there are 2 words then both are required.   
Unfortunately, the


you haven't mentioned what qf you're using, and you only listed one  
field
type, which includes stopwords -- but i suspect your qf contains at  
least

one field that *doesn't* remove stopwords.

this is in fact an unfortunate aspect of the way dismax works --
each chunk of text recognized by the querypaser is passed to each
analyzer for each field.  Any chunk that produces a query for a field
becomes a DisjunctionMaxQuery, and is included in the mm count --  
even
if that chunk is a stopword in every other field (and produces no  
query)


so you have to either be consistent with your stopwords across all  
fields,
or make your mm really small.  searching for dismax stopwords  
turns this

up...

http://www.nabble.com/Re%3A-DisMax-request-handler-doesn%27t-work-with-stopwords--p11016770.html

...if i'm wrong about your situation (some fields in the qf with  
stopwords
and some fields without) then please post all of the params you are  
using
(not just mm) and the full parsedquery_tostring from when  
debugQuery=true

is turned on.




-Hoss
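
[Archive note: one way to keep a two-term query alive when one term is a
stopword in some of the qf fields is to loosen mm itself, per Hoss's "make
your mm really small". A sketch of the dismax default involved; the thresholds
are illustrative, not a recommendation. Note the < must be escaped as &lt;
inside solrconfig.xml:]

```xml
<!-- for queries of more than 1 clause, require all but one clause;
     for more than 5 clauses, require all but two -->
<str name="mm">1&lt;-1 5&lt;-2</str>
```

With this, a two-word query such as "the iphone" still matches documents where
only one of the two clauses produces a hit.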




Parent Child Entity - DataImport

2008-12-15 Thread sbutalia

I have a parent entity that grabs a list of records of a certain type from 1
table... and a sub-entity that queries another table to retrieve the actual
data... for various reasons I cannot join the tables... the 2nd sql query
converts the rows into an xml to be processed by a custom transformer (done
due to the complex nature of the second table)

Full-import works fine but delta-import is not adding any new records... 

Do I have to specify a deltaQuery for the sub-entity? What else might be
goin on?

<document name="doc">
    <entity name="table1" pk="id"
            query="SELECT ID, MY_GUID
                   FROM activityLog
                   WHERE type in (11, 15)"
            deltaQuery="SELECT ID, MY_GUID
                        FROM activityLog
                        WHERE type in (11, 15) and created_date &gt; '${dataimporter.last_index_time}'">
        <field column="MY_GUID" name="myGuid"/>
        <entity name="table2" pk="ID"
                query="select dbms_xmlgen.getxml('
                           select Name, Title, Description
                           from metaDataTable
                           where MY_GUID = ${table1.MY_GUID_ID}
                       ') mdrXmlClob
                       from dual"
                transformer="MD.Solr.Utils.transformers.MDTransformer">
            <field column="Name" name="mdName"/>
            <field column="Title" name="mdTitle"/>
            <field column="Description" name="mdDescription"/>
        </entity>
    </entity>
</document>
-- 
View this message in context: 
http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Some solrconfig.xml attributes being ignored

2008-12-15 Thread Mark Ferguson
It seems like maybe the fragmenter parameters just don't get displayed with
echoParams=all set. It may only display as far as the request handler's
parameters. The reason I think this is because I tried increasing
hl.fragsize to 1000 and the results were returned correctly (much larger
snippets), so I know it was read correctly.

I moved hl.snippets into the requestHandler config instead of the
fragmenter, and this seems to have solved the problem. However, I'm uneasy
with this solution because I don't know why it wasn't being read correctly
when setting it inside the fragmenter.

Mark



On Mon, Dec 15, 2008 at 5:08 PM, Mark Ferguson mark.a.fergu...@gmail.comwrote:

 Thanks for this tip, it's very helpful. Indeed, it looks like none of the
 highlighting parameters are being included. It's using the correct request
 handler and hl is set to true, but none of the highlighting parameters from
 solrconfig.xml are in the parameter list.

 Here is my query:


  http://localhost:8080/solr1/select?rows=50&hl=true&fl=url,urlmd5,page_title,score&echoParams=all&q=java

 Here are the settings for the request handler and the highlighter:

 requestHandler name=dismax class=solr.SearchHandler default=true
   lst name=defaults
str name=defTypedismax/str
float name=tie0.01/float
str name=qfbody_text^1.0 page_title^1.6 meta_desc^1.3/str
str name=q.alt*:*/str
str name=hl.flbody_text page_title meta_desc/str
str name=f.page_title.hl.fragsize0/str
str name=f.meta_desc.hl.fragsize0/str
str name=hl.fragmenterregex/str
   /lst
 /requestHandler

 highlighting
   fragmenter name=regex
 class=org.apache.solr.highlight.RegexFragmenter default=true
 lst name=defaults
   str name=hl.snippets3/str
   str name=hl.fragsize100/str
   str name=hl.regex.slop0.5/str
   str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str
 /lst
   /fragmenter
 /highlighting

 And here is the param list returned to me:

 lst name=params
 str name=echoParamsall/str
 str name=tie0.01/str
 str name=hl.fragmenterregex/str
 str name=f.page_title.hl.fragsize0/str
 str name=qfbody_text^1.0 page_title^1.6 meta_desc^1.3/str
 str name=f.meta_desc.hl.fragsize0/str
 str name=q.alt*:*/str
 str name=hl.flpage_title,body_text/str
 str name=defTypedismax/str
 str name=echoParamsall/str
 str name=flurl,urlmd5,page_title,score/str
 str name=qjava/str
 str name=hltrue/str
 str name=rows50/str
 /lst

 So it seems like everything is working except for the highlighter. I should
 mention that when I enter a bogus fragmenter as a parameter (e.g.
 hl.fragmenter=bogus), it returns a 400 error that the fragmenter cannot be
 found, so the config file _is_ finding the regex fragmenter. It just doesn't
 seem to actually be including its parameters... Any ideas are appreciated,
 thanks again for the help.

 Mark



 On Mon, Dec 15, 2008 at 4:23 PM, Yonik Seeley ysee...@gmail.com wrote:

 Try adding echoParams=all to your query to verify the params that the
 solr request handler is getting.

 -Yonik

 On Mon, Dec 15, 2008 at 6:10 PM, Mark Ferguson
 mark.a.fergu...@gmail.com wrote:
  Hello,
 
  In my solrconfig.xml file I am setting the attribute hl.snippets to 3.
 When
  I perform a search, it returns only a single snippet for each
 highlighted
  field. However, when I set the hl.snippets field manually as a search
  parameter, I get up to 3 highlighted snippets. This is the configuration
  that I am using to set the highlighted parameters:
 
  fragmenter name=regex
 class=org.apache.solr.highlight.RegexFragmenter
  default=true
 lst name=defaults
   str name=hl.snippets3/str
   str name=hl.fragsize100/str
   str name=hl.regex.slop0.5/str
   str name=hl.regex.pattern\w[-\w ,/\n\']{50,150}/str
 /lst
  /fragmenter
 
  I tried setting hl.fragmenter=regex as a parameter as well, to be sure
 that
  it was using the correct one, and the result set is the same. Any ideas
 what
  could be causing this attribute not to be read? It has me concerned that
  other attributes are being ignored as well.
 
  Thanks,
 
  Mark Ferguson
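
[Archive note: the working arrangement Mark describes amounts to putting the
highlighting defaults on the request handler rather than on the fragmenter.
A sketch, with the other defaults elided:]

```xml
<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="hl.snippets">3</str>
    <str name="hl.fragmenter">regex</str>
    <!-- ... remaining defaults as before ... -->
  </lst>
</requestHandler>
```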
 





Getting Field Collapsing working

2008-12-15 Thread John Martyniak

Hi everybody,

So I have applied the Ivans latest patch to a clean 1.3.

I built it using 'ant compile' and 'ant dist', got the solr build.war  
file.


Moved that into the Tomcat directory.

Modified my solrconfig.xml to include the following:
   <searchComponent name="collapse"
       class="org.apache.solr.handler.component.CollapseComponent" />

<arr name="components">
  <str>query</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>debug</str>
  <str>collapse</str>
</arr>
<arr name="first-components">
  <str>myFirstComponentName</str>
  <str>collapse</str>
</arr>

thinking that everything should work correctly I did a search with the  
following:

http://localhost:8080/solr/select/?q=mika&version=2.2&start=0&rows=10&indent=on&collapse=true&collapse.field=type

I see the query parameters captured in the responseHeaders section,  
but I don't see a collapse section.


Does anybody have any ideas?

Any help would be greatly appreciated.

Thank you,

-John



Re: Parent Child Entity - DataImport

2008-12-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
I do not observe anything wrong.

you can also mention the 'deltaImportQuery' and try it

something like

 <entity name="table1" pk="id"
         query="SELECT ID, MY_GUID
                FROM activityLog
                WHERE type in (11, 15)"
         deltaImportQuery="SELECT ID, MY_GUID
                           FROM activityLog
                           WHERE type in (11, 15) AND id=${dataimporter.delta.ID}"
         deltaQuery="SELECT ID, MY_GUID
                     FROM activityLog
                     WHERE type in (11, 15) and created_date &gt; '${dataimporter.last_index_time}'"



On Tue, Dec 16, 2008 at 5:54 AM, sbutalia sbuta...@gmail.com wrote:

 I have a parent entity that grabs a list of records of a certain type from 1
 table... and a sub-entity that queries another table to retrieve the actual
 data... for various reasons I cannot join the tables... the 2nd sql query
 converts the rows into an xml to be processed by a custom transformer (done
 due to the complex nature of the second table)

 Full-import works fine but delta-import is not adding any new records...

 Do I have to specify a deltaQuery for the sub-entity? What else might be
 goin on?

 document name=doc
entity name=table1 pk=id
query= SELECT ID,MY_GUID
FROM activityLog
WHERE type in (11, 15)
deltaQuery=
SELECT ID,MY_GUID
FROM activityLog
WHERE type in (11, 15) and created_date 
 '${dataimporter.last_index_time}'
field column=MY_GUID name=myGuid/
entity name=table2 pk=ID
query=select dbms_xmlgen.getxml('
select Name, Title, 
 Description
from 
 metaDataTable
where MY_GUID = 
 ${table1.MY_GUID_ID}
') mdrXmlClob
from dual
  

 transformer=MD.Solr.Utils.transformers.MDTransformer
field column=Name name=mdName/
field column=Title name=mdTitle/
field column=Description 
 name=mdDescription/
   /entity
/entity
 /document
 --
 View this message in context: 
 http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: SolrConfig.xml Replication

2008-12-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
Jeff,
Thanks.

It would be nice if you could review the config syntax and see if all
possible use cases are covered. Is there any scope for improvement?

On Tue, Dec 16, 2008 at 5:45 AM, Jeff Newburn jnewb...@zappos.com wrote:
 It does appear to be working for us now.  The files replicated out
 appropriately which is a huge help.  Thanks to all!


 -Jeff


 On 12/13/08 9:42 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

 Jeff, SOLR-821 has a patch now. It'd be nice to get some feedback if you
 manage to try it out.

 On Thu, Dec 11, 2008 at 8:33 PM, Jeff Newburn jnewb...@zappos.com wrote:

  Thank you for the quick response.  I will keep an eye on that to see how
  it progresses.


  On 12/10/08 8:03 PM, Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com
  wrote:

   This is a known issue and I was planning to take it up soon.
   https://issues.apache.org/jira/browse/SOLR-821


   On Thu, Dec 11, 2008 at 5:30 AM, Jeff Newburn jnewb...@zappos.com
   wrote:
   I am curious as to whether there is a solution to be able to replicate
   solrconfig.xml with the 1.4 replication.  The obvious problem is that
   the master would replicate the solrconfig turning all slaves into
   masters with its config.  I have also tried on a whim to configure the
   master and slave on the master so that the slave points to the same
   server but that seems to break the replication completely.  Please let
   me know if anybody has any ideas

   -Jeff


 --
 Regards,
 Shalin Shekhar Mangar.






-- 
--Noble Paul


Re: Parent Child Entity - DataImport

2008-12-15 Thread sbutalia

I've had a chance to play with this more and noticed the query runs fine,
but it only updates records that are already indexed; it doesn't add new
ones.

The only option that I've found so far is to do a full-import with the
clean=false attribute and created_date > last_indexed_date...

Is there a better way?

Thanks
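
[Archive note: the non-cleaning full import described above can be invoked
like this; host, port, and handler path are illustrative:]

```text
http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true
```

clean=false keeps the existing documents, so the import adds and overwrites
rather than deleting the index first.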


Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 I do not observe anything wrong.
 
 you can also mention the 'deltaImportQuery' and try it
 
  something like

   <entity name="table1" pk="id"
           query="SELECT ID, MY_GUID
                  FROM activityLog
                  WHERE type in (11, 15)"
           deltaImportQuery="SELECT ID, MY_GUID
                  FROM activityLog
                  WHERE type in (11, 15) AND
                  id=${dataimporter.delta.ID}"
           deltaQuery="SELECT ID, MY_GUID
                  FROM activityLog
                  WHERE type in (11, 15) and created_date >
                  '${dataimporter.last_index_time}'">
 
 
 
 On Tue, Dec 16, 2008 at 5:54 AM, sbutalia sbuta...@gmail.com wrote:

  I have a parent entity that grabs a list of records of a certain type
  from 1 table... and a sub-entity that queries another table to retrieve
  the actual data... for various reasons I cannot join the tables... the
  2nd sql query converts the rows into an xml to be processed by a custom
  transformer (done due to the complex nature of the second table)

  Do I have to specify a deltaQuery for the sub-entity? What else might be
  going on?

  <document name="doc">
      <entity name="table1" pk="id"
              query="SELECT ID, MY_GUID
                     FROM activityLog
                     WHERE type in (11, 15)"
              deltaQuery="SELECT ID, MY_GUID
                     FROM activityLog
                     WHERE type in (11, 15) and created_date >
                     '${dataimporter.last_index_time}'">
          <field column="MY_GUID" name="myGuid"/>
          <entity name="table2" pk="ID"
                  query="select dbms_xmlgen.getxml('
                             select Name, Title, Description
                             from metaDataTable
                             where MY_GUID = ${table1.MY_GUID_ID}
                         ') mdrXmlClob
                         from dual"
                  transformer="MD.Solr.Utils.transformers.MDTransformer">
              <field column="Name" name="mdName"/>
              <field column="Title" name="mdTitle"/>
              <field column="Description" name="mdDescription"/>
          </entity>
      </entity>
  </document>
 --
 View this message in context:
 http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21024979.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 --Noble Paul
 
 

-- 
View this message in context: 
http://www.nabble.com/Parent-Child-Entity---DataImport-tp21024979p21027045.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Parent Child Entity - DataImport

2008-12-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
Are the queries being fired wrong/different when you tried full-import?



On Tue, Dec 16, 2008 at 9:57 AM, sbutalia sbuta...@gmail.com wrote:

 I've had a chance to play with this more and noticed the query does run fine,
 but it only updates the records that are already indexed; it doesn't add new
 ones.

 The only option that I've found so far is to do a full-import with the
 clean=false attribute and created_date > last_indexed_date...

 Is there a better way?

 Thanks







-- 
--Noble Paul


Re: Please help me articulate this query

2008-12-15 Thread Otis Gospodnetic
Derek,

q=+referring:XXX +question:YYY


(of course, you'll have to URL-encode that query string)
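
As a quick sketch of that encoding step (the XXX/YYY values are placeholders,
and this is just plain java.net.URLEncoder, not a Solr API):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeQuery {
    // application/x-www-form-urlencoded rules: '+' becomes %2B,
    // ':' becomes %3A, and a literal space becomes '+'.
    public static String encode(String q) throws UnsupportedEncodingException {
        return URLEncoder.encode(q, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("q=" + encode("+referring:XXX +question:YYY"));
        // prints: q=%2Breferring%3AXXX+%2Bquestion%3AYYY
    }
}
```

Sending the raw q=+referring:XXX +question:YYY would otherwise get the '+'
signs decoded as spaces by the servlet container.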

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Derek Springer de...@mahalo.com
 To: solr-user@lucene.apache.org
 Sent: Monday, December 15, 2008 3:40:55 PM
 Subject: Re: Please help me articulate this query
 
 Thanks for the tip, I appreciate it!
 
 However, does anyone know how to articulate the syntax of (This AND That)
 OR (Something AND Else) into a query string?
 
 i.e. q=referring:### AND question:###
 
 On Mon, Dec 15, 2008 at 12:32 PM, Stephen Weiss wrote:
 
   I think in this case you would want to index each question with the
   possible referrers (by title might be too imprecise; I'd go with filename
   or ID) and then do a search like this (assuming in this case it's by
   filename):
 
  q=(referring:TomCruise.html) OR (question: Tom AND Cruise)
 
  Which seems to be what you're thinking.
 
   I would make the referrer a string type though, so that you don't
   accidentally pull in documents from a different subject (for Tom Cruise
   this would work OK, but imagine you need to distinguish between George
   Washington and George Washington Carver).
 
  --
  Steve
 
 
 
  On Dec 15, 2008, at 2:59 PM, Derek Springer wrote:
 
    Hey all,
   I'm having trouble articulating a query and I'm hopeful someone out there
   can help me out :)

   My situation is this: I am indexing a series of questions that can either
   be asked from a main question entry page, or a specific subject page. I
   have a field called referring which indexes the title of the specific
   subject page, plus the regular question, whenever that document is
   submitted from a specific subject page. Otherwise, every document is
   indexed with just the question.
 
   Specifically, what I am trying to do is: when I am on a specific subject
   page (e.g. Tom Cruise) I want to search for all of the questions asked
   from that page, plus any question asked about Tom Cruise. Something
   like:
  q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise)
 
  Have you ever used a Tom Tom? - Not returned
  Where is the best place to take a cruise? - Not returned
   When did he have his first kid? - Returned iff question was asked from
   the Tom Cruise page
  Do you think that Tom Cruise will make more movies? - Always returned
 
  Any thoughts?
 
  -Derek
 
 
 



Details on logging in Solr

2008-12-15 Thread Rinesh1

Hi,
I was trying to do a performance test on the Solr web application.
When I run the performance tests, a lot of logging is happening, due to which
I am getting log files in GBs.
Is there any clean way of deactivating logging, or of changing the log level
to, say, error?
Is there any property file for the same?
Please give your inputs.
Regards,
Rinesh.

-- 
View this message in context: 
http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027267.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Details on logging in Solr

2008-12-15 Thread Ryan McKinley
Solr 1.3 uses java logging.  Most app containers (tomcat, resin, etc)
give you a way to configure that.  Also check:

http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html#1.8

You can make runtime changes from the /admin/ logging tab.  However,  
these changes are not persisted when the app restarts.



On Dec 15, 2008, at 11:52 PM, Rinesh1 wrote:



Hi,
   I was trying to do a performance test on Solr web application.
    If I run the performance tests, lot of logging is happening due
to which

I am getting log files in GBs
   Is there any clean way of deactivating logging or changing the  
log level

to say error ..
   Is there any property file for the same.
   Please give your inputs for the same.
Regards,
Rinesh.

--
View this message in context: 
http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027267.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Details on logging in Solr

2008-12-15 Thread Rinesh1

Hi Ryan,
   Thanks for the inputs. These are the steps I followed to solve this
issue.
1. Make a logging properties file, say solrLogging.properties. We can copy
the default logging properties file available in the JAVA_HOME/jre/lib
folder. The default java logging file will look like the following.

#   Default Logging Configuration File
#
# You can use a different file by specifying a filename
# with the java.util.logging.config.file system property.  
# For example java -Djava.util.logging.config.file=myfile



#   Global properties


# handlers specifies a comma separated list of log Handler 
# classes.  These handlers will be installed during VM startup.
# Note that these classes must be on the system classpath.
# By default we only configure a ConsoleHandler, which will only
# show messages at the INFO and above levels.
handlers= java.util.logging.ConsoleHandler

# To also add the FileHandler, use the following line instead.
#handlers= java.util.logging.FileHandler, java.util.logging.ConsoleHandler

# Default global logging level.
# This specifies which kinds of events are logged across
# all loggers.  For any given facility this global level
# can be overriden by a facility specific level
# Note that the ConsoleHandler also has a separate level
# setting to limit messages printed to the console.
.level= INFO


# Handler specific properties.
# Describes specific configuration info for Handlers.


# default file output is in user's home directory.
java.util.logging.FileHandler.pattern = %h/java%u.log
java.util.logging.FileHandler.limit = 50000
java.util.logging.FileHandler.count = 1
java.util.logging.FileHandler.formatter = java.util.logging.XMLFormatter

# Limit the message that are printed on the console to INFO and above.
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.ConsoleHandler.formatter =
java.util.logging.SimpleFormatter



# Facility specific properties.
# Provides extra control for each logger.


# For example, set the com.xyz.foo logger to only log SEVERE
# messages:
com.xyz.foo.level = SEVERE


To prevent INFO level messages, change
java.util.logging.ConsoleHandler.level = INFO
to
java.util.logging.ConsoleHandler.level = SEVERE

2. While starting the server (e.g. JBoss), add the following line to
run.bat or run.sh:

set JAVA_OPTS=%JAVA_OPTS%
-Djava.util.logging.config.file=Y:\solrLog.properties

This will solve the issue
Regards,
Rinesh.
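
As a programmatic alternative (a sketch, not from the thread — the logger
name org.apache.solr is an assumption about which package you want to
silence), java.util.logging levels can also be set in code:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietSolrLogging {
    // Hold a strong reference: java.util.logging keeps loggers weakly
    // referenced, so an unreferenced Logger can be garbage-collected
    // along with its configured level.
    static final Logger SOLR_ROOT = Logger.getLogger("org.apache.solr");

    public static void quiet() {
        // Only SEVERE messages from org.apache.solr.* loggers get through;
        // child loggers inherit the nearest configured ancestor's level.
        SOLR_ROOT.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        quiet();
        Logger child = Logger.getLogger("org.apache.solr.core");
        System.out.println(child.isLoggable(Level.INFO));   // false
        System.out.println(child.isLoggable(Level.SEVERE)); // true
    }
}
```

Like the /admin/ logging tab, this is not persisted across restarts; the
properties-file approach above is the durable fix.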



   

ryantxu wrote:
 
 solr 1.3 uses java logging.  Most app containers (tomcat, resin, etc)  
 give you a way to configure that.  Also check:
 http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html#1.8
 
 You can make runtime changes from the /admin/ logging tab.  However,  
 these changes are not persisted when the app restarts.
 
 

 
 
 

-- 
View this message in context: 
http://www.nabble.com/Details-on-logging-in-Solr-tp21027267p21027540.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solrj: Multivalued fields give Bad Request

2008-12-15 Thread Schilperoort , René
Ryan,

It turned out that another multivalued field was causing my problem. This
field was no longer configured in my schema, so it fell through to my dynamic
catch-all field of type ignored, which was not multivalued; adding
multiValued="true" to this field solved my problem.
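
For reference, a sketch of the fixed catch-all (the ignored type and the *
pattern follow the stock example schema — treat the exact names as an
assumption about this particular schema):

```xml
<!-- "ignored" swallows anything without indexing or storing it; it must be
     multiValued so repeated fields like "names" don't cause a Bad Request -->
<fieldType name="ignored" stored="false" indexed="false"
           multiValued="true" class="solr.StrField"/>

<dynamicField name="*" type="ignored" multiValued="true"/>
```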

Regards, Rene
 
-Original Message-
From: Ryan McKinley [mailto:ryan...@gmail.com] 
Sent: maandag 15 december 2008 17:28
To: solr-user@lucene.apache.org
Subject: Re: Solrj: Multivalued fields give Bad Request

What do you see in the admin schema browser?
/admin/schema.jsp

When you select the "names" field, do you see the Multivalued
property?

ryan


On Dec 15, 2008, at 10:55 AM, Schilperoort, René wrote:

 Sorry,

 Forgot the most important detail.
  The document I am adding contains multiple "names" fields:
  sInputDocument.addField("names", value);
  sInputDocument.addField("names", value);
  sInputDocument.addField("names", value);

 There is no problem when a document only contains one value in the  
 names field.


 -Original Message-
 From: Schilperoort, René [mailto:rene.schilpero...@getronics.com]
 Sent: maandag 15 december 2008 16:52
 To: solr-user@lucene.apache.org
 Subject: Solrj: Multivalued fields give Bad Request

 Hi all,

  When adding documents to Solr using Solrj I receive the following
  exception.
 org.apache.solr.common.SolrException: Bad Request

  The field is configured as follows:
  <field name="names" type="string" indexed="true" stored="true"
  multiValued="true"/>

 Any suggestions?

 Regards, Rene