RE: Index an entire Phrase and not it's constituent parts?

2010-03-14 Thread MitchK

Hmm, I don't understand the problem.

Look: If your analyzer looks like:






And your document looks like:
"There is a big performance issue. Solving the problem would be great. As
long as we try to give our best, ..."

After the LowerCaseFilterFactory, every word is lowercased. Now you pass the
tokens to the SynonymFilterFactory, and the text is transformed into
something like this (I am ignoring the changes made by the other filters):
"there is a big performance issue. solving the problem would be great.
specialphrase1 we try to give our best,..."

Afterwards the StopFilterFactory may change it this way:
"there big performance issue. solving problem great. specialphrase1 try give
our best,..."
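To make the three steps above concrete, here is a plain-Java sketch of the same chain. It is not the Lucene/Solr analyzer API: the tokenizer regex, the stop-word set and the single phrase mapping ("as long as" -> "specialphrase1") are assumptions chosen only to reproduce the example text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisChainSketch {
    // Assumed phrase mapping: the tokens "as long as" become the single token "specialphrase1".
    static final List<String> PHRASE = List.of("as", "long", "as");
    static final String PHRASE_TOKEN = "specialphrase1";
    // Assumed stop-word set, picked so the output matches the example above.
    static final Set<String> STOP_WORDS = Set.of("is", "a", "the", "would", "be", "we", "to");

    static List<String> analyze(String text) {
        // Step 1: split on anything that is not a letter or digit, then lowercase.
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        // Step 2: replace every occurrence of the phrase with one token (synonym-style).
        List<String> merged = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int end = i + PHRASE.size();
            if (end <= tokens.size() && tokens.subList(i, end).equals(PHRASE)) {
                merged.add(PHRASE_TOKEN);
                i = end;
            } else {
                merged.add(tokens.get(i));
                i++;
            }
        }
        // Step 3: drop stop words (StopFilter-style).
        List<String> result = new ArrayList<>();
        for (String t : merged) {
            if (!STOP_WORDS.contains(t)) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(analyze("There is a big performance issue. Solving the problem "
                + "would be great. As long as we try to give our best"));
    }
}
```

The phrase replacement runs before stop-word removal on purpose: if "as" were removed first, the phrase could no longer be recognized.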

My idea of doing this work in a separate field comes from considering the case
where a user searches for "amount of blabla" instead of "in amount of
blabla". After passing this phrase through a StopFilter it would look like:
"amount blabla". So you still have a chance to find the right document in the
index when a user does not type the full predefined phrase.
You don't need to do it this way. You don't even need to score the normal field and
the phrase field differently. It was only a suggestion. :)

However, if I misunderstood your post and you don't want to replace the
phrases with something like "specialphrase1", try the KeepWordFilter;
it sounds like it may do what you want. Have a look at the
analysis.jsp page to see what it produces.

BTW: If you really need to code your own tokenizer, have a look at a filter
that combines several words into one term.
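A filter that glues neighbouring words into one term is usually called a shingle filter. The plain-Java sketch below only illustrates that idea; the method name and the separator are made up for illustration, and it does not reproduce the API or options of any Lucene filter.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {
    /** Joins every run of n adjacent tokens into a single term. */
    static List<String> shingles(List<String> tokens, int n, String separator) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> terms = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            terms.add(String.join(separator, tokens.subList(i, i + n)));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("in", "amount", "of", "blabla");
        System.out.println(shingles(tokens, 2, " ")); // [in amount, amount of, of blabla]
    }
}
```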

One thing I really dislike is documentation phrases like "it behaves like the
inversion of xy-filter": almost nobody can really imagine what such a filter
is actually for... so, if this filter is what you are looking for, please
contribute a better description to the javadocs :).

Kind regards
- Mitch
-- 
View this message in context: 
http://old.nabble.com/Index-an-entire-Phrase-and-not-it%27s-constituent-parts--tp27785521p27893777.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Index an entire Phrase and not it's constituent parts?

2010-03-14 Thread MitchK

Sorry for double-posting.
Drinking a cup of coffee was a good idea: KeepWordFilter means that you give
it a set of words, and everything that is not in the set is deleted. So the
description is correct after all; it really behaves like the inverse of
StopFilter.
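The "inverse of StopFilter" relationship is easy to see in a small plain-Java sketch (not the Solr classes; the token list and word set are illustrative): the same word set removes those tokens in one mode and keeps only those tokens in the other.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class KeepVsStopSketch {
    /** StopFilter-style: drop every token that is in the word set. */
    static List<String> stop(List<String> tokens, Set<String> words) {
        return tokens.stream().filter(t -> !words.contains(t)).collect(Collectors.toList());
    }

    /** KeepWordFilter-style: keep only the tokens that are in the word set. */
    static List<String> keep(List<String> tokens, Set<String> words) {
        return tokens.stream().filter(words::contains).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("in", "amount", "of", "blabla");
        Set<String> words = Set.of("in", "of");
        System.out.println(stop(tokens, words)); // [amount, blabla]
        System.out.println(keep(tokens, words)); // [in, of]
    }
}
```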
-- 
View this message in context: 
http://old.nabble.com/Index-an-entire-Phrase-and-not-it%27s-constituent-parts--tp27785521p27893833.html



Re: DIH field options

2010-03-14 Thread Dennis Gearon
I asked the following question but did not see a reply, so here it is for
newbies like me:

Question: 
  What does DIH mean?
Answer:
  Data Import Handler

Sent to list to aid searches by other newbies in the future.
Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 3/11/10, blargy  wrote:

> From: blargy 
> Subject: DIH field options
> To: solr-user@lucene.apache.org
> Date: Thursday, March 11, 2010, 10:58 PM
> 
> How can you simply add a static value like <field name="id" value="123"/>?
> How does one add a static multi-value field like <field name="category_ids"
> values="123, 456"/>?
> 
> Is there any documentation on all the options for the field tag in
> data-config.xml?
> 
> Thanks for the help
> -- 
> View this message in context: 
> http://old.nabble.com/DIH-field-options-tp27873996p27873996.html
> Sent from the Solr - User mailing list archive at
> Nabble.com.
> 
>


DIH datasource configuration

2010-03-14 Thread blargy

My current DIH is configured via the requestHandler block in solrconfig.xml



  data-config.xml
  
${datasource.driver}
${datasource.url}
${datasource.user}
${datasource.password}
 -1
true
  

  

My question is: do the batchSize and readOnly properties still work if I
specify them here as opposed to in data-config.xml? I can't seem to find the
answer anywhere. An even better question: how can I check my current
datasource configuration while the application is running?

Thanks!


-- 
View this message in context: 
http://old.nabble.com/DIH-datasource-configuration-tp27897206p27897206.html



Re: Best performance for facet dates in trunk using solr.TrieDateField

2010-03-14 Thread Peter Sturge
Hi Yonik,

I'm a bit confused now. In your recent Mastering Solr webinar (great stuff,
btw, thank you!), the slides imply using tdate fields with a precisionStep
of 8 for faster range queries:

   - Use tint, tfloat, tlong, tdouble, tdate for faster range queries
   - Date faceting also uses range queries

From this, I assumed that the higher the value, the bigger the index, but the
faster the range queries.

Below you mention a lower value is better for faster range queries (at the
cost of a bit bigger index).
Can you clarify, for fast 'wide' date range queries (for date faceting and
otherwise), what is the best precisionStep value to use for tdate?

Thanks!
Peter




On Sun, Mar 14, 2010 at 6:03 AM, Yonik Seeley wrote:

> On Wed, Mar 3, 2010 at 7:51 AM, Marc Sturlese 
> wrote:
> > I am testing date facets in trunk with a huge index. Apparently, as the
> > default solrconfig.xml shows, the fastest way to run date facet queries is
> > to index the field with this data type:
> >
> > <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
> > precisionStep="6" positionIncrementGap="0"/>
> >
> > I am wondering... would setting precisionStep="8" on the TrieDateField
> > improve the speed of the queries even more?
>
> The lower the precisionStep, the more tokens are indexed per value,
> and the faster that "wide" range queries get (those that cover many
> terms). Lower also means a bigger index, though.
>
> -Yonik
> http://www.lucidimagination.com
>


RegexTransformer

2010-03-14 Thread blargy

How would I go about splitting a column by a certain delimiter AND ignoring all
empty matches?

For example:
 


Some of my columns don't have a value, so they are actually getting indexed as
blank. I just want to ignore those values entirely. Is this
possible?
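Whatever the DIH configuration ends up being, the post-processing asked for here is: split on a delimiter, then drop pieces that are empty or whitespace-only. A minimal plain-Java sketch, assuming a literal (non-regex) delimiter; hooking it into DIH, for example as a custom transformer, is not shown and would need checking against the DIH documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitIgnoreEmpty {
    /** Splits raw on a literal delimiter, trims each piece and drops blank pieces. */
    static List<String> splitNonEmpty(String raw, String delimiter) {
        List<String> values = new ArrayList<>();
        if (raw == null) {
            return values; // a missing column yields no values at all
        }
        for (String piece : raw.split(Pattern.quote(delimiter))) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(splitNonEmpty("a,,b, ,c,", ",")); // [a, b, c]
        System.out.println(splitNonEmpty("", ","));          // []
    }
}
```

Returning an empty list (rather than a list holding one blank string) is what keeps the blank value out of the index.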

-- 
View this message in context: 
http://old.nabble.com/RegexTransformer-tp27897870p27897870.html



Re: Best performance for facet dates in trunk using solr.TrieDateField

2010-03-14 Thread Yonik Seeley
On Sun, Mar 14, 2010 at 3:39 PM, Peter Sturge
 wrote:
> I'm a bit confused now. In your recent Mastering Solr webinar (great stuff,
> btw, thank you!), the slides imply using tdate fields with a precisionStep
> of 8 for faster range queries:
>
>   - Use tint, tfloat, tlong, tdouble, tdate for faster range queries
>      -       omitNorms="true" positionIncrementGap="0"/>
>      - Date faceting also uses range queries

Correct.  precisionStep of 8 compared to no precision step at all.
Precision step of 0 is like normal - a single token is indexed for a
single value.
Precision step of 8 would index a 32 bit int with 4 tokens (bigger
index, faster range queries)
Precision step of 4 would index a 32 bit int with 8 tokens (even
bigger index, even faster ranges)
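The token counts above follow from dividing the value width by the precision step and rounding up. A small Java sketch of that arithmetic, assuming (as in the reply) that a step of 0 means one plain token per value. Trie dates are indexed as 64-bit longs, so the example schema's step of 6 gives 11 tokens per date and a step of 8 gives 8.

```java
public class TrieTokenCount {
    /**
     * Terms indexed per value for a trie field: ceil(bits / precisionStep).
     * A step of 0 (or one at least as wide as the value) is treated as a single plain token.
     */
    static int tokensPerValue(int bits, int precisionStep) {
        if (precisionStep <= 0 || precisionStep >= bits) {
            return 1;
        }
        return (bits + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        System.out.println(tokensPerValue(32, 8)); // 4  (32-bit int, step 8)
        System.out.println(tokensPerValue(32, 4)); // 8  (32-bit int, step 4)
        System.out.println(tokensPerValue(64, 6)); // 11 (64-bit date, step 6)
        System.out.println(tokensPerValue(64, 8)); // 8  (64-bit date, step 8)
    }
}
```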

> Can you clarify, for fast 'wide' date range queries (for date faceting and
> otherwise), what is the best precisionStep value to use for tdate?

It's a tradeoff - I don't know what the best is.
By "wide"... it means a range that covers a lot of unique values.
If the ranges only cover a few unique values, the trie strategy
doesn't help much.
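Why "wide" matters can be seen by counting terms. The sketch below follows the greedy range splitting in Lucene's NumericUtils.splitRange, simplified to non-negative values, and counts how many terms a range query would have to enumerate in the worst case where every value in the range is indexed. It is an illustration under those assumptions, not Lucene code.

```java
public class TrieRangeTerms {
    /**
     * Worst-case number of terms a trie range query over [min, max] enumerates.
     * Assumes 0 <= min <= max < 2^valSize, 1 <= precisionStep <= valSize <= 62.
     */
    static long countTerms(long min, long max, int precisionStep, int valSize) {
        long terms = 0;
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            boolean hasLower = (min & mask) != 0L;    // min is not aligned at this level
            boolean hasUpper = (max & mask) != mask;  // max is not aligned at this level
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            if (shift + precisionStep >= valSize || nextMin > nextMax
                    || nextMin < min || nextMax > max) {
                // Coarsest usable level: cover what is left with one block of terms.
                return terms + (max >>> shift) - (min >>> shift) + 1;
            }
            if (hasLower) {
                terms += ((min | mask) >>> shift) - (min >>> shift) + 1;
            }
            if (hasUpper) {
                terms += (max >>> shift) - ((max & ~mask) >>> shift) + 1;
            }
            min = nextMin;
            max = nextMax;
        }
    }

    public static void main(String[] args) {
        System.out.println(countTerms(0, 999_999, 32, 32)); // no trie levels
        System.out.println(countTerms(0, 999_999, 8, 32));
        System.out.println(countTerms(0, 999_999, 4, 32));
        System.out.println(countTerms(10, 12, 8, 32));      // narrow range
    }
}
```

For the range 0..999,999 on 32-bit values this gives 1,000,000 terms with no trie levels (step 32), 145 with step 8 and 25 with step 4, while the narrow range 10..12 needs 3 terms at any step: the extra levels only pay off when the range covers many unique values.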

-Yonik
http://www.lucidimagination.com


some hyphenated words not found

2010-03-14 Thread george young
I have a nearly generic out-of-box installation of solr.  When I
search on a short text document containing a few hyphenated words, I
get hits on *some* of the words, but not all.  I'm quite puzzled as to
why.  I've checked that the text is only plain ascii.  How can I find
out what's wrong?  In the file below, solr finds life-long, but not
love-lorn.

Here's the file:
This is a small sample document just to insure that a type *.doc can
be accessed by X Documentation.
It is sung to the moon by a love-lorn loon,
who fled from the mocking throng O!
It’s the song of a merryman, moping mum,
whose soul was sad and whose glance was glum. Misery me — lack-a-day-dee!
He sipped no sup, and he craved no crumb,
As he sighed for the love of a ladye!
Who sipped no sup, and who craved no crumb,
As he sighed for the love of a ladye.
Heighdy! heighdy! Misery me — lack-a-day-dee!
He sipped no sup, and he craved no crumb,
As he sighed for the love of a ladye!

I have a song to sing, O!
Sing me your song, O!

It is sung with the ring
Of the songs maids sing
Who love with a love life-long, O!
It's the song of a merrymaid, peerly proud,
Who loved a lord, and who laughed aloud
At the moan of the merryman, moping mum,
Whose soul was sad, and whose glance was glum,
Who sipped no sup, and who craved no crumb,
As he sighed for the love of a ladye!
Heighdy! heighdy!
Misery me — lack-a-day-dee!
He sipped no sup, and he craved no crumb,
As he sighed for the love of a ladye!


-- 
georgeryo...@gmail.com


create core with separate solrconfig.xml

2010-03-14 Thread Mark Fletcher
Hi,

I wanted to configure one core as Master and one core as slave.
This is my existing configuration:-

In my SOLR_HOME I have conf/schema.xml, conf/solrconfig.xml and the other files
from when no core was present.
Also in my SOLR_HOME are solr.xml and coreA, which I created using the CREATE
command for cores.

The index of my other core, coreB, is in a different dataDir.

I believe that in this configuration both cores share the same schema.xml and
solrconfig.xml. I added the master/slave replication configuration to my
{SOLR_HOME}/conf/solrconfig.xml.

 



optimize




   
00:00:10




Just below that I specified the slave




{specified the instanceDir}
/coreA/replication






internal

5000
1

username
password
 


When I optimize coreA, replication to coreB doesn't happen. coreA (which is
supposed to be the master here) gets the new values, but coreB does not. When
I tried the *startup* option in the first replication block, it gave a Lucene
write error in the index, so I went with optimize.

Is there something wrong here, or do I need separate solrconfig.xml files for
coreA and coreB that clearly indicate which is master and which is slave, each
including only one of the replication blocks, rather than one common
solrconfig.xml that specifies both?

If I need a separate solrconfig.xml for each core, how do I set that
up?

Any help is appreciated.

Thanks and Rgds,
Mark


Re: some hyphenated words not found

2010-03-14 Thread Lance Norskog
Look at the terms in the index with the analysis.jsp file, or with Luke.

The difference here is that love-lorn is a separate phrase, but
life-long has a comma after it. Try inserting a space before the
comma.



-- 
Lance Norskog
goks...@gmail.com


Re: Warning : no lockType configured for...

2010-03-14 Thread Lance Norskog
Doing an exhaustive scan of this problem, I did find this one hole:

This constructor is not deprecated, but it uses a super() call that is
deprecated. Also, this constructor is not used anywhere. I nominate it
for deprecation as well.

SolrIndexWriter.java, around line 170
  /**
   *
   */
  public SolrIndexWriter(String name, String path, DirectoryFactory dirFactory,
      boolean create, IndexSchema schema) throws IOException {
    super(getDirectory(path, dirFactory, null), false,
        schema.getAnalyzer(), create);
    init(name, schema, null);
  }


On 3/9/10, Chris Hostetter  wrote:
>
> : Ok I think I know where the problem is
>   ...
> : It's  the constructor used by SolrCore  in r772051
>
> Ughhh... so to be clear: you haven't been using Solr 1.4 at any point in
> this thread?
>
> that explains why no one else could recreate the problem you were
> describing.
>
> For future reference: if you aren't using the most recently
> released version of Solr when you post a question about a possible bug,
> please make that very clear right up at the top of your message, and if
> you think you've found a bug, please make sure to test against the most
> recently released version to see if it's already been fixed.
>
> : PS: should I file some kind of bug report even if everything is ok now?
> : (I'm asking because I didn't see anything related to this problem in
> : JIRA, so maybe if you want to keep a trace...)
>
> If you can recreate the problem using Solr 1.3, then feel free to file a
> bug, noting that it was only a problem in 1.3, but has already been fixed
> in 1.4 ... but we don't usually bother tracking bugs against arbitrary
> unreleased points from the trunk (unless they are current).  I'm sure there
> are lots of bugs that existed only transiently as features were being
> fleshed out.
>
>
> -Hoss
>
>


-- 
Lance Norskog
goks...@gmail.com


Re: Multi valued fields

2010-03-14 Thread Lance Norskog
This could be done with a function query, except that the function I
would use does not exist.  There is no function that returns the
number of values that exist for a field. If there were, you could say:

-field:A OR (field:A AND function() > 1)

I don't know the Lucene data structures well, but I suspect this would
be incredibly expensive to calculate.
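Since that count function does not exist, here is an in-memory Java sketch of only the boolean logic of the hypothetical query above, applied to the three sample documents from the question. The class, method and document names are illustrative; this does not run inside Solr.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class NotOnlyValueSketch {
    /** Mirrors -field:A OR (field:A AND count(field) > 1) for one document's values. */
    static boolean matches(List<String> values, String value) {
        return !values.contains(value) || values.size() > 1;
    }

    static List<String> matchingDocs(Map<String, List<String>> docs, String value) {
        return docs.entrySet().stream()
                .filter(e -> matches(e.getValue(), value))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new TreeMap<>(); // iterates in document-name order
        docs.put("A", List.of("A", "B", "C", "D"));
        docs.put("B", List.of("A", "B"));
        docs.put("C", List.of("A"));
        System.out.println(matchingDocs(docs, "A")); // [A, B]
    }
}
```

It keeps documents A and B and drops C, which is what the question asks for; the plain query -field1:A would instead exclude all three documents, since each one contains A. Note that in this sketch a duplicated A would count as a second value.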

On 3/11/10, Jean-Sebastien Vachon  wrote:
> Hi All,
>
> I'd like to know if it is possible to do the following on a multi-value
> field:
>
> Given the following data:
>
> document A:  field1 = [A B C D]
> document B:  field1 = [A B]
> document C:  field1 = [A]
>
> Can I build a query such as :
>
>   -field: A
>
> which will return all documents that do not have "exclusive" A in their
> field's values. By exclusive I mean that I don't want documents that only
> have A in their list of values. In my sample case, the query would return
> docs A and B,
> because they both have other values in field1.
>
> Is this kind of query possible with Solr/Lucene?
>
> Thanks
>
>
>
>


-- 
Lance Norskog
goks...@gmail.com