How to handle database replication delay when using DataImportHandler?

2009-01-28 Thread Gregg
I'd like to use the DataImportHandler running against a slave database that,
at any given time, may be significantly behind the master DB. This can cause
updates to be missed if you use the clock-time as the "last_index_time."
E.g., if the slave catches up to the master between two delta-imports.

Has anyone run into this? In our non-DIH indexing system we get around this
by either using the slave DB's seconds-behind-master or the max last update
time of the records returned.

Thanks.

Gregg


SolrUpdateServlet Warning

2008-09-23 Thread Gregg
I've got a small configuration question. When posting docs via SolrJ, I get
the following warning in the Solr logs:

WARNING: The @Deprecated SolrUpdateServlet does not accept query parameters:
wt=xml&version=2.2
  If you are using solrj, make sure to register a request handler to /update
rather then use this servlet.
  Add: <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
to your solrconfig.xml

I have an update handler configured in solrconfig.xml as follows:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />

What's the preferred solution? Should I comment out the SolrUpdateServlet in
solr's web.xml? My Solr server is running at /solr, if that helps.

Thanks.

Gregg


Re: SolrUpdateServlet Warning

2008-09-23 Thread Gregg
This turned out to be a fairly pedestrian bug on my part: I had "/update"
appended to the Solr base URL when I was adding docs via SolrJ.

Thanks for the help.

--Gregg

On Tue, Sep 23, 2008 at 12:42 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:

>
> On Sep 23, 2008, at 12:35 PM, Gregg wrote:
>
>  I've got a small configuration question. When posting docs via SolrJ, I
>> get
>> the following warning in the Solr logs:
>>
>> WARNING: The @Deprecated SolrUpdateServlet does not accept query
>> parameters:
>> wt=xml&version=2.2
>>  If you are using solrj, make sure to register a request handler to
>> /update
>> rather then use this servlet.
>>  Add: <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
>> to your solrconfig.xml
>>
>> I have an update handler configured in solrconfig.xml as follows:
>>
>> <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
>>
>>
> are you sure?
>
> check http://localhost:8983/solr/admin/stats.jsp
> and search for XmlUpdateRequestHandler
> make sure it is registered to /update
>
>
>  What's the preferred solution? Should I comment out the SolrUpdateServlet
>> in
>> solr's web.xml? My Solr server is running at /solr, if that helps.
>>
>>
> that will definitely work, but it should not be necessary to crack open the
> .war file.
>
>
> ryan
>
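For other SolrJ users who hit this warning: the base URL handed to SolrJ must
stop at the core root, because SolrJ appends the servlet path itself. A tiny
hypothetical helper (not part of SolrJ) makes the normalization concrete:

```java
public class SolrUrls {
    // SolrJ appends the servlet path ("/update", "/select") on its own, so
    // a base URL that already ends in "/update" sends requests to the wrong
    // path and -- as happened in this thread -- can end up handled by the
    // deprecated SolrUpdateServlet. Hypothetical helper, not part of SolrJ.
    public static String stripUpdatePath(String baseUrl) {
        String u = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return u.endsWith("/update")
                ? u.substring(0, u.length() - "/update".length())
                : u;
    }
}
```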


Difficulty with Multi-Word Synonyms

2009-09-14 Thread Gregg Donovan
I'm running into an odd issue with multi-word synonyms in Solr (using
the latest [9/14/09] nightly). Things generally seem to work as
expected, but I sometimes see words that are the leading term in a
multi-word synonym being replaced with the token that follows them in
the stream when they should just be ignored (i.e. there's no synonym
match for just that token). When I preview the analysis at
admin/analysis.jsp it looks fine, but at runtime I see problems like
the one in the unit test below. It's a simple case, so I assume I'm
making some sort of configuration and/or usage error.

package org.apache.solr.analysis;

import java.io.*;
import java.util.*;

import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class TestMultiWordSynonyms extends junit.framework.TestCase {

  public void testMultiWordSynonyms() throws IOException {
    List<String> rules = new ArrayList<String>();
    rules.add("a b c,d");
    SynonymMap synMap = new SynonymMap(true);
    SynonymFilterFactory.parseRules(rules, synMap, "=>", ",", true, null);

    SynonymFilter ts = new SynonymFilter(
        new WhitespaceTokenizer(new StringReader("a e")), synMap);
    TermAttribute termAtt = (TermAttribute) ts.getAttribute(TermAttribute.class);

    ts.reset();
    List<String> tokens = new ArrayList<String>();
    while (ts.incrementToken()) {
      tokens.add(termAtt.term());
    }

    // Fails: the stream produces ["e","e"] -- the leading "a" is overwritten
    assertEquals(Arrays.asList("a", "e"), tokens);
  }
}

Any help would be much appreciated. Thanks.

--Gregg


Re: Difficulty with Multi-Word Synonyms

2009-09-17 Thread Gregg Donovan
Thanks. And thanks for the help -- we're hoping to switch from query-time to
index-time synonym expansion for all of the reasons listed on the wiki
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46),
so this will be great to resolve.

I created SOLR-1445 (https://issues.apache.org/jira/browse/SOLR-1445),
though the problem seems to be caused by LUCENE-1919
(https://issues.apache.org/jira/browse/LUCENE-1919), as you noted.

Is there a recommended workaround that avoids combining the new and old
APIs? Would a version of SynonymFilter that also implemented
incrementToken() be helpful?

--Gregg

On Thu, Sep 17, 2009 at 7:38 PM, Yonik Seeley wrote:

> On Thu, Sep 17, 2009 at 6:29 PM, Lance Norskog  wrote:
> > Please add a Jira issue for this. It will get more attention there.
> >
> > BTW, thanks for creating such a precise bug report.
>
> +1
>
> Thanks, I had missed this.  This is serious, and looks due to a Lucene
> back compat break.
> I've added the testcase and can confirm the bug.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> > On Mon, Sep 14, 2009 at 1:52 PM, Gregg Donovan 
> wrote:
> >> I'm running into an odd issue with multi-word synonyms in Solr (using
> >> the latest [9/14/09] nightly ). Things generally seem to work as
> >> expected, but I sometimes see words that are the leading term in a
> >> multi-word synonym being replaced with the token that follows them in
> >> the stream when they should just be ignored (i.e. there's no synonym
> >> match for just that token). When I preview the analysis at
> >> admin/analysis.jsp it looks fine, but at runtime I see problems like
> >> the one in the unit test below. It's a simple case, so I assume I'm
> >> making some sort of configuration and/or usage error.
> >>
> >> package org.apache.solr.analysis;
> >> import java.io.*;
> >> import java.util.*;
> >> import org.apache.lucene.analysis.WhitespaceTokenizer;
> >> import org.apache.lucene.analysis.tokenattributes.TermAttribute;
> >>
> >> public class TestMultiWordSynonmys extends junit.framework.TestCase {
> >>
> >>   public void testMultiWordSynonmys() throws IOException {
> >> List rules = new ArrayList();
> >> rules.add( "a b c,d" );
> >> SynonymMap synMap = new SynonymMap( true );
> >> SynonymFilterFactory.parseRules( rules, synMap, "=>", ",", true,
> null);
> >>
> >> SynonymFilter ts = new SynonymFilter( new WhitespaceTokenizer( new
> >> StringReader("a e")), synMap );
> >> TermAttribute termAtt = (TermAttribute)
> >> ts.getAttribute(TermAttribute.class);
> >>
> >> ts.reset();
> >> List tokens = new ArrayList();
> >> while (ts.incrementToken()) tokens.add( termAtt.term() );
> >>
> >>// This fails because ["e","e"] is the value of the token stream
> >> assertEquals(Arrays.asList("a","e"), tokens);
> >>   }
> >> }
> >>
> >> Any help would be much appreciated. Thanks.
> >>
> >> --Gregg
> >>
> >
> >
> >
> > --
> > Lance Norskog
> > goks...@gmail.com
> >
>


solrj query size limit?

2009-11-02 Thread Gregg Horan
I'm constructing a query using solrj that has a fairly large number of 'OR'
clauses.  I'm just adding it as a big string to setQuery(), in the format
"accountId:(this OR that OR yada)".

This works all day long with 300 values.  When I push it up to 350-400
values, I get a "Bad Request" SolrServerException.  It appears to just be a
client error - nothing reaching the server logs.  Very repeatable... dial it
back down and it goes through again fine.

The total string length of the query (including a handful of other faceting
entries) is about 9500 chars. I do have the maxBooleanClauses jacked up to
2048. Using javabin. 1.4-dev.

Are there any other options or settings I might be overlooking?

-Gregg


Re: solrj query size limit?

2009-11-03 Thread Gregg Horan

That was it.  Didn't see that optional parameter - the POST works.

Thanks!


On Nov 3, 2009, at 1:57 AM, Avlesh Singh wrote:

Did you hit the limit for maximum number of characters in a GET  
request?


Cheers
Avlesh

On Tue, Nov 3, 2009 at 9:36 AM, Gregg Horan   
wrote:


I'm constructing a query using solrj that has a fairly large number  
of 'OR'
clauses.  I'm just adding it as a big string to setQuery(), in the  
format

"accountId:(this OR that OR yada)".

This works all day long with 300 values.  When I push it up to  
350-400
values, I get a "Bad Request" SolrServerException.  It appears to  
just be a
client error - nothing reaching the server logs.  Very  
repeatable... dial

it
back down and it goes through again fine.

The total string length of the query (including a handful of other  
faceting
entries) is about 9500chars.   I do have the maxBooleanClauses  
jacked up to

2048.  Using javabin.  1.4-dev.

Are there any other options or settings I might be overlooking?

-Gregg
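For reference, the clause in question can be built as below (a sketch with
assumed ids); the fix is purely transport-level. With SolrJ 1.4 the "optional
parameter" is roughly server.query(solrQuery, SolrRequest.METHOD.POST).

```java
import java.util.List;

public class AccountIdQuery {
    // Join a batch of ids into one boolean clause. A few hundred values
    // push the resulting GET URL past the servlet container's limit
    // (commonly 4-8 KB), which surfaces client-side as a "Bad Request";
    // sending the same query as a POST body avoids the limit entirely.
    public static String build(List<String> accountIds) {
        StringBuilder q = new StringBuilder("accountId:(");
        for (int i = 0; i < accountIds.size(); i++) {
            if (i > 0) {
                q.append(" OR ");
            }
            q.append(accountIds.get(i));
        }
        return q.append(')').toString();
    }
}
```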





Re: How to handle database replication delay when using DataImportHandler?

2009-01-29 Thread Gregg Donovan
Noble,

Thanks for the suggestion. The unfortunate thing is that we really don't
know ahead of time what sort of replication delay we're going to encounter
-- it could be one millisecond or it could be one hour. So, we end up
needing to do something like:

For delta-import run N:
1. query DB slave for "seconds_behind_master", use this to calculate
Date(N).
2. query DB slave for records updated since Date(N - 1)

I see there are plugin points for EventListener classes (onImportStart,
onImportEnd). Would those be the right spot to calculate these dates so that
I could expose them to my custom function at query time?

Thanks.

--Gregg

On Wed, Jan 28, 2009 at 11:20 PM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.p...@gmail.com> wrote:

> The problem you are trying to solve is that you cannot use
> ${dataimporter.last_index_time} as is. you may need something like
> ${dataimporter.last_index_time} - 3secs
>
> am I right?
>
> There are no straight ways to do this.
> 1) you may write your own function, say 'lastIndexMinus3Secs', and add
> it. Functions can be plugged in to DIH using a <function
> name="lastIndexMinus3Secs" class="foo.Foo"/> under the <dataConfig>
> tag. And you can use it as
> ${dataimporter.functions.lastIndexMinus3Secs()}
> this will add to the existing in-built functions
>
> http://wiki.apache.org/solr/DataImportHandler#head-5675e913396a42eb7c6c5d3c894ada5dadbb62d7
>
> the class must extend org.apache.solr.handler.dataimport.Evaluator
>
> we may add a standard function for this too . you can raise an issue
> --Noble
>
>
>
> On Thu, Jan 29, 2009 at 6:26 AM, Gregg  wrote:
> > I'd like to use the DataImportHandler running against a slave database
> that,
> > at any given time, may be significantly behind the master DB. This can
> cause
> > updates to be missed if you use the clock-time as the "last_index_time."
> > E.g., if the slave catches up to the master between two delta-imports.
> >
> > Has anyone run into this? In our non-DIH indexing system we get around
> this
> > by either using the slave DB's seconds-behind-master or the max last
> update
> > time of the records returned.
> >
> > Thanks.
> >
> > Gregg
> >
>
>
>
> --
> --Noble Paul
>
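The Date(N) bookkeeping from Gregg's plan is simple timestamp arithmetic. A
minimal sketch (the format string matches what DIH writes to
dataimport.properties; the class and method names are made up):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DeltaImportClock {
    // DIH persists last_index_time as "yyyy-MM-dd HH:mm:ss".
    private static final String FMT = "yyyy-MM-dd HH:mm:ss";

    // Shift the recorded index time back by the slave's current
    // Seconds_Behind_Master so the next delta query cannot skip rows
    // the slave has not applied yet.
    public static String minusLag(String lastIndexTime, long secondsBehindMaster) {
        try {
            SimpleDateFormat f = new SimpleDateFormat(FMT);
            Date t = f.parse(lastIndexTime);
            return f.format(new Date(t.getTime() - secondsBehindMaster * 1000L));
        } catch (ParseException e) {
            throw new IllegalArgumentException(lastIndexTime, e);
        }
    }
}
```

Wired into a custom Evaluator (as Noble suggests) or computed in an
onImportStart listener, the returned string would take the place of
${dataimporter.last_index_time} in the delta query.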


1.4 Replication

2009-05-27 Thread Matthew Gregg
Does replication in 1.4 support passing credentials/basic auth?  If not
what is the best option to protect replication?



Re: 1.4 Replication

2009-05-27 Thread Matthew Gregg
On Wed, 2009-05-27 at 19:06 +0530, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> On Wed, May 27, 2009 at 6:48 PM, Matthew Gregg  
> wrote:
> > Does replication in 1.4 support passing credentials/basic auth?  If not
> > what is the best option to protect replication?
> do you mean protecting the url /replication ?
Yes I would like to put /replication behind basic auth, which I can do,
but replication fails.  I naively tried the obvious
http://user:p...@host/replication, but that fails.

> 
> ideally Solr is expected to run in an unprotected environment. if you
> wish to introduce some security it has to be built by you.
> >
> >
I guess you meant Solr is expected to run in a "protected" environment?
It's pretty easy to put up a basic auth in front of Solr, but the
replication infra. in 1.4 doesn't seem to support it. Or does it, and I
just don't know how?

-- 
Matthew Gregg 



Re: 1.4 Replication

2009-05-27 Thread Matthew Gregg
I would like to protect both reads and writes. Reads could have a
significant impact.  I guess the answer is no, replication has no built
in security?

On Wed, 2009-05-27 at 20:11 +0530, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> The question is what all do you wish to protect.
> There are 'read' as well as 'write' attributes .
> 
> The reads are the ones which will not cause any harm other than
> consuming some cpu cycles.
> 
> The writes are the ones which can change the state of the system.
> 
> The slave uses the 'read' API's which i feel may not need to be protected
> 
> The other APIs' methods can have security, say snappull, disablepoll etc
> 
> 
> 
> On Wed, May 27, 2009 at 7:47 PM, Matthew Gregg  
> wrote:
> > On Wed, 2009-05-27 at 19:06 +0530, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> >> On Wed, May 27, 2009 at 6:48 PM, Matthew Gregg  
> >> wrote:
> >> > Does replication in 1.4 support passing credentials/basic auth?  If not
> >> > what is the best option to protect replication?
> >> do you mean protecting the url /replication ?
> > Yes I would like to put /replication behind basic auth, which I can do,
> > but replication fails.  I naively tried the obvious
> > http://user:p...@host/replication, but that fails.
> >
> >>
> >> ideally Solr is expected to run in an unprotected environment. if you
> >> wish to introduce some security it has to be built by you.
> >> >
> >> >
> > I guess you meant Solr is expected to run in a "protected" environment?
> > It's pretty easy to put up a basic auth in front of Solr, but the
> > replication infra. in 1.4 doesn't seem to support it. Or does it, and I
> > just don't know how?
> >
> > --
> > Matthew Gregg 
> >
> >
> 
> 
> 
-- 
Matthew Gregg 



Re: 1.4 Replication

2009-05-27 Thread Matthew Gregg
That is disappointing then.  Restricting by IP may be doable, but much
more work than basic auth.

On Wed, 2009-05-27 at 20:41 +0530, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> replication has no builtin security
> 
> 
> 
> On Wed, May 27, 2009 at 8:37 PM, Matthew Gregg  
> wrote:
> > I would like the to protect both reads and writes. Reads could have a
> > significant impact.  I guess the answer is no, replication has no built
> > in security?
> >
> > On Wed, 2009-05-27 at 20:11 +0530, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> >> The question is what all do you wish to protect.
> >> There are 'read' as well as 'write' attributes .
> >>
> >> The reads are the ones which will not cause any harm other than
> >> consuming some cpu cycles.
> >>
> >> The writes are the ones which can change the state of the system.
> >>
> >> The slave uses the 'read' API's which i feel may not need to be protected
> >>
> >> The other API's methods can have security . say dnappull, diableSnapPoll 
> >> etc
> >>
> >>
> >>
> >> On Wed, May 27, 2009 at 7:47 PM, Matthew Gregg  
> >> wrote:
> >> > On Wed, 2009-05-27 at 19:06 +0530, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> >> >> On Wed, May 27, 2009 at 6:48 PM, Matthew Gregg 
> >> >>  wrote:
> >> >> > Does replication in 1.4 support passing credentials/basic auth?  If 
> >> >> > not
> >> >> > what is the best option to protect replication?
> >> >> do you mean protecting the url /replication ?
> >> > Yes I would like to put /replication behind basic auth, which I can do,
> >> > but replication fails.  I naively tried the obvious
> >> > http://user:p...@host/replication, but that fails.
> >> >
> >> >>
> >> >> ideally Solr is expected to run in an unprotected environment. if you
> >> >> wish to introduce some security it has to be built by you.
> >> >> >
> >> >> >
> >> > I guess you meant Solr is expected to run in a "protected" environment?
> >> > It's pretty easy to put up a basic auth in front of Solr, but the
> >> > replication infra. in 1.4 doesn't seem to support it. Or does it, and I
> >> > just don't know how?
> >> >
> >> > --
> >> > Matthew Gregg 
> >> >
> >> >
> >>
> >>
> >>
> > --
> > Matthew Gregg 
> >
> >
> 
> 
> 
-- 
Matthew Gregg 



Re: 1.4 Replication

2009-05-27 Thread Matthew Gregg
Bug filed.  Thank you.
On Wed, 2009-05-27 at 22:40 +0530, Shalin Shekhar Mangar wrote:
> On Wed, May 27, 2009 at 9:01 PM, Matthew Gregg wrote:
> 
> > That is disappointing then.  Restricting by IP may be doable, but much
> > more work than basic auth.
> >
> >
> The beauty of open source is that this can be changed :)
> 
> Please open an issue, we can have basic http authentication made
> configurable.
> 
-- 
Matthew Gregg 
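For readers landing on this thread later: Solr subsequently gained
configurable basic auth for the slave's poll requests. On versions that have
it, the slave section of the replication handler accepts credentials along
these lines (verify the parameter names against your version):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/replication</str>
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>
```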



Help on spelling.

2010-09-09 Thread Gregg Hoshovsky
I am trying to use the spellchecker but cannot get any spelling suggestions
returned.

I have a text field defined in the schema.xml file as:

   <field name="text" type="text_ws" indexed="true" stored="true" multiValued="true"/>

I modified solrconfig.xml to point the analyzer to the same field type and have
the name set the same.

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">text_ws</str>

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">text</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
    </lst>

  </searchComponent>
I left the /spell request handler alone.


I see that the spellchecker folder gets files built so I am assuming that the 
spelling data is being created

Then I ran the query as
http://localhost:8983/solr/biolibrary/spell/?q=text:wedg&version=2.2&start=0&rows=10&indent=on&wt=json

I would expect that this would have returned some spelling suggestions ( such 
as wedge) but don’t get anything besides:

{
 "responseHeader":{
  "status":0,
  "QTime":1},
 "response":{"numFound":0,"start":0,"docs":[]
 }}

Any help is appreciated.

Gregg



Re: Help on spelling.

2010-09-09 Thread Gregg Hoshovsky


Okay putting "spellcheck=true" makes all the difference in the world.

 Thanks


On 9/9/10 1:58 PM, "Markus Jelsma"  wrote:

> I don't see you passing spellcheck parameters in the query string. Are they
> configured as default in your search handler?
>  
> -----Original message-----
> From: Gregg Hoshovsky 
> Sent: Thu 09-09-2010 22:40
> To: solr-user@lucene.apache.org;
> Subject: Help on spelling.
> 
> I am trying to use the spellchecker but cannot get past the point of having
> the spelling possibilities returned.
> 
> I have a text field define in the schema.xml file as:
> 
>    multiValued="true"/>
> 
> I modified solrconfig.xml to point the analyzer to the same field type and
> have the name set the same.
> 
>  
> 
>    text_ws
> 
>    
>      default
>      text
>      ./spellchecker
>    
> 
> 
> I left the handler alone
> 
>  
>    
> 
> I see that the spellchecker folder gets files built so I am assuming that the
> spelling data is being created
> 
> Then I ran the query as
> http://localhost:8983/solr/biolibrary/spell/?q=text:wedg&version=2.2&start=0&rows=10&indent=on&wt=json
> 
> I would expect that this would have returned some spelling suggestions ( such
> as wedge) but don t get anything besides:
> 
> {
> "responseHeader":{
>  "status":0,
>  "QTime":1},
> "response":{"numFound":0,"start":0,"docs":[]
> }}
> 
> Any help is appreciated.
> 
> Gregg
> 
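In other words, the spellcheck component only runs when asked. The working
request presumably ends up looking like the following (spellcheck.count and
spellcheck.collate are optional extras, shown here as assumptions):

```
http://localhost:8983/solr/biolibrary/spell/?q=text:wedg&spellcheck=true&spellcheck.count=5&spellcheck.collate=true&wt=json
```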



Sorting and filtering on fluctuating multi-currency price data?

2010-10-20 Thread Gregg Donovan
In our current search app, we have sorting and filtering based on item
prices. We'd like to extend this to support sorting and filtering in the
buyer's native currency with the items themselves listed in the seller's
native currency. E.g: as a buyer, if my native currency is the Euro, my
search of all items between 10 and 20 Euros would also find all items listed
in USD between 13.90 and 27.80, in CAD between 14.29 and 28.58, etc.

I wanted to run a few possible approaches by the list to see if we were on
the right track or not. Our index is updated every few minutes, but we only
update our currency conversions every few hours.

The easiest approach would be to update the documents with non-USD listings
every few hours with the USD-converted price. That will be fine, but if the
number of non-USD listings is large, this would be too expensive (i.e. large
parts of the index getting recreated frequently).

Another approach would be to use ExternalFileField and keep the price data,
normalized to USD, outside of the index. Every time the currency rates
changed, we would calculate new normalized prices for every document in the
index.

Still another approach would be to do the currency conversion at IndexReader
warmup time. We would index native price and currency code and create a
normalized currency field on the fly. This would be somewhat like
ExternalFileField in that it involved data from outside the index, but it
wouldn't need to be scoped to the parent SolrIndexReader, but could be
per-segment. Perhaps a custom poly-field could accomplish something like
this?

Has anyone dealt with this sort of problem? Do any of these approaches sound
more or less reasonable? Are we missing anything?

Thanks for the help!

Gregg Donovan
Technical Lead, Search
Etsy.com
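Of the three approaches, the ExternalFileField one is easy to sketch. All
names below are made up; the real file would be external_<fieldname> in the
index data directory, one uniqueKey=value line per document, reloaded when a
new searcher opens:

```java
import java.util.Map;

public class ExternalPriceFile {
    // Render ExternalFileField contents: every time the conversion rates
    // refresh, rewrite the whole file with prices normalized to USD and
    // commit so a new searcher picks it up. No documents are reindexed.
    public static String render(Map<String, Double> nativePrices, // uniqueKey -> listed price
                                Map<String, String> currencyOf,   // uniqueKey -> currency code
                                Map<String, Double> usdRate) {    // currency code -> USD multiplier
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : nativePrices.entrySet()) {
            double usd = e.getValue() * usdRate.get(currencyOf.get(e.getKey()));
            sb.append(e.getKey()).append('=').append(usd).append('\n');
        }
        return sb.toString();
    }
}
```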


Re: Getting solr response data in a JS query

2010-01-11 Thread Gregg Hoshovsky
You might be running into  an Ajax restriction.

See if an article like this helps.


http://www.nathanm.com/ajax-bypassing-xmlhttprequest-cross-domain-restriction/


On 1/9/10 11:37 PM, "Otis Gospodnetic"  wrote:

Dan,

You didn't mention whether you tried &wt=json .  Does it work if you use that 
to tell Solr to return its response in JSON format?

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



- Original Message 
> From: Dan Yamins 
> To: solr-user@lucene.apache.org
> Sent: Sat, January 9, 2010 10:05:54 PM
> Subject: Getting solr response data in a JS query
>
> Hi:
>
> I'm trying to figure out how to get solr responses and use them in my
> website. I'm having some problems figuring it out.
>
> 1) My initial thought is to use ajax, and insert a line like this in my
> script:
>
>  data = eval($.get("http://localhost:8983/solr/select/?q=*:*
> ").responseText)
>
> ... and then do what I want with the data, with logic being done in
> Javascript on the front page.
>
> However, this is just not working technically:  no matter what alternative I
> use, I always seem to get no response to this query.  I think I'm having
> exactly the same problem as described here:
>
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg29949.html
>
> and here:
>
> http://stackoverflow.com/questions/1906498/solr-responses-to-webbrowser-url-but-not-from-javascript-code
>
> Just like those two OPs, I can definitely access my solr responese through a
> web browser, but my jquery is getting nothing. Unfortunately, in neither
> thread did the answer seem to have been figured out satisfactorily.   Does
> anybody know what the problem is?
>
>
> 2)  As an alternative, I _can_ use  the ajax-solr library.   Code like this:
>
> var Manager;
> (function ($) {
>   $(function () {
> Manager = new AjaxSolr.Manager({
>   solrUrl: 'http://localhost:8983/solr/'
>});
>
>   Manager.init();
>   Manager.store.addByValue('q', '*:*');
>   Manager.store.addByValue('rows', '1000');
>   Manager.doRequest();
>   });
> })(jQuery);
>
> does indeed load solr data into my DOM. Somehow, ajax-solr's doRequest
> method is doing something that makes it possible to receive the proper
> response from the solr servlet, but I don't know what it is so I can't
> replicate it with my own ajax.   Does anyone know what is happening?
>
> (Of course, I _could_ just use ajax-solr, but doing so would mean figuring
> out how to re-write my existing application for how to display search
> results in a form that works with the ajax-solr api, and I' d rather avoid
> this if possible since it looks somewhat nontrivial.)
>
>
> Thanks!
> Dan
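A likely explanation for the behavior described above is the browser's
same-origin policy: a plain XMLHttpRequest to another host:port is blocked,
while ajax-solr's jQuery manager appears to succeed because it issues JSONP
requests. Solr supports JSONP through the json.wrf parameter, so a jQuery
$.getJSON call against a URL like the following (host and query assumed)
should behave the way ajax-solr does -- jQuery substitutes the trailing "=?"
with a generated callback name, and Solr wraps the JSON response in it:

```
http://localhost:8983/solr/select?q=*:*&wt=json&json.wrf=?
```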




How to return filtered tokens as query results?

2010-02-04 Thread Gregg Horan
Is there a way to return Solr's analyzed/filtered tokens from a query,
rather than the original indexed data?  (Ideally at a fairly high level like
solrj).

Thanks


Re: How to return filtered tokens as query results?

2010-02-05 Thread Gregg Horan
On Fri, Feb 5, 2010 at 2:31 AM, Ahmet Arslan  wrote:

>
> > Is there a way to return Solr's
> > analyzed/filtered tokens from a query,
> > rather than the original indexed data?  (Ideally at a
> > fairly high level like
> > solrj).
>
> TermVectorComponent [1] can do that.
>
> [1]http://wiki.apache.org/solr/TermVectorComponent
>
>
Excellent!  The wiki seems to imply that if particular stats (tv.df,
tv.tf_idf in particular) aren't requested, then you don't incur the overhead
of them being calculated (i.e. it's not an all-or-nothing request if
tv=true).  I really don't need any of that info... just the terms.  Any idea
if that's actually the case?

Thanks for the response
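That matches my reading of the component: each statistic is computed only
when its flag is set. A request for just the terms (handler name tvrh as in
the example solrconfig; field name assumed) would look like:

```
http://localhost:8983/solr/select?q=id:123&qt=tvrh&tv=true&tv.fl=text&fl=id
```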


Re: DIH field options

2010-03-13 Thread Gregg Hoshovsky
You can use MySQL: select *, 'staticdata' as staticdata from table x.
As long as your field name is staticdata, this should add it there.


On 3/12/10 8:39 AM, "Tommy Chheng"  wrote:

 Haven't tried this myself, but try adding a default value in the schema.xml
field definition and don't specify it during the import.
http://wiki.apache.org/solr/SchemaXml


On 3/12/10 7:56 AM, blargy wrote:
> Forgive me but I'm slightly retarded... I grew up underneath some power lines
> ;)
>
> I've read through that wiki but I still can't find what I'm looking for. I
> just want to give one of the DIH entities/fields a static value (ie it
> doesnt come from a database column). How can I configure this?
>
> FYI this is data-config.xml not schema.xml.
>
>
>  
>
>
>  
>
>
>
>
>
> Tommy Chheng-4 wrote:
>>The wiki page has most of the info you need
>> http://wiki.apache.org/solr/DataImportHandler
>>
>> To use multi-value fields, your schema.xml must define it with
>> multiValued="true"
>>
>>
>> On 3/11/10 10:58 PM, blargy wrote:
>>> How can you simply add a static value like <field column="..." value="123"/>?
>>> How does one add a static multi-value field like <field column="..." values="123, 456"/>?
>>>
>>> Is there any documentation on all the options for the field tag in
>>> data-config.xml?
>>>
>>> Thanks for the help
>> --
>> Tommy Chheng
>> Programmer and UC Irvine Graduate Student
>> Twitter @tommychheng
>> http://tommy.chheng.com
>>
>>
>>

--
Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com
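Another option, if the value should live in data-config.xml rather than in
the SQL or a schema default, is DIH's TemplateTransformer, which writes a
literal into a column (entity and column names here are assumptions):

```xml
<entity name="item" transformer="TemplateTransformer"
        query="select * from x">
  <!-- "staticdata" is not a DB column; the template supplies the value -->
  <field column="staticdata" template="123"/>
</entity>
```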




Highlight question

2010-06-23 Thread Gregg Hoshovsky
I just started working with highlighting, using the default configuration. I
have a field for which I can get a single highlight returned, marking the data.

What I would like to do is this,

Given a word, say 'tumor', and the sentence

"the lower tumor grew 1.5 cm. blah blah blah we need to remove the tumor in
the next surgery"

I would like to get: "... the lower <em>tumor</em> grew 1.5 cm ... blah blah
blah we need to ... remove the <em>tumor</em> in the next ... surgery"

Thus finding multiple references to the word and only grabbing a few words
around it.



In the solrconfig.xml I have been able to change the hl.simple.pre/post
variables, but when I try to change the hl.regex pattern or hl.snippets they
don't have any effect. I thought hl.snippets would allow me to find more
than one match and highlight it, and I tried a bunch of regex patterns but
they didn't do anything.

here is a snippet of the config file.

Any help is appreciated.

Gregg


   <highlighting>

    <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
     <lst name="defaults">
       <int name="hl.snippets">4</int>
       <int name="hl.fragsize">70</int>
       <!-- allowed deviation from the target fragment size -->
       <float name="hl.regex.slop">0.2</float>
       <!-- the regex each fragment must match -->
       <str name="hl.regex.pattern">[-\w ,/\n\"']{1,1}</str>
     </lst>
    </fragmenter>

    <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
     <lst name="defaults">
       <int name="hl.snippets">4</int>
       <int name="hl.fragsize">100</int>
     </lst>
    </fragmenter>

   </highlighting>



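Also worth checking: most highlighting settings can be supplied per request,
which makes experimenting much faster than editing solrconfig.xml. With
multiple snippets enabled, the request (field name assumed) would be:

```
http://localhost:8983/solr/select?q=tumor&hl=true&hl.fl=text&hl.snippets=4&hl.fragsize=40
```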

Re: Solr and NLP

2010-07-02 Thread Gregg Hoshovsky
I saw mention earlier of a way to link OpenNLP into Solr (
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Optimizing-Findability-Lucene-and-Solr).

I haven't followed up on that yet, so I don't know much about it. However, if
you do figure anything out, please share your findings. I will have to venture
down this path someday myself.


Gregg



On 7/2/10 8:15 AM, "Moazzam Khan"  wrote:

Hi guys,

Is there a way I can make Solr work with an NLP application? Are there
any NLP applications that will work with Solr? Can someone please
point me to a tutorial or something if it's possible.

Thanks,

Moazzam



Good time for an upgrade to Solr/Lucene trunk?

2011-06-21 Thread Gregg Donovan
We (Etsy.com) are currently using a version of trunk from mid-October 2010
(SVN tag 1021515, to be exact). We'd like to upgrade to the current trunk
and are wondering if this is a good time. Is the new stuff (esp. DocValues)
stable? Are any other major features or performance improvements about to
land on trunk that are worth waiting a few weeks for?

Thanks for the guidance!

--Gregg

Gregg Donovan
Technical Lead, Search, Etsy.com
gr...@etsy.com