Field compression

2011-04-15 Thread Charlie Jackson
I know I'm late to the party, but I recently learned that field compression was 
removed as of Solr 1.4.1. I think a lot of sites were relying on that feature, 
so I'm curious what people are doing now that it's gone. Specifically, what are 
people doing to efficiently store *and highlight* large fulltext fields? I can 
think of ways to store the text efficiently (compress it myself), or highlight 
it (leave it uncompressed), but not both at the same time.
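
(For reference, the removed feature was the compressed flag on stored fields in schema.xml; in pre-1.4.1 syntax, a stored fulltext field using it looked roughly like this, with an illustrative field name:)

    <field name="fulltext" type="text" indexed="true" stored="true" compressed="true"/>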

Also, is anyone working on anything to restore compression to Solr? I 
understand it was removed because Lucene removed support for it, but I was 
hoping to upgrade my site to 3.1 soon and we rely on that feature.

- Charlie


RE: How to extend IndexSchema and SchemaField

2010-09-10 Thread Charlie Jackson
Have you already explored the idea of using a custom analyzer for your
field? Depending on your use case, that might work for you.

- Charlie


Status of Solr in the cloud?

2010-08-26 Thread Charlie Jackson
There seem to be a few parallel efforts at putting Solr in a cloud
configuration. See http://wiki.apache.org/solr/KattaIntegration, which
is based off of https://issues.apache.org/jira/browse/SOLR-1395. Also
http://wiki.apache.org/solr/SolrCloud which is
https://issues.apache.org/jira/browse/SOLR-1873. And another JIRA:
https://issues.apache.org/jira/browse/SOLR-1301. 

 

These all seem aimed at the same goal, correct? I'm interested in
evaluating one of these solutions for my company; which is the most
stable or most likely to eventually be part of the Solr distribution?

Thanks,

Charlie



Allow custom overrides

2010-07-23 Thread Charlie Jackson
I need to implement a search engine that will allow users to override
pieces of data and then search against or view that data. For example, a
doc that has the following values:

 

DocId   Fulltext              Meta1   Meta2   Meta3
1       The quick brown fox   foo     foo     foo

Now say a user overrides Meta2:

DocId   Fulltext              Meta1   Meta2   Meta3
1       The quick brown fox   foo     foo     foo
                                      bar

 

For that user, a search for Meta2:bar needs to hit, but no other
user should hit on it. Likewise, if that user searches for Meta2:foo, it
should not hit. Also, any searches against that document for that user
should return the value 'bar' for Meta2, but should return 'foo' for
other users.

 

I'm not sure the best way to implement this. Maybe I could do this with
field collapsing somehow? Or with payloads? Custom analyzer? Any help
would be appreciated.

- Charlie

 



RE: Odd query result

2010-04-20 Thread Charlie Jackson
I'll take another look and see if it makes sense to have the index and
query time parameters the same or different.

As far as the initial issue, I think you're right Tom, it is hitting on
both. I think what threw me off was the highlighting -- in one of my
matching documents, the term "I-CAR" is highlighted, but I think it
actually hit on the term "ISHIN-I (car" which is also in the document.

The debug output for my query is 

ft:I-Car
ft:I-Car
+MultiPhraseQuery(ft:"i (car icar)")
+ft:"i (car icar)"

Thanks!

-Original Message-
From: Tom Hill [mailto:solr-l...@worldware.com] 
Sent: Tuesday, April 20, 2010 2:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Odd query result

I agree that, if they are the same, you want to merge them.

In this case, I don't think you want them to be the same. In particular,
you usually don't want catenateWords and catenateNumbers enabled at both
index time AND query time. You generate the permutations on one side or
the other; you don't need to do it for both. I usually do it at index time.

Tom

On Tue, Apr 20, 2010 at 11:29 AM, MitchK  wrote:

>
> It has nothing to do with your problem, since it seems to work when Tom
> tested it.
> However, it seems like you are using the same configuration for the query-
> and index-time analyzers.
> If you did not hide anything from us (for example, your own filter
> implementations) because you don't want to confuse us, you can just delete
> the "type=index" and "type=query" definitions. If you do so, the whole
> fieldType filter configuration will be applied at both index and
> query time. There is no need to specify two equal ones.
>
> I think this would be easier to maintain in future :).
>
> Kind regards
> - Mitch
>
> -->
> <analyzer>
>   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   ...
> </analyzer>
>
> --
> View this message in context:
> http://n3.nabble.com/Odd-query-result-tp732958p733095.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Odd query result

2010-04-20 Thread Charlie Jackson
I've got an odd scenario with a query a user's running. The user is
searching for the term "I-Car". It will hit if the document contains the
term "I-CAR" (all caps) but not if it's "I-Car".  When I throw the terms
into the analysis page, the resulting tokens look identical, and my
"I-Car" tokens hit on either term. 

 

Here's the definition of the field:
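
(The field definition XML was stripped by the archive. Judging from the parsed query shown in the reply above, the analyzer chain was presumably something like the following, applied identically at index and query time; the exact original is not recoverable:)

    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>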

I'm pretty sure this has to do with the settings on the
WordDelimiterFactory, but I must be missing something because I don't
see anything that would cause the behavior I'm seeing. 



RE: HTTP caching and distributed search

2010-02-09 Thread Charlie Jackson
I tried your suggestion, Hoss, but committing to the new coordinator
core doesn't change the indexVersion and therefore the ETag value isn't
changed.

I opened a new JIRA issue for this
http://issues.apache.org/jira/browse/SOLR-1765


Thanks,
Charlie


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, February 04, 2010 2:16 PM
To: solr-user@lucene.apache.org
Subject: Re: HTTP caching and distributed search


: > http://localhost:8080/solr/core1/select/?q=google&start=0&rows=10&shards=localhost:8080/solr/core1,localhost:8080/solr/core2

: You are right, etag is calculated using the searcher on core1 only and
: it does not take other shards into account. Can you open a Jira issue?

...as a possible workaround i would suggest creating a separate 
"coordinator" core that is neither core1 nor core2 ... it doesn't have to 
have any docs in it, it just has to have a schema consistent with the other 
two cores.  That way you can use distinct <httpCaching> settings on 
the coordinator core (perhaps never304="true" but with an explicit 
<cacheControl> setting? ... or lastModifiedFrom="openTime") and then you 
could send an explicit "commit" to the (empty) coordinator core anytime 
you modify one of the shards.
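
(A sketch of the kind of <httpCaching> stanza being described, as it appears inside <requestDispatcher> in solrconfig.xml; the values are illustrative, and note the attribute is spelled lastModFrom in the stock config:)

    <httpCaching lastModFrom="openTime" etagSeed="Solr">
      <cacheControl>max-age=30, public</cacheControl>
    </httpCaching>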



-Hoss



HTTP caching and distributed search

2010-02-02 Thread Charlie Jackson
Currently, I've got a Solr setup in which we're distributing searches
across two cores on a machine, say core1 and core2. I'm toying with the
notion of enabling Solr's HTTP caching on our system, but I noticed an
oddity when using it in combination with distributed searching. Say, for
example, I have this query:

 

http://localhost:8080/solr/core1/select/?q=google&start=0&rows=10&shards=localhost:8080/solr/core1,localhost:8080/solr/core2

 

Both cores have HTTP caching enabled, and it seems to be working. First
time I run the query through Squid, it correctly sees it doesn't have
this cached and so requests it from Solr. Second time I request it, it
hits the Squid cache. That part works fine. 

 

Here's the problem. If I commit to core1, it changes the ETag value of
the request, which will invalidate the cache, as it should. But
committing to core2 doesn't, so I get the cached version back, even
though core2 has changed and the cache is stale. I'm guessing this is
because the request is going against core1, hence using core1's cache
values, but in a distributed search, it seems like it should be using
cache values from all cores in the shards parameter. Is this a known
issue, and if so, is there a patch for it?

 

Thanks,

Charlie



RE: Rounding dates on sort and filter

2010-01-19 Thread Charlie Jackson
Good point. So it doesn't sound like there's a way to do this without
adding a new field or reindexing. Thanks anyway. 

- Charlie


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Tuesday, January 19, 2010 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Rounding dates on sort and filter

Charlie,

Query-time terms/tokens need to match what's in your index, and my guess
is that if you just altered query-time date field analysis, you'd get a
mismatch.  Easy enough to check through Solr Admin Analysis page.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



- Original Message ----
> From: Charlie Jackson 
> To: solr-user@lucene.apache.org
> Sent: Tue, January 19, 2010 1:20:02 PM
> Subject: Rounding dates on sort and filter
> 
> I've got a legacy date field that I'd like to round for sorting and
> filtering. Right now, the index is large enough that sorting or
> filtering on a date field takes 10-20 seconds (unless it's cached). I
> know this is because the date field's precision is down to the
> millisecond, and I don't really need that level of precision for most of
> my searches. So, is it possible to round my field at query time without
> having to reindex the field or add a second one?
> 
> 
> 
> I already tried the function sorting in 1.5-dev, but my field isn't a
> TrieDate field so I can't use the ms() function (which seems to allow
> date math unlike the other functions). 
> 
> 
> 
> Thanks,
> 
> Charlie



Rounding dates on sort and filter

2010-01-19 Thread Charlie Jackson
I've got a legacy date field that I'd like to round for sorting and
filtering. Right now, the index is large enough that sorting or
filtering on a date field takes 10-20 seconds (unless it's cached). I
know this is because the date field's precision is down to the
millisecond, and I don't really need that level of precision for most of
my searches. So, is it possible to round my field at query time without
having to reindex the field or add a second one? 

 

I already tried the function sorting in 1.5-dev, but my field isn't a
TrieDate field so I can't use the ms() function (which seems to allow
date math unlike the other functions). 
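
(For reference, the 1.5-dev function sort being referred to would look roughly like this against a TrieDate field; the field name here is hypothetical:)

    sort=ms(NOW/DAY,added_date) asc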

 

Thanks,

Charlie



RE: NGram query failing

2009-10-23 Thread Charlie Jackson

Well, I fixed my own problem in the end. For the record, this is the
schema I ended up going with:
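
(The schema XML was stripped by the archive. Given the bigram description below, the field type was presumably something along these lines; the name and analyzer details are assumptions:)

    <fieldType name="text_bigram" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>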

I could have left it a trigram but went with a bigram because with this
setup, I can get queries to properly hit as long as the min/max gram
size is met. In other words, for any queries two or more characters
long, this works for me. Less than two characters and it fails. 

I don't know exactly why that is, but I'll take it anyway!

- Charlie


-Original Message-----
From: Charlie Jackson [mailto:charlie.jack...@cision.com] 
Sent: Friday, October 23, 2009 10:00 AM
To: solr-user@lucene.apache.org
Subject: NGram query failing

I have a requirement to be able to find hits within words in a free-form
id field. The field can have any type of alphanumeric data - it's as
likely it will be something like "123456" as it is to be "SUN-123-ABC".
I thought of using NGrams to accomplish the task, but I'm having a
problem. I set up a field like this
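
(The schema XML was stripped here as well. To produce the "4 5 45" phrase query described below, the field presumably used an n-gram analyzer with a minimum gram size of 1 at both index and query time, roughly like this; the exact gram sizes are a guess:)

    <fieldType name="text_ngram" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="3"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>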

After indexing a field like this, the analysis page indicates my queries
should work. If I give it a sample field value of "ABC-123456-SUN" and a
query value of "45" it shows hits in several places, which is what I
expected.

 

However, when I actually query the field with something like "45" I get
no hits back. Looking at the debugQuery output, it looks like it's
taking my analyzed query text and putting it into a phrase query. So,
for a query of "45" it turns into the phrase query "4 5 45",
which then doesn't hit on anything in my index.

 

What am I missing to make this work?

 

- Charlie



NGram query failing

2009-10-23 Thread Charlie Jackson
I have a requirement to be able to find hits within words in a free-form
id field. The field can have any type of alphanumeric data - it's as
likely it will be something like "123456" as it is to be "SUN-123-ABC".
I thought of using NGrams to accomplish the task, but I'm having a
problem. I set up a field like this

After indexing a field like this, the analysis page indicates my queries
should work. If I give it a sample field value of "ABC-123456-SUN" and a
query value of "45" it shows hits in several places, which is what I
expected.

 

However, when I actually query the field with something like "45" I get
no hits back. Looking at the debugQuery output, it looks like it's
taking my analyzed query text and putting it into a phrase query. So,
for a query of "45" it turns into the phrase query "4 5 45",
which then doesn't hit on anything in my index.

 

What am I missing to make this work?

 

- Charlie



RE: Sorting/paging problem

2009-10-01 Thread Charlie Jackson
Oops, the missing trailing Z was probably just a cut and paste error.

It might be tough to come up with a case that can reproduce it -- it's a
sticky issue. I'll post it if I can, though. 


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Tuesday, September 29, 2009 6:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Sorting/paging problem


: 2009-09-23T19:25:03.400Z
: 
: 2009-09-23T19:25:19.951
: 
: 2009-09-23T20:10:07.919Z

is that a cut/paste error, or did you really get a date back from Solr 
w/o the trailing "Z" ?!?!?!

...

: So, not only is the date sorting wrong, but the exact same document
: shows up on the next page, also still out of date order. I've seen the
: same document show up in 4-5 pages in some cases. It's always the last
: record on the page, too. If I change the page size, the problem seems to

that is really freaking weird.  can you reproduce this in a simple 
example?  maybe an index that's small enough (and doesn't contain 
confidential information) that you could zip up and post online?



-Hoss



Sorting/paging problem

2009-09-24 Thread Charlie Jackson
I've run into a strange issue with my Solr installation. I'm running
queries that are sorting by a DateField field but from time to time, I'm
seeing individual records very much out of order. What's more, they
appear on multiple pages of my result set. Let me give an example.
Starting with a basic query, I sort on the date that the document was
added to the index and see these rows on the first page (I'm just
showing the date field here):

 

2009-09-23T19:24:47.419Z

2009-09-23T19:25:03.229Z

2009-09-23T19:25:03.400Z

2009-09-23T19:25:19.951

2009-09-23T20:10:07.919Z

 

Note how the last document's date jumps a bit. Not necessarily a
problem, but the next page looks like this:

 

2009-09-23T19:26:16.022Z

2009-09-23T19:26:32.547Z

2009-09-23T19:27:45.470Z

2009-09-23T19:27:45.592Z

2009-09-23T20:10:07.919Z

 

So, not only is the date sorting wrong, but the exact same document
shows up on the next page, also still out of date order. I've seen the
same document show up in 4-5 pages in some cases. It's always the last
record on the page, too. If I change the page size, the problem seems to
disappear for a while, but then starts up again later. Also, running the
same query/queries later on doesn't show the same behavior. 

 

Could it be some sort of page boundary issue with the cache? Has anyone
else run into a problem like this? I'm using the Sept 22 nightly build. 

 

- Charlie



Availability during merge

2009-07-13 Thread Charlie Jackson
The wiki page for merging solr cores
(http://wiki.apache.org/solr/MergingSolrIndexes) mentions that the cores
being merged cannot be indexed to during the merge. What about the core
being merged *to*? In terms of the example on the wiki page, I'm asking
if core0 can add docs while core1 and core2 are being merged into it. 
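
(For reference, the merge on that wiki page is the CoreAdmin mergeindexes action, invoked roughly like this; the host and paths are illustrative:)

    http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index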

 

Thanks,

- Charlie



RE: Entity extraction?

2008-10-27 Thread Charlie Jackson
Yeah, when they first mentioned it, my initial thought was "cool, but we don't 
need it." However, some of the higher ups in the company are saying we might 
want it at some point, so I've been asked to look into it. I'll be sure to let 
them know about the flaws in the concept, thanks for that info.

________
Charlie Jackson
[EMAIL PROTECTED]


-Original Message-
From: Walter Underwood [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 27, 2008 11:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Entity extraction?

The vendor mentioned entity extraction, but that doesn't mean you need it.
Entity extraction is a pretty specific technology, and it has been a
money-losing product at many companies for many years, going back to
Xerox ThingFinder well over ten years ago.

My guess is that very few people really need entity extraction.

Using EE for automatic taxonomy generation is even harder to get right.
At best, that is a way to get a starter set of categories that you can
edit. You will not get a production quality taxonomy automatically.

wunder

On 10/27/08 8:31 AM, "Charlie Jackson" <[EMAIL PROTECTED]> wrote:

> True, though I may be able to convince the powers that be that it's worth the
> investment. 
> 
> There are a number of open source or free tools listed on the Wikipedia entry
> for entity extraction
> (http://en.wikipedia.org/wiki/Named_entity_recognition#Open_source_or_free) --
> does anyone have any experience with any of these?
> 
> 
> Charlie Jackson
> 312-873-6537
> [EMAIL PROTECTED]
> 
> -Original Message-
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 27, 2008 10:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Entity extraction?
> 
> For the record, LingPipe is not free.  It's good, but it's not free.
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
>> From: Rafael Rossini <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Friday, October 24, 2008 6:08:14 PM
>> Subject: Re: Entity extraction?
>> 
>> Solr can do a simple facet search like FAST, but the entity extraction
>> demands other technologies. I do not know how FAST does it but at the company
>> I'm working on (www.cortex-intelligence.com), we use a mix of statistical
>> and language-specific tasks to recognize and categorize entities in the
>> text. LingPipe is another tool (free) that does that too. In case you would
>> like to see a simple demo: http://www.cortex-intelligence.com/tech/
>> 
>> Rossini
>> 
>> 
>> On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson
>>> wrote:
>> 
>>> During a recent sales pitch to my company by FAST, they mentioned entity
>>> extraction. I'd never heard of it before, but they described it as
>>> basically recognizing people/places/things in documents being indexed
>>> and then being able to do faceting on this data at query time. Does
>>> anything like this already exist in SOLR? If not, I'm not opposed to
>>> developing it myself, but I could use some pointers on where to start.
>>> 
>>> 
>>> 
>>> Thanks,
>>> 
>>> - Charlie
>>> 
>>> 
> 
> 
> 





RE: Entity extraction?

2008-10-27 Thread Charlie Jackson
True, though I may be able to convince the powers that be that it's worth the 
investment. 

There are a number of open source or free tools listed on the Wikipedia entry 
for entity extraction 
(http://en.wikipedia.org/wiki/Named_entity_recognition#Open_source_or_free) -- 
does anyone have any experience with any of these? 


Charlie Jackson
312-873-6537
[EMAIL PROTECTED]

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 27, 2008 10:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Entity extraction?

For the record, LingPipe is not free.  It's good, but it's not free.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Rafael Rossini <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, October 24, 2008 6:08:14 PM
> Subject: Re: Entity extraction?
> 
> Solr can do a simple facet search like FAST, but the entity extraction
> demands other technologies. I do not know how FAST does it but at the company
> I'm working on (www.cortex-intelligence.com), we use a mix of statistical
> and language-specific tasks to recognize and categorize entities in the
> text. LingPipe is another tool (free) that does that too. In case you would
> like to see a simple demo: http://www.cortex-intelligence.com/tech/
> 
> Rossini
> 
> 
> On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson 
> > wrote:
> 
> > During a recent sales pitch to my company by FAST, they mentioned entity
> > extraction. I'd never heard of it before, but they described it as
> > basically recognizing people/places/things in documents being indexed
> > and then being able to do faceting on this data at query time. Does
> > anything like this already exist in SOLR? If not, I'm not opposed to
> > developing it myself, but I could use some pointers on where to start.
> >
> >
> >
> > Thanks,
> >
> > - Charlie
> >
> >





RE: Entity extraction?

2008-10-24 Thread Charlie Jackson
Thanks for the replies, guys, that gives me a good place to start looking. 

- Charlie

-Original Message-
From: Rogerio Pereira [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 24, 2008 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Entity extraction?

You can find more about this topic in this book availabe at amazon:
http://www.amazon.com/Building-Search-Applications-Lucene-Lingpipe/dp/0615204252/

2008/10/24 Rafael Rossini <[EMAIL PROTECTED]>

> Solr can do a simple facet search like FAST, but the entity extraction
> demands other technologies. I do not know how FAST does it but at the
> company I'm working on (www.cortex-intelligence.com), we use a mix of statistical
> and language-specific tasks to recognize and categorize entities in the
> text. LingPipe is another tool (free) that does that too. In case you
> would like to see a simple demo: http://www.cortex-intelligence.com/tech/
>
> Rossini
>
>
> On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson <
> [EMAIL PROTECTED]
> > wrote:
>
> > During a recent sales pitch to my company by FAST, they mentioned entity
> > extraction. I'd never heard of it before, but they described it as
> > basically recognizing people/places/things in documents being indexed
> > and then being able to do faceting on this data at query time. Does
> > anything like this already exist in SOLR? If not, I'm not opposed to
> > developing it myself, but I could use some pointers on where to start.
> >
> >
> >
> > Thanks,
> >
> > - Charlie
> >
> >
>



-- 
Regards,

Rogério (_rogerio_)

[Blog: http://faces.eti.br]  [Sandbox: http://bmobile.dyndns.org]  [Twitter:
http://twitter.com/ararog]

"Faça a diferença! Ajude o seu país a crescer, não retenha conhecimento,
distribua e aprenda mais."
(http://faces.eti.br/2006/10/30/conhecimento-e-amadurecimento)



Entity extraction?

2008-10-24 Thread Charlie Jackson
During a recent sales pitch to my company by FAST, they mentioned entity
extraction. I'd never heard of it before, but they described it as
basically recognizing people/places/things in documents being indexed
and then being able to do faceting on this data at query time. Does
anything like this already exist in SOLR? If not, I'm not opposed to
developing it myself, but I could use some pointers on where to start.

 

Thanks,

- Charlie



RE: Shared index base

2008-02-26 Thread Charlie Jackson
How do you handle commits to the index? By that, I mean that Solr
recreates its searcher when you issue a commit, but only for the system
that does the commit. Wouldn't you be left with searchers on the other
machines that are stale? 

- Charlie


-Original Message-
From: Matthew Runo [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 26, 2008 12:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Shared index base

I hope so. I've found that every once in a while Solr 1.2 replication  
will die, from a temp-index file that seems to ham it up. Removing  
that file on all the servers fixes the issue though.

We'd like to be able to point all the servers at an NFS location for  
their index files, and use a single server to update it.

Thanks!

Matthew Runo
Software Developer
Zappos.com
702.943.7833

On Feb 26, 2008, at 9:39 AM, Alok Dhir wrote:

> Are you saying all the servers will use the same 'data' dir?  Is  
> that a supported config?
>
> On Feb 26, 2008, at 12:29 PM, Matthew Runo wrote:
>
>> We're about to do the same thing here, but have not tried yet. We  
>> currently run Solr with replication across several servers. So long  
>> as only one server is doing updates to the index, I think it should  
>> work fine.
>>
>>
>> Thanks!
>>
>> Matthew Runo
>> Software Developer
>> Zappos.com
>> 702.943.7833
>>
>> On Feb 26, 2008, at 7:51 AM, Evgeniy Strokin wrote:
>>
>>> I know there was such discussions about the subject, but I want to  
>>> ask again if somebody could share more information.
>>> We are planning to have several separate servers for our search  
>>> engine. One of them will be index/search server, and all others  
>>> are search only.
>>> We want to use SAN (BTW: should we consider something else?) and  
>>> give access to it from all servers. So all servers will use the  
>>> same index base, without any replication, same files.
>>> Is this a good practice? Did somebody do the same? Any problems  
>>> noticed? Or any suggestions, even about different configurations  
>>> are highly appreciated.
>>>
>>> Thanks,
>>> Gene
>>
>



RE: Boosting a token with space at the end

2008-02-13 Thread Charlie Jackson
If you haven't explicitly set the sort parameter, Solr will default to
ordering by score. Information about Lucene scoring can be found here

http://lucene.apache.org/java/docs/scoring.html

And, specifically, the score formula can be found here

http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html

I'm curious, though, what are you basing your "expected" order on? If
it's based on some other data in your domain (such as company size or
location or something) you can explicitly set your sort parameter
accordingly. 
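
(For example, a multilevel sort on a hypothetical companySize field before score would look something like this:)

    ?q=micro&sort=companySize desc, score desc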

- Charlie


-Original Message-
From: Yerraguntla [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, February 13, 2008 1:50 PM
To: solr-user@lucene.apache.org
Subject: Boosting a token with space at the end


Hi,

  I have to search on Company Names, which contain multiple words. Some of
the examples are
Micro Image Systems, Microsoft Corp, Sun Microsystems, Advanced Micro
systems.

For the above example, when the search is for micro, the expected result
order is
Micro Image Systems
Advanced Micro systems
Microsoft Corp
Sun Microsystems

What needs to be done, both for the field type and for indexing and querying?
There are a bunch of company names like each of the company name examples I
mentioned. I have been trying a couple of ways with multiple queries, but I
am not able to retrieve Micro Image Systems at the top at all.


Appreciate any hints and help.

--Yerra

-- 
View this message in context:
http://www.nabble.com/Bossting-a-token-with-space-at-the-end-tp15465726p15465726.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: highlighting marks wrong words

2008-01-15 Thread Charlie Jackson
I believe changing the "AND id: etc etc" part of the query to its own
filter query will take care of your highlighting problem.

In other words, try a query like this:

q=(auto)&fq=id:(100 OR 1 OR 2 OR 3 OR 5 OR
6)&fl=score&hl.fl=content&hl=true&hl.fragsize=200&hl.snippets=2&hl.simpl
e.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&start=0&rows=10

This could also get you a performance boost if you're querying against
this set of ids often.

-Original Message-
From: Alexey Shakov [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 15, 2008 6:54 AM
To: solr-user@lucene.apache.org
Subject: highlighting marks wrong words

Hi all,

I have a query like this:

q=(auto) AND id:(100 OR 1 OR 2 OR 3 OR 5 OR 
6)&fl=score&hl.fl=content&hl=true&hl.fragsize=200&hl.snippets=2&hl.simpl
e.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&start=0&rows=10

Default field is content.

So, I expect, that only occurrencies of "auto" will be marked.

BUT: the occurrencies of id (100, 1, 2, ..), which occasionally also 
present in content field, are marked as well...

The result looks like:

North American International <b>Auto</b> Show 2007 - Celebrating <b>100</b> years


Any ideas?

Thanx in advance!




RE: Backup of a Solr index

2008-01-03 Thread Charlie Jackson
> But however one has first to shutdown the Solr server before copying the 
index folder?

If you want to copy the hard files from the data/index directory, yes, you'll 
probably want to shut down the server first. You may be able to get away with 
leaving the server up but stopping any index/commit operations, but I could be 
wrong.

> It notes a script "abc", but I cannot find it in my Solr distribution 
(nightly build)?

All of the collection distribution scripts can be found in src/scripts in the 
nightly build if they aren't in the bin directory of the example solr 
directory. 

> Run those scripts on Windows XP?

No, unfortunately the Collection Distribution scripts won't work in Windows 
because they use Unix filesystem trickery to operate. 


-Original Message-
From: Jörg Kiegeland [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 03, 2008 11:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Backup of a Solr index

Charlie Jackson wrote:
> Solr indexes are file-based, so there's no need to "dump" the index to a 
> file. 
>   
But however one has first to shutdown the Solr server before copying the 
index folder?

> In terms of how to create backups and move those backups to other servers, 
> check out this page http://wiki.apache.org/solr/CollectionDistribution. 
>   
It notes a script "abc", but I cannot find it in my Solr distribution 
(nightly build)? Run those scripts on Windows XP?



RE: Backup of a Solr index

2008-01-02 Thread Charlie Jackson
Solr indexes are file-based, so there's no need to "dump" the index to a file. 

In terms of how to create backups and move those backups to other servers, 
check out this page http://wiki.apache.org/solr/CollectionDistribution. 

Hope that helps.



-Original Message-
From: Jörg Kiegeland [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 02, 2008 3:17 AM
To: solr-user@lucene.apache.org
Subject: Backup of a Solr index


Is there a standard way to dump the Solr index to a file or to a 
directory as backup, and to import such a saved index into another Solr 
index later?

Another question I have, is whether one is allowed to copy the 
/data/index folder while the Solr server is still running, as easy 
alternative to do a backup (may this conflict with Solr holding open 
files?)?

Happy new year,
Jörg


RE: Successful project based on SOLR

2007-12-20 Thread Charlie Jackson
Yeah I remember seeing that at one point when I was first looking at the
solrj client. I had plans to build on it but I got pulled away on
something else. Maybe it's time to take another look and see what I can
do with it. As Jonathan said, it's a good project to work on. 


-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 20, 2007 3:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Successful project based on SOLR


> 
> Ultimately, what I'd like is something like Hibernate Search or like
> Compass GPS (http://www.opensymphony.com/compass/content/about.html) but
> leveraging Solr's features. That ability to transition back and forth
> between object and index record would be really elegant but I
> need those extras that Solr brings to a Lucene index.
> 

Here is an old version of solrj that connects to Hibernate, similar to 
Compass GPS...

http://solrstuff.org/svn/solrj-hibernate/

It is out of date and references some classes that did not make it into 
the official solrj release, but it is worth a look.

ryan




RE: Successful project based on SOLR

2007-12-20 Thread Charlie Jackson
That's the first I've seen of Hibernate Search. Looks interesting, but I
think it's a little different than what I was looking for. Since it
indexes into Lucene, it's close, but I wouldn't have a bunch of my
favorite Solr features, such as remote indexing and field-level analysis
at index and query time. 

Ultimately, what I'd like is something like Hibernate Search or like
Compass GPS (http://www.opensymphony.com/compass/content/about.html) but
leveraging Solr's features. That ability to transition back and forth
between object and index record would be really elegant but I
need those extras that Solr brings to a Lucene index.

- Charlie

-Original Message-
From: Jonathan Ariel [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 20, 2007 12:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Successful project based on SOLR

What's the difference with that and Hibernate
Search<http://www.hibernate.org/410.html>
?

On Dec 20, 2007 2:09 PM, Charlie Jackson <[EMAIL PROTECTED]>
wrote:

> Congratulations!
>
> > It uses a custom hibernate-SOLR
> bridge which allows transparent persistence of entities on different
> SOLR servers.
>
> Any chance of this code making its way back to the SOLR community? Or,
> if not, can you give me an idea how you did it? This seamless
> integration of Hibernate and Solr is something I'm interested in.
>
> -- Charlie
>
> -Original Message-
> From: Marius Hanganu [mailto:[EMAIL PROTECTED]
> Sent: Thursday, December 20, 2007 10:43 AM
> To: solr-user@lucene.apache.org
> Subject: Successful project based on SOLR
>
> Hi guys,
>
> I just wanted to let you know our company has successfully launched a
> new high traffic website based on a powerful CMS built on top of SOLR.
>
> The website - http://www.hotnews.ro - serves up to 80k users per day
> with an average 400K pages per day. It uses a custom hibernate-SOLR
> bridge which allows transparent persistence of entities on different
> SOLR servers. The CMS behind it also uses an in house developed API
for
> querying SOLR.
>
> It was and will be a pleasure to use SOLR in this project. It has many
> advantages that you're all probably aware of, but the most impressive
> thing was its reliability. You start your SOLR server and simply
forget
> about it.
>
> Once again, congratulations!
> Marius Hanganu,
> Director, Tremend Software Consulting
> www.tremend.ro
>


RE: Successful project based on SOLR

2007-12-20 Thread Charlie Jackson
Congratulations!

> It uses a custom hibernate-SOLR 
bridge which allows transparent persistence of entities on different 
SOLR servers.

Any chance of this code making its way back to the SOLR community? Or,
if not, can you give me an idea how you did it? This seamless
integration of Hibernate and Solr is something I'm interested in.

-- Charlie

-Original Message-
From: Marius Hanganu [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 20, 2007 10:43 AM
To: solr-user@lucene.apache.org
Subject: Successful project based on SOLR

Hi guys,

I just wanted to let you know our company has successfully launched a 
new high traffic website based on a powerful CMS built on top of SOLR.

The website - http://www.hotnews.ro - serves up to 80k users per day 
with an average 400K pages per day. It uses a custom hibernate-SOLR 
bridge which allows transparent persistence of entities on different 
SOLR servers. The CMS behind it also uses an in house developed API for 
querying SOLR.

It was and will be a pleasure to use SOLR in this project. It has many 
advantages that you're all probably aware of, but the most impressive 
thing was its reliability. You start your SOLR server and simply forget 
about it.

Once again, congratulations!
Marius Hanganu,
Director, Tremend Software Consulting
www.tremend.ro


RE: Tomcat6?

2007-12-03 Thread Charlie Jackson
$CATALINA_HOME/conf/Catalina/localhost doesn't exist by default, but you can 
create it and it will work exactly the same way it did in Tomcat 5. It's not 
created by default because it's not needed by the manager webapp anymore.


-Original Message-
From: Matthew Runo [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 03, 2007 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Tomcat6?

In context.xml, I added..
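
(The XML was stripped by the archive; per the JNDI setup described on the SolrTomcat wiki, it was presumably an Environment entry along these lines, with an illustrative path:)

    <Environment name="solr/home" type="java.lang.String" value="/opt/solr/home" override="true"/>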

I think that's all I did to get it working in Tomcat 6.

--Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote:

> The Solr wiki does not describe how to install Solr on
> Tomcat 6, and I have not managed it myself :(
> The chapter "Configuring Solr Home with JNDI" mentions
> the directory $CATALINA_HOME/conf/Catalina/localhost, which does not
> exist in Tomcat 6.
>
> Alternatively I tried the folder $CATALINA_HOME/work/Catalina/
> localhost, but with no success. (I can query the top-level page,
> but the "Solr Admin" link then does not work.)
>
> Can anybody help?
>
> -- 
> Dipl.-Inf. Jörg Kiegeland
> ikv++ technologies ag
> Bernburger Strasse 24-25, D-10963 Berlin
> e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
> phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
> =
> Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
> board of  directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
> supervising board: Prof. Dr. Bernd Mahr (chairman)
> _
>



RE: Forced Top Document

2007-10-24 Thread Charlie Jackson
Took the words right out my mouth! That second method would be
particularly effective but will only work if you can identify these docs
at index time. 


-Original Message-
From: Kyle Banerjee [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 24, 2007 1:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Forced Top Document

The method Charlie suggested will work just fine with a minor tweak.
For relevancy sorting:

?q=foo OR (foo AND id:bar)

For nonrelevancy sorting, all you need is a multilevel sort. Just add
a bogus field that only the important document contains. Then sort by
bogus field in descending order before any other sorting criteria are
applied.

Either way, the document only appears when it matches the search
criteria, and it will always be on top.
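
(A sketch of that multilevel sort, assuming a hypothetical "pinned" field that only the important document contains:)

    ?q=foo&sort=pinned desc, score desc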

kyle

On 10/24/07, Charlie Jackson <[EMAIL PROTECTED]> wrote:
> Yes, this will only work if the results are sorted by score (the
> default).
>
> One thing I thought of after I sent this out was that this will include
> the specified document even if it doesn't match your search criteria,
> which may not be what you want.
>
>
> -Original Message-
> From: mark angelillo [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 24, 2007 12:44 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Forced Top Document
>
> Charlie,
>
> That's interesting. I did try something like this. Did you try your
> query with a sorting parameter?
>
> What I've read suggests that all the results are returned based on
> the query specified, but then resorted as specified. Boosting (which
> modifies the document's score) should not change the order unless the
> results are sorted by score.
>
> Mark
>
> On Oct 24, 2007, at 1:05 PM, Charlie Jackson wrote:
>
> > Do you know which document you want at the top? If so, I believe you
> > could just add an "OR" clause to your query to boost that document
> > very
> > high, such as
> >
> > ?q=foo OR id:bar^1000
> >
> > Tried this on my installation and it did, indeed push the document
> > specified to the top.
> >
> >
> >
> > -Original Message-
> > From: Matthew Runo [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, October 24, 2007 10:17 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Forced Top Document
> >
> > I'd love to know this, as I just got a development request for this
> > very feature. I'd rather not spend time on it if it already exists.
> >
> > ++
> >   | Matthew Runo
> >   | Zappos Development
> >   | [EMAIL PROTECTED]
> >   | 702-943-7833
> > ++
> >
> >
> > On Oct 23, 2007, at 10:12 PM, mark angelillo wrote:
> >
> >> Hi all,
> >>
> >> Is there a way to get a specific document to appear on top of
> >> search results even if a sorting parameter would push it further
> >> down?
> >>
> >> Thanks in advance,
> >> Mark
> >>
> >> mark angelillo
> >> snooth inc.
> >> o: 646.723.4328
> >> c: 484.437.9915
> >> [EMAIL PROTECTED]
> >> snooth -- 1.8 million ratings and counting...
> >>
> >>
> >
>
> mark angelillo
> snooth inc.
> o: 646.723.4328
> c: 484.437.9915
> [EMAIL PROTECTED]
> snooth -- 1.8 million ratings and counting...
>
>
>


-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
[EMAIL PROTECTED] / 541.359.9599


RE: Forced Top Document

2007-10-24 Thread Charlie Jackson
Yes, this will only work if the results are sorted by score (the
default). 

One thing I thought of after I sent this out was that this will include
the specified document even if it doesn't match your search criteria,
which may not be what you want. 


-Original Message-
From: mark angelillo [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 24, 2007 12:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Forced Top Document

Charlie,

That's interesting. I did try something like this. Did you try your  
query with a sorting parameter?

What I've read suggests that all the results are returned based on  
the query specified, but then resorted as specified. Boosting (which  
modifies the document's score) should not change the order unless the  
results are sorted by score.

Mark

On Oct 24, 2007, at 1:05 PM, Charlie Jackson wrote:

> Do you know which document you want at the top? If so, I believe you
> could just add an "OR" clause to your query to boost that document  
> very
> high, such as
>
> ?q=foo OR id:bar^1000
>
> Tried this on my installation and it did, indeed push the document
> specified to the top.
>
>
>
> -Original Message-
> From: Matthew Runo [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 24, 2007 10:17 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Forced Top Document
>
> I'd love to know this, as I just got a development request for this
> very feature. I'd rather not spend time on it if it already exists.
>
> ++
>   | Matthew Runo
>   | Zappos Development
>   | [EMAIL PROTECTED]
>   | 702-943-7833
> ++
>
>
> On Oct 23, 2007, at 10:12 PM, mark angelillo wrote:
>
>> Hi all,
>>
>> Is there a way to get a specific document to appear on top of
>> search results even if a sorting parameter would push it further  
>> down?
>>
>> Thanks in advance,
>> Mark
>>
>> mark angelillo
>> snooth inc.
>> o: 646.723.4328
>> c: 484.437.9915
>> [EMAIL PROTECTED]
>> snooth -- 1.8 million ratings and counting...
>>
>>
>

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.8 million ratings and counting...




RE: Forced Top Document

2007-10-24 Thread Charlie Jackson
Do you know which document you want at the top? If so, I believe you
could just add an "OR" clause to your query to boost that document very
high, such as

?q=foo OR id:bar^1000

Tried this on my installation and it did, indeed push the document
specified to the top. 



-Original Message-
From: Matthew Runo [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 24, 2007 10:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Forced Top Document

I'd love to know this, as I just got a development request for this  
very feature. I'd rather not spend time on it if it already exists.

++
  | Matthew Runo
  | Zappos Development
  | [EMAIL PROTECTED]
  | 702-943-7833
++


On Oct 23, 2007, at 10:12 PM, mark angelillo wrote:

> Hi all,
>
> Is there a way to get a specific document to appear on top of  
> search results even if a sorting parameter would push it further down?
>
> Thanks in advance,
> Mark
>
> mark angelillo
> snooth inc.
> o: 646.723.4328
> c: 484.437.9915
> [EMAIL PROTECTED]
> snooth -- 1.8 million ratings and counting...
>
>



RE: Timeout Settings

2007-10-23 Thread Charlie Jackson
The CommonsHttpSolrServer has a setConnectionTimeout method. For my
import, which was on a similar scale as yours, I had to set it up to
1000 (1 second). I think messing with this setting may take care of your
timeout problem.
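
(A minimal solrj fragment, assuming a local Solr URL; depending on the solrj version, setSoTimeout is the read timeout that actually shows up in the trace below:)

    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    server.setConnectionTimeout(1000); // connect timeout, in milliseconds
    server.setSoTimeout(30000);        // read timeout, in milliseconds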



-Original Message-
From: Daniel Clark [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 22, 2007 6:59 PM
To: solr-user@lucene.apache.org
Subject: Timeout Settings

I'm indexing about 10,000,000 docs and I'm getting the following error at
the optimize stage.  I'm using Tomcat 6.  I believe it's timing out due to
the size of the index.  How can I increase the timeout setting while it's
optimizing?  Any help would be greatly appreciated.

 

java.lang.Exception:

at org.apache.solr.client.SolrClient.update(SolrClient.java:660)

at org.apache.solr.client.SolrClient.update(SolrClient.java:620)

at
org.apache.solr.client.SolrClient.addDocuments(SolrClient.java:580)

at
org.apache.solr.client.SolrClient.addDocuments(SolrClient.java:595)

at
com.aol.music.search.indexer2.MusicIndexer$SolrUpdateTask.call(MusicInde
xer.
java:244)

at
com.aol.music.search.indexer2.MusicIndexer$SolrUpdateTask.call(MusicInde
xer.
java:214)

at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)

at java.util.concurrent.FutureTask.run(FutureTask.java:123)

at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecuto
r.ja
va:650)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
va:6
75)

at java.lang.Thread.run(Thread.java:595)

Caused by: java.net.SocketTimeoutException: Read timed out

at java.net.SocketInputStream.socketRead0(Native Method)

at java.net.SocketInputStream.read(SocketInputStream.java:129)

at
java.io.BufferedInputStream.fill(BufferedInputStream.java:218)

at
java.io.BufferedInputStream.read(BufferedInputStream.java:235)

at
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)

at
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)

at
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.jav
a:11
15)

at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpCon
nect
ionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1373)

at
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBa
se.j
ava:1832)

at
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase
.jav
a:1590)

at
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java
:995
)

at
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMe
thod
Director.java:397)

at
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMetho
dDir
ector.java:170)

at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:3
96)

at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:3
24)

at org.apache.solr.client.SolrClient.update(SolrClient.java:637)

... 10 more

 

~

Daniel Clark, President

DAC Systems, Inc.

(703) 403-0340

~

 



RE: quick allowDups questions

2007-10-11 Thread Charlie Jackson
Cool, thanks for the clarification, Ryan.


-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 10, 2007 5:28 PM
To: solr-user@lucene.apache.org
Subject: Re: quick allowDups questions

the default solrj implementation should do what you need.

> 
> As for Solrj, you're probably right, but I'm not going to take any
> chances for the time being. The server.add method has an optional
> Boolean flag named "overwrite" that defaults to true. Without knowing
> for sure what it does, I'm not going to mess with it. 
> 

direct solr update allows a few extra fields allowDups, 
overwritePending, overwriteCommited -- the future of overwritePending, 
overwriteCommited is in doubt (SOLR-60), so i did not want to bake that 
into the solrj API.

internally,

  allowDups = !overwrite; (the one field you can set)
  overwritePending = !allowDups;
  overwriteCommited = !allowDups;


ryan


RE: quick allowDups questions

2007-10-10 Thread Charlie Jackson
Thanks for the response, Mike. A quick test using the example app
confirms your statement. 

As for Solrj, you're probably right, but I'm not going to take any
chances for the time being. The server.add method has an optional
Boolean flag named "overwrite" that defaults to true. Without knowing
for sure what it does, I'm not going to mess with it. 

For the purposes of my problem, I've got an upper and lower bound of
affected docs, so I'm just going to delete them all and then initiate a
re-index of those specific ids from my source. 

Thanks again for the help!


-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 10, 2007 3:58 PM
To: solr-user@lucene.apache.org
Subject: Re: quick allowDups questions

On 10-Oct-07, at 1:11 PM, Charlie Jackson wrote:

> Anyway, I need to update some docs in my index because my client  
> program
> wasn't accurately putting these docs in (values for one of the fields
> was missing). I'm hoping I won't have to write additional code to go
> through and delete each existing doc before I add the new one, and I
> think setting allowDups on the add command to false will allow me  
> to do
> this. I seem to recall something in the update handler code that goes
> through and deletes all but the last copy of the doc if allowDups is
> false - does that sound accurate?

Yes.  But you need to define a uniqueKey in schema and make sure it  
is the same for docs you want overwritten.  This is how solr detects  
"dups".

>
> If so, I just need to make sure that solrj properly sets that flag,
> which leads me to my next question. Does solrj default allowDups to
> false? If not, what do I need to do to make sure allowDups is set to
> false when I'm adding these docs?

It is the normal mode of operation for Solr, so I'd be surprised if  
it wasn't the default in solrj (but I don't actually know).

-Mike


quick allowDups questions

2007-10-10 Thread Charlie Jackson
Normally this is the type of thing I'd just scour through the online
docs or the source code for, but I'm under the gun a bit. 

 

Anyway, I need to update some docs in my index because my client program
wasn't accurately putting these docs in (values for one of the fields
was missing). I'm hoping I won't have to write additional code to go
through and delete each existing doc before I add the new one, and I
think setting allowDups on the add command to false will allow me to do
this. I seem to recall something in the update handler code that goes
through and deletes all but the last copy of the doc if allowDups is
false - does that sound accurate?

 

If so, I just need to make sure that solrj properly sets that flag,
which leads me to my next question. Does solrj default allowDups to
false? If not, what do I need to do to make sure allowDups is set to
false when I'm adding these docs? 



RE: dataset parameters suitable for lucene application

2007-09-26 Thread Charlie Jackson
Sorry, I meant that it maxed out in the sense that my maxDoc field on
the stats page was 8.8 million, which indicates that the most docs it
has ever had was around 8.8 million. It's down to about 7.8 million
currently. I have seen no signs of a "maximum" number of docs Solr can
handle. 


-Original Message-
From: Chris Harris [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 26, 2007 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: dataset parameters suitable for lucene application

By "maxed out" do you mean that Solr's performance became unacceptable
beyond 8.8M records, or that you only had 8.8M records to index? If
the former, can you share the particular symptoms?

On 9/26/07, Charlie Jackson <[EMAIL PROTECTED]> wrote:
> My experiences so far with this level of data have been good.
>
> Number of records: Maxed out at 8.8 million
> Database size: friggin huge (100+ GB)
> Index size: ~24 GB
>
> 1) It took me about a day to index 8 million docs using a non-optimized
> program I wrote. It's non-optimized in the sense that it's not
> multi-threaded. It batched together groups of about 5,000 docs at a time
> to be indexed.
>
> 2) Search times for a basic search are almost always sub-second. If we
> toss in some faceting, it takes a little longer, but I've hardly ever
> seen it go above 1-2 seconds even with the most advanced queries.
>
> Hope that helps.
>
>
> Charlie
>
> 
>
> -Original Message-
> From: Law, John [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 26, 2007 9:28 AM
> To: solr-user@lucene.apache.org
> Subject: dataset parameters suitable for lucene application
>
> I am new to the list and new to lucene and solr. I am considering Lucene
> for a potential new application and need to know how well it scales.
>
> Following are the parameters of the dataset.
>
> Number of records: 7+ million
> Database size: 13.3 GB
> Index Size:  10.9 GB
>
> My questions are simply:
>
> 1) Approximately how long would it take Lucene to index these documents?
> 2) What would the approximate retrieval time be (i.e. search response
> time)?
>
> Can someone provide me with some informed guidance in this regard?
>
> Thanks in advance,
> John
>
> __
> John Law
> Director, Platform Management
> ProQuest
> 789 Eisenhower Parkway
> Ann Arbor, MI 48106
> 734-997-4877
> [EMAIL PROTECTED]
> www.proquest.com
> www.csa.com
>
> ProQuest... Start here.
>
>
>
>


RE: dataset parameters suitable for lucene application

2007-09-26 Thread Charlie Jackson
My experiences so far with this level of data have been good.

Number of records: Maxed out at 8.8 million
Database size: friggin huge (100+ GB)
Index size: ~24 GB

1) It took me about a day to index 8 million docs using a non-optimized
program I wrote. It's non-optimized in the sense that it's not
multi-threaded. It batched together groups of about 5,000 docs at a time
to be indexed.

2) Search times for a basic search are almost always sub-second. If we
toss in some faceting, it takes a little longer, but I've hardly ever
seen it go above 1-2 seconds even with the most advanced queries. 

Hope that helps.


Charlie



-Original Message-
From: Law, John [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 26, 2007 9:28 AM
To: solr-user@lucene.apache.org
Subject: dataset parameters suitable for lucene application

I am new to the list and new to lucene and solr. I am considering Lucene
for a potential new application and need to know how well it scales. 

Following are the parameters of the dataset.

Number of records: 7+ million
Database size: 13.3 GB
Index Size:  10.9 GB 

My questions are simply:

1) Approximately how long would it take Lucene to index these documents?
2) What would the approximate retrieval time be (i.e. search response
time)?

Can someone provide me with some informed guidance in this regard?

Thanks in advance,
John

__
John Law
Director, Platform Management
ProQuest
789 Eisenhower Parkway
Ann Arbor, MI 48106
734-997-4877
[EMAIL PROTECTED]
www.proquest.com
www.csa.com

ProQuest... Start here.





RE: UTF-8 encoding problem on one of two Solr setups

2007-08-17 Thread Charlie Jackson
You might want to check out this page
http://wiki.apache.org/solr/SolrTomcat

Tomcat needs a small config change out of the box to properly support UTF-8. 
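
(Specifically, the change is adding URIEncoding="UTF-8" to the HTTP connector in Tomcat's server.xml, roughly:)

    <Connector port="8080" URIEncoding="UTF-8" ... />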


Thanks,
Charlie


-Original Message-
From: Mario Knezovic [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 17, 2007 12:58 PM
To: solr-user@lucene.apache.org
Subject: UTF-8 encoding problem on one of two Solr setups

Hi all,

I have set up an identical Solr 1.1 on two different machines. One works
fine, the other one has a UTF-8 encoding problem.

#1 is my local Windows XP machine. Solr is running basically in a
configuration like in the tutorial example with Jetty/5.1.11RC0 (Windows
XP/5.1 x86 java/1.6.0). Everything works fine here as expected.

#2 is a Linux machine with Solr running inside Tomcat 6. The problem happens
here. This is the place where Solr will be running finally.

To rule out all problems in my PHP and Java code, I tested the problem with
the Solr admin page and it happens there as well. (Tested with Firefox 2
with site's char encoding UTF-8.)

When entering an arbitrary search string containing UTF-8 chars I get a
correct response from the local Windows Solr setup:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">0</int>
 <lst name="params">
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">München</str>  <-- sample string containing a German umlaut-u
  <str name="rows">10</str>
  <str name="version">2.2</str>
 </lst>
</lst>
[...]

When I do exactly the same, just on the admin page of the other Solr setup
(but from exactly the same browser), I get the following response:

[...]
item$searchstring_de:MÃ¼nchen
[...]

Obviously the umlaut-u UTF-8 bytes 0xC3 0xBC had been interpreted as two
8-bit chars instead of one UTF-8 char.

Unfortunately I am pretty new to Solr, Tomcat and related topics, so I was
not able to find the problem yet. My guess is that it is outside of Solr,
maybe in the Tomcat configuration, but so far I spent the entire day without
a further clue.

But apart from that Solr really rocks. Indexing tons of content and
searching works just fine and fast and it was pretty easy to get into
everything. Now I am changing all data to UTF-8 and ran into my first
serious obstacle... after a few weeks of Solr usage!

Any hint/help appreciated. Thank you very much.

Mario



RE: Solrsharp highlighting

2007-08-15 Thread Charlie Jackson
Thanks for adding in those facet examples. That should help me out a
great deal.

As for the highlighting, did you have any ideas about a good way to go
about it? I was thinking about taking a stab at it, but I want to get
your input first.


Thanks,
Charlie


-Original Message-
From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 14, 2007 1:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Solrsharp highlighting

Pull down the latest example code from
http://solrstuff.org/svn/solrsharp which includes adding facets to
search results.  It's really short and
simple to add facets; the example application implements one form of it.
The nice thing about the facet support is that it utilizes generics to
allow you to have strongly typed name/value pairs for the
fieldname/count data.

Hope this helps.

-- jeff r.

On 8/10/07, Charlie Jackson <[EMAIL PROTECTED]> wrote:
>
> Also, are there any examples out there of how to use Solrsharp's
> faceting capabilities?
>
> ________
> Charlie Jackson
> 312-873-6537
> [EMAIL PROTECTED]
> -Original Message-
> From: Charlie Jackson [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 10, 2007 3:51 PM
> To: solr-user@lucene.apache.org
> Subject: Solrsharp highlighting
>
> Trying to use Solrsharp (which is a great tool, BTW) to get some
> results in a C# application. I see the HighlightFields method of the
> QueryBuilder object and I've set it to my highlight field, but how do
> I get at the results? I don't see anything in the SearchResults code
> that does anything with the highlight results XML. Did I miss
> something?
>
>
>
>
>
> Thanks,
>
> Charlie
>
>


RE: Solrsharp highlighting

2007-08-10 Thread Charlie Jackson
Also, are there any examples out there of how to use Solrsharp's
faceting capabilities?


Charlie Jackson
312-873-6537
[EMAIL PROTECTED]
-Original Message-
From: Charlie Jackson [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 10, 2007 3:51 PM
To: solr-user@lucene.apache.org
Subject: Solrsharp highlighting

Trying to use Solrsharp (which is a great tool, BTW) to get some results
in a C# application. I see the HighlightFields method of the
QueryBuilder object and I've set it to my highlight field, but how do I
get at the results? I don't see anything in the SearchResults code that
does anything with the highlight results XML. Did I miss something?

 

 

Thanks,

Charlie



Solrsharp highlighting

2007-08-10 Thread Charlie Jackson
Trying to use Solrsharp (which is a great tool, BTW) to get some results
in a C# application. I see the HighlightFields method of the
QueryBuilder object and I've set it to my highlight field, but how do I
get at the results? I don't see anything in the SearchResults code that
does anything with the highlight results XML. Did I miss something?

 

 

Thanks,

Charlie



RE: fast update handlers

2007-05-10 Thread Charlie Jackson
What about issuing separate commits to the index on a regularly
scheduled basis? For example, you add documents to the index every 2
seconds, or however often, but these operations don't commit. Instead,
you have a cron'd script or something that just issues a commit every 5
or 10 minutes or whatever interval you'd like. 

I had to do something similar when I was running a re-index of my entire
dataset. My program wasn't issuing commits, so I just cron'd a commit
for every half hour so it didn't overload the server. 
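
A crontab entry for that kind of scheduled commit might look like the
following (the host, port, and interval are assumptions; it just POSTs
a bare <commit/> to the XML update handler):

# commit twice an hour, on the hour and half hour
0,30 * * * * curl -s -H 'Content-Type: text/xml' --data-binary '<commit/>' http://localhost:8983/solr/update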

Thanks,
Charlie


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Thursday, May 10, 2007 9:07 AM
To: solr-user@lucene.apache.org
Subject: Re: fast update handlers

On 5/10/07, Will Johnson <[EMAIL PROTECTED]> wrote:
> I guess I was more concerned with doing the frequent commits and how
> that would affect the caches.  Say I have 2M docs in my main index but
> I want to add docs every 2 seconds all while doing queries.  if I do
> commits every 2 seconds I basically lose any caching advantage and my
> faceting performance goes down the tube.  If however, I were to add
> things to a smaller index and then roll it into the larger one every
> ~30 minutes then I only take the hit on computing the larger filters
> caches on that interval.  Further, if my smaller index were based on a
> RAMDirectory instead of a FSDirectory I assume computing the filter
> sets for the smaller index should be fast enough even every 2 seconds.

There isn't currently any support for incrementally updating filters.

-Yonik


Index corruptions?

2007-05-03 Thread Charlie Jackson
I have a couple of questions regarding index corruptions. 

 

1) Has anyone using Solr in a production environment ever experienced an
index corruption? If so, how frequently do they occur?

 

2) It seems like the CollectionDistribution setup would be a good way to
put in place a recovery plan for (or at least have some viable backups
of) the index. However, I have a small concern that if the index gets
corrupted on the master server, the corruption would propagate down to
the slave servers as well. Is this concern unfounded? Also, each of the
snapshots taken by snapshooter are viable full indexes, correct? If so,
that means I'd have a backup of the index each and every time a commit
(or optimize for that matter) is done, which would be awesome.
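
For context, the kind of restore drill I'd hope a snapshot makes
possible is roughly the following, with Solr stopped first (the data
path and snapshot name here are hypothetical):

cd /usr/local/Production/solr/solr/data
mv index index.corrupt
cp -r snapshot.20070503093000 index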

 

One of our biggest requirements for the indexing process is to have a
good backup/recover strategy in place and I want to make sure Solr will
be able to provide that. 

 

Thanks in advance!

 

Charlie



RE: NullPointerException (not schema related)

2007-05-02 Thread Charlie Jackson
Otis,

Thanks for the response, that list should be very useful!

Charlie

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 02, 2007 11:13 AM
To: solr-user@lucene.apache.org
Subject: Re: NullPointerException (not schema related)

Charlie,

There is nothing built into Solr for that.  But you can use any of the
numerous free proxies/load balancers.  Here is a collection that I've
got:
http://www.simpy.com/user/otis/search/load%2Bbalance+OR+proxy

Otis 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: Charlie Jackson <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, May 1, 2007 5:31:13 PM
Subject: RE: NullPointerException (not schema related)

I went with the first approach which got me up and running. Your other
example config (using ./snapshooter) made me realize how foolish my
original problem was!

Anyway, I've got the whole thing up and running and it looks pretty
awesome! 

One quick question, though. As stated in the wiki, one of the benefits
of distributing the indexes is load balance the queries. Is there a
built-in solr mechanism for performing this query load balancing? I'm
suspecting there is not, and I haven't seen anything about it in the
wiki, but I wanted to check because I know I'm going to be asked.

Thanks,
Charlie

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 01, 2007 3:20 PM
To: solr-user@lucene.apache.org
Subject: RE: NullPointerException (not schema related)


: <listener event="postCommit" class="solr.RunExecutableListener">
:   <str name="exe">snapshooter</str>
:   <str name="dir">/usr/local/Production/solr/solr/bin/</str>
:   <bool name="wait">true</bool>
: </listener>

: the directory. However, when I committed data to the index, I was
: getting "No such file or directory" errors from the Runtime.exec call.
: I verified all of the permissions, etc, with the user I was trying to
: use. In the end, I wrote up a little test program to see if it was a
: problem with the Runtime.exec call and I think it is. I'm running this
: on CentOS 4.4 and Runtime.exec seems to have a hard time directly
: executing bash scripts. For example, if I called Runtime.exec with a
: command of "test_program" (which is a bash script), it failed. If I
: called Runtime.exec with a command of "/bin/bash test_program" it
: worked.

this initial problem you were having may be a result of path issues.
dir doesn't need to be the directory where your script lives, it's the
directory where you want your script to run (the "cwd" of the process).
it's possible that the error you were getting was because "." isn't in
the PATH that was being used, you should try something like this...

 
 <listener event="postCommit" class="solr.RunExecutableListener">
   <str name="exe">/usr/local/Production/solr/solr/bin/snapshooter</str>
   <str name="dir">/usr/local/Production/solr/solr/bin/</str>
   <bool name="wait">true</bool>
 </listener>
 

...or maybe even...

 
 <listener event="postCommit" class="solr.RunExecutableListener">
   <str name="exe">./snapshooter</str>
   <str name="dir">/usr/local/Production/solr/solr/bin/</str>
   <bool name="wait">true</bool>
 </listener>
 

-Hoss






RE: NullPointerException (not schema related)

2007-05-01 Thread Charlie Jackson
I went with the first approach which got me up and running. Your other
example config (using ./snapshooter) made me realize how foolish my
original problem was!

Anyway, I've got the whole thing up and running and it looks pretty
awesome! 

One quick question, though. As stated in the wiki, one of the benefits
of distributing the indexes is load balance the queries. Is there a
built-in solr mechanism for performing this query load balancing? I'm
suspecting there is not, and I haven't seen anything about it in the
wiki, but I wanted to check because I know I'm going to be asked.

Thanks,
Charlie

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 01, 2007 3:20 PM
To: solr-user@lucene.apache.org
Subject: RE: NullPointerException (not schema related)


: <listener event="postCommit" class="solr.RunExecutableListener">
:   <str name="exe">snapshooter</str>
:   <str name="dir">/usr/local/Production/solr/solr/bin/</str>
:   <bool name="wait">true</bool>
: </listener>

: the directory. However, when I committed data to the index, I was
: getting "No such file or directory" errors from the Runtime.exec call.
: I verified all of the permissions, etc, with the user I was trying to
: use. In the end, I wrote up a little test program to see if it was a
: problem with the Runtime.exec call and I think it is. I'm running this
: on CentOS 4.4 and Runtime.exec seems to have a hard time directly
: executing bash scripts. For example, if I called Runtime.exec with a
: command of "test_program" (which is a bash script), it failed. If I
: called Runtime.exec with a command of "/bin/bash test_program" it
: worked.

this initial problem you were having may be a result of path issues.
dir doesn't need to be the directory where your script lives, it's the
directory where you want your script to run (the "cwd" of the process).
it's possible that the error you were getting was because "." isn't in
the PATH that was being used, you should try something like this...

 
 <listener event="postCommit" class="solr.RunExecutableListener">
   <str name="exe">/usr/local/Production/solr/solr/bin/snapshooter</str>
   <str name="dir">/usr/local/Production/solr/solr/bin/</str>
   <bool name="wait">true</bool>
 </listener>
 

...or maybe even...

 
 <listener event="postCommit" class="solr.RunExecutableListener">
   <str name="exe">./snapshooter</str>
   <str name="dir">/usr/local/Production/solr/solr/bin/</str>
   <bool name="wait">true</bool>
 </listener>
 

-Hoss



RE: NullPointerException (not schema related)

2007-05-01 Thread Charlie Jackson
Nevermind this...looks like my problem was tagging the "args" as an
<str> node instead of an <arr> node. Thanks anyway!

Charlie

-Original Message-----
From: Charlie Jackson [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 01, 2007 12:02 PM
To: solr-user@lucene.apache.org
Subject: NullPointerException (not schema related)

Hello,

 

I'm evaluating solr for potential use in an application I'm working on,
and it sounds like a really great fit. I'm having trouble getting the
Collection Distribution part set up, though. Initially, I had problems
setting up the postCommit listener. I first used this xml to configure
the listener:

 



<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">/usr/local/Production/solr/solr/bin/</str>
  <bool name="wait">true</bool>
</listener>



 

This is what came in the solrconfig.xml file with just a minor tweak to
the directory. However, when I committed data to the index, I was
getting "No such file or directory" errors from the Runtime.exec call. I
verified all of the permissions, etc, with the user I was trying to use.
In the end, I wrote up a little test program to see if it was a problem
with the Runtime.exec call and I think it is. I'm running this on CentOS
4.4 and Runtime.exec seems to have a hard time directly executing bash
scripts. For example, if I called Runtime.exec with a command of
"test_program" (which is a bash script), it failed. If I called
Runtime.exec with a command of "/bin/bash test_program" it worked. 

 

So, with this knowledge in hand, I modified the solrconfig.xml file
again to this:



<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">/bin/bash</str>
  <str name="dir">/usr/local/Production/solr/solr/bin/</str>
  <bool name="wait">true</bool>
  <str name="args">snapshooter</str>
</listener>



 

When I commit data now, however, I get a NullPointerException. I'm
including the stack trace here:

SEVERE: java.lang.NullPointerException
        at org.apache.solr.core.SolrCore.update(SolrCore.java:716)
        at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
        at java.lang.Thread.run(Thread.java:619)

 

I know this has something to do with my config change (the problem goes
away if I turn off the postCommit listener) but I don't know what!

 

BTW I'm using solr-1.1.0-incubating. 

 

Thanks in advance for any help!

 

Charlie

 



NullPointerException (not schema related)

2007-05-01 Thread Charlie Jackson
Hello,

 

I'm evaluating solr for potential use in an application I'm working on,
and it sounds like a really great fit. I'm having trouble getting the
Collection Distribution part set up, though. Initially, I had problems
setting up the postCommit listener. I first used this xml to configure
the listener:

 



<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">/usr/local/Production/solr/solr/bin/</str>
  <bool name="wait">true</bool>
</listener>



 

This is what came in the solrconfig.xml file with just a minor tweak to
the directory. However, when I committed data to the index, I was
getting "No such file or directory" errors from the Runtime.exec call. I
verified all of the permissions, etc, with the user I was trying to use.
In the end, I wrote up a little test program to see if it was a problem
with the Runtime.exec call and I think it is. I'm running this on CentOS
4.4 and Runtime.exec seems to have a hard time directly executing bash
scripts. For example, if I called Runtime.exec with a command of
"test_program" (which is a bash script), it failed. If I called
Runtime.exec with a command of "/bin/bash test_program" it worked. 
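
A minimal sketch of that test (the script name "test_program" is
hypothetical, standing in for any bash script in the current
directory):

public class ExecTest {
    public static void main(String[] args) throws Exception {
        Runtime rt = Runtime.getRuntime();

        // Fails with "No such file or directory": the bare name is not
        // resolved against the cwd, and "." usually isn't on the PATH.
        // rt.exec("test_program").waitFor();

        // Works: bash is found on the PATH and resolves the script itself.
        Process p = rt.exec(new String[] { "/bin/bash", "test_program" });
        System.out.println("exit code: " + p.waitFor());
    }
}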

 

So, with this knowledge in hand, I modified the solrconfig.xml file
again to this:



<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">/bin/bash</str>
  <str name="dir">/usr/local/Production/solr/solr/bin/</str>
  <bool name="wait">true</bool>
  <str name="args">snapshooter</str>
</listener>



 

When I commit data now, however, I get a NullPointerException. I'm
including the stack trace here:

SEVERE: java.lang.NullPointerException
        at org.apache.solr.core.SolrCore.update(SolrCore.java:716)
        at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
        at java.lang.Thread.run(Thread.java:619)

 

I know this has something to do with my config change (the problem goes
away if I turn off the postCommit listener) but I don't know what!

 

BTW I'm using solr-1.1.0-incubating. 

 

Thanks in advance for any help!

 

Charlie