Solr wiki link broken

2010-01-25 Thread Teruhiko Kurosaka
In
http://lucene.apache.org/solr/
the Wiki tab and the "Docs (wiki)" hypertext in the sidebar (after expansion)
both link to
http://wiki.apache.org/solr

But the wiki site seems to be broken.  The above link took me to a generic help 
page of the Wiki system.

What's going on? Did I just hit the site during a maintenance window?

Kuro


RE: Solr vs. Compass

2010-01-25 Thread Funtick


Minutello, Nick wrote:
> 
> Maybe spend some time playing with Compass rather than speculating ;)
> 

I spent a few weeks studying the Compass source code three years ago, and the
Compass docs (3 years ago) were saying the same as now:
"Compass::Core provides support for two phase commits transactions
(read_committed and serializable), implemented on top of Lucene index
segmentations. The implementation provides fast commits (faster than
Lucene), though they do require the concept of Optimizers that will keep the
index at bay. Compass::Core comes with support for Local and JTA
transactions, and Compass::Spring comes with Spring transaction
synchronization. When only adding data to the index, Compass comes with the
batch_insert transaction, which is the same IndexWriter operation with the
same usual suspects for controlling performance and memory. "

It is just blatant advertising, a trick; even the JavaDocs remain unchanged...


The clever guys at Compass can re-apply a transaction log to Lucene after a
server crash (for instance, when the server was 'killed' _before_ Lucene
flushed a new segment to disk).

Internally, it is implemented as a background thread. Nothing in the docs says
"Lucene is part of the transaction"; I studied the source - it is just
'speculating'.




Minutello, Nick wrote:
> 
> If it helps, on the project where I last used compass, we had what I
> consider to be a small dataset - just a few million documents. Nothing
> related to indexing/searching took more than a second or 2 - mostly it
> was 10's or 100's of milliseconds. That app has been live almost 3
> years.
> 

I did the same, and I was happy with Compass: I got Lucene-powered search
without any development. But I ran into performance problems after a few
weeks... I needed about 300 TPS, and the Compass-based approach didn't work.
With SOLR, I have 4000 index updates per second.


-Fuad
http://www.tokenizer.org

-- 
View this message in context: 
http://old.nabble.com/Solr-vs.-Compass-tp27259766p27317213.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing TrieDateField Using Lucene

2010-01-25 Thread Yonik Seeley
On Mon, Jan 25, 2010 at 8:03 PM, brad anderson  wrote:
> I'm trying to create a faster index generator for testing purposes. Using
> lucene has helped immensely to increase indexing speed.

Have you tried using other indexing methods such as CSV or
StreamingUpdateSolrServer?
If there are any performance issues, fixing them once and letting
everyone enjoy the benefits is preferable to having everyone write
their own indexing code.

[...]
> Does anyone
> know how to correctly index a TrieDateField using Lucene API's?

Check the code for TrieDateField.createField().  The stored value is a
binary long.
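
For what it's worth, a rough sketch of that idea against the Solr 1.4 APIs
(untested; "schema" is an already-loaded IndexSchema and the field name is
illustrative) - letting the schema's own FieldType build the Lucene field so
the trie encoding matches what Solr expects at query time:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;

// Look up the TrieDateField declared in schema.xml and let it encode the value.
SchemaField sf = schema.getField("myDate");
Field f = sf.getType().createField(sf, "2010-01-26T00:54:26Z", 1.0f);
doc.add(f);  // "doc" is the org.apache.lucene.document.Document being indexed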

-Yonik
http://www.lucidimagination.com


Indexing TrieDateField Using Lucene

2010-01-25 Thread brad anderson
Greetings,

I'm trying to create a faster index generator for testing purposes. Using
lucene has helped immensely to increase indexing speed. However, my schema
includes a TrieDateField. I do not know how to correctly index this field
type using Lucene's APIs.

I've tried the following:
new DateField().toExternal(new Date())  // Solr DateField
DateTools.dateToString(new Date(), DateTools.Resolution.SECOND)

In both cases I get the following when I do a search:


ERROR:SCHEMA-INDEX-MISMATCH,stringValue=2010-01-26T00:54:26.584Z

The string value is different depending on which method I use. Does anyone
know how to correctly index a TrieDateField using the Lucene APIs?

Thanks for the help,
Brad


Reminder: Seattle Hadoop / HBase / Lucene / NoSQL meetup Jan 27th! Feat. Razorfish

2010-01-25 Thread Bradford Stephens
Greetings,

I'm in the Bay Area doing startup-stuff this week, so Nick Dimiduk
will be running this meetup again. You can reach him at
ndimi...@gmail.com and 614-657-0267

A friendly reminder that the Seattle Hadoop, NoSQL, etc. meetup is on
January 27th at University of Washington in the Allen Computer Science
Building, room 303.

I believe Razorfish will be giving a talk on how they use Hadoop.

Here's the new, shiny meetup.com link with more detail:
http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup

-- 
http://www.drawntoscalehq.com -- Big Data for all. The Big Data Platform.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


StreamingUpdateSolrServer seems to hang on indexing big batches

2010-01-25 Thread Jake Brownell
Hi,

I swapped our indexing process over to the streaming update server, but now I'm 
seeing places where our indexing code adds several documents, but eventually 
hangs. It hangs just before the completion message, which comes directly after 
sending to solr. I found this issue in jira

https://issues.apache.org/jira/browse/SOLR-1711

which may be what I'm seeing. If this is indeed what we're running up against, 
is there any best practice to work around it?

Thanks,
Jake



Analysis tool vs search query

2010-01-25 Thread nonrenewable

Hi,

I've run into an issue that I have no way of resolving, since the analysis
tool doesn't show me any error. I copy the exact field value into the
analysis tool, type in the exact query I'm issuing, and the tool finds a
match. However, running the query with that exact same request doesn't return
the item.

I know the item is there, since I can find it based on another field. It
appears that the problem occurs when I add a second word to my query. I also
tried replacing all whitespace with _, just to rule out a mismatch there, but
there isn't one. Here is my field type definition in case I'm missing
something:
Thanks,
Tony

(the field type definition was stripped by the mail archive)

Example inputs for analysis:
Index value: Banana, Veggie
Query value: banana veggie

-- 
View this message in context: 
http://old.nabble.com/Analysis-tool-vs-search-query-tp27316047p27316047.html
Sent from the Solr - User mailing list archive at Nabble.com.



machine tags, copy fields and pattern tokenizers

2010-01-25 Thread straup

Hi,

I am trying to work out how to store, query and facet machine tags [1] 
in Solr using a combination of copy fields and pattern tokenizer factories.


I am still relatively new to Solr so despite feeling like I've gone over 
the docs, and friends, it's entirely possible I've missed something 
glaringly obvious.


The short version is: Faceting works. Yay! You can facet on the 
individual parts of a machine tag (namespace, predicate, value) and it 
does what you'd expect. For example:


?q=*:*&facet=true&facet.field=mt_namespace&rows=0

numFound:115
foo:65
dc:48
lastfm:2

The longer version is: Even though faceting seems to work I can't query 
(as in ?q=) on the individual fields.


For example, if a single "machinetag" (foo:bar=example) field is copied 
to "mt_namespace", "mt_predicate" and "mt_value" fields I still can't 
query for "?q=mt_namespace:foo".


It appears as though the entire machine tag is being copied to 
mt_namespace, even though my reading of the docs is that if a group 
attribute is present in a solr.PatternTokenizerFactory analyzer, then only 
the matching capture group will be stored.

Is that incorrect?

I've included the field/fieldType definitions I'm using below. [2] Any 
help/suggestions would be appreciated.


Cheers,

[1] http://www.flickr.com/groups/api/discuss/72157594497877875/

[2]

stored="true" required="false" multiValued="true"/>


stored="true" required="false" multiValued="true" />


stored="true" required="false" multiValued="true" />


required="false" multiValued="true" />









  
pattern="([a-zA-Z[0-9]](?:\w+)?):.+" group="1" />

   



  
pattern="[a-zA-Z[0-9]](?:\w+)?:([a-zA-Z[0-9]](?:\w+)?)=.+" group="1" />

  



  
pattern="[a-zA-Z[0-9]](?:\w+)?:[a-zA-Z[0-9]](?:\w+)?=(.+)" group="1" />

  



Re: determine which value produced a hit in multivalued field type

2010-01-25 Thread Lance Norskog
Thanks Erik, I did not know about the order guarantee for indexed
multivalue fields.

Timothy, it could be that more than one term matches the query.
Highlighting will show you which terms matched your query. You'll have
to post-process the results.

On Mon, Jan 25, 2010 at 7:26 AM, Harsch, Timothy J. (ARC-TI)[PEROT
SYSTEMS]  wrote:
> If a simple "no" is the answer I'd be glad if anyone could confirm.
>
> Thanks.
>
> -Original Message-
> From: Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] 
> [mailto:timothy.j.har...@nasa.gov]
> Sent: Friday, January 22, 2010 2:53 PM
> To: solr-user@lucene.apache.org
> Subject: determine which value produced a hit in multivalued field type
>
> Hi,
> If I have a multiValued field type of text, and I put values 
> [cat,dog,green,blue] in it.  Is there a way to tell when I execute a query 
> against that field for dog, that it was in the 1st element position for that 
> multiValued field?
>
> Thanks!
> Tim
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: date math help

2010-01-25 Thread Ahmet Arslan


---
> Why not use a separate facet query for each of these?
> &facet.query:[NOW-1DAY TO
> NOW]&facet.query:[NOW-7DAYS TO NOW]& etc.

Sorry I forgot to add the field name:

&facet.query=LastMod:[NOW-1DAY TO NOW]&facet.query=LastMod:[NOW-7DAYS TO NOW]&
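
Spelled out against the buckets in the original question, a full request
might look like this (host and field name are illustrative; NOW/DAY rounds
down to midnight so the buckets don't shift during the day):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
  &facet.query=LastMod:[NOW/DAY-1DAY TO NOW]
  &facet.query=LastMod:[NOW/DAY-7DAYS TO NOW]
  &facet.query=LastMod:[NOW/DAY-14DAYS TO NOW]
  &facet.query=LastMod:[NOW/DAY-1MONTH TO NOW]
  &facet.query=LastMod:[NOW/DAY-2MONTHS TO NOW]
  &facet.query=LastMod:[NOW/DAY-3MONTHS TO NOW]

Each facet.query comes back as its own count under facet_counts/facet_queries
in the response.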


  


Re: date math help

2010-01-25 Thread Ahmet Arslan
> If I wanted to return documents faceted by LastMod and I
> wanted them grouped
> by week what would the correct syntax be?
> 
> I would like to eventually format the results to display
> this:
> 
> Last day (facet count)
> Last Week (facet count)
> Last 2 Weeks (facet count)
> Last Month (facet count)
> Last 2 Months (facet count)
> Last 3 Months (facet count)

Why not use a separate facet query for each of these?
&facet.query:[NOW-1DAY TO NOW]&facet.query:[NOW-7DAYS TO NOW]& etc.


  


date math help

2010-01-25 Thread solrquestion6

If I wanted to return documents faceted by LastMod and I wanted them grouped
by week what would the correct syntax be?

I would like to eventually format the results to display this:

Last day (facet count)
Last Week (facet count)
Last 2 Weeks (facet count)
Last Month (facet count)
Last 2 Months (facet count)
Last 3 Months (facet count)

So I figured I can get date facet results going back 3 months in 7-day
increments, then map the results to my desired format.

facet=true
facet.date=LastMod
f.LastMod.facet.date.end=NOW/DAY+1DAY
f.LastMod.facet.date.gap=+7DAYS
f.LastMod.facet.date.start=NOW/DAY-84DAYS

(84 days is 12 weeks)

Am I going about this wrong or is there a better way to get my desired
result?


thanks!


-- 
View this message in context: 
http://old.nabble.com/date-math-help-tp27315500p27315500.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Delete by query

2010-01-25 Thread Lance Norskog
The problem is that purely negative queries are ignored by Lucene. Why this
is still true I have no idea.

On Mon, Jan 25, 2010 at 6:23 AM, Noam G.  wrote:
>
> Hi David,
>
> Thank you very much - that did the trick :-)
>
> Noam.
> --
> View this message in context: 
> http://old.nabble.com/Delete-by-query-tp27306968p27307336.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Default value attribute in RSS DIH

2010-01-25 Thread Lance Norskog
There does not seem to be one mentioned in the wiki or the source. You
can do this with Javascript code in the DIH configuration, or by
setting the default in the schema.xml file.

On Sun, Jan 24, 2010 at 12:20 PM, David Stuart
 wrote:
> Hey All,
>
> Can anyone tell me what the attribute name is for defining a default value in 
> the field tag of the RSS data import handler??
>
> Basically I want to do something like
> 
>
>
> Any Ideas?
>
>
> Regards,
>
>
> Dave



-- 
Lance Norskog
goks...@gmail.com


RE: Solr vs. Compass

2010-01-25 Thread Minutello, Nick
 
Correct. It's not 2PC. It just makes the window for inconsistency quite
small ... without the user having to write anything.

-N



-Original Message-
From: Lukas Kahwe Smith [mailto:m...@pooteeweet.org] 
Sent: 25 January 2010 21:19
To: solr-user@lucene.apache.org
Subject: Re: Solr vs. Compass


On 25.01.2010, at 22:16, Minutello, Nick wrote:

> Sorry, you have completely lost me :/
> 
> In simple terms, there are times when you want the primary storage
> (database) and the Lucene index to be in synch - and updated
atomically.
> It all depends on the kind of application.


Sure. I guess Lucene doesn't support 2PhaseCommits yet?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org






RE: Solr vs. Compass

2010-01-25 Thread Minutello, Nick
Maybe spend some time playing with Compass rather than speculating ;)

If it helps, on the project where I last used compass, we had what I
consider to be a small dataset - just a few million documents. Nothing
related to indexing/searching took more than a second or 2 - mostly it
was 10's or 100's of milliseconds. That app has been live almost 3
years.

Correct, compass does not implement sharding, etc - but I thought we
already established that.

Anyway, we have strayed well off the point of the original question, 
so let's leave it at that.

-N


-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca] 
Sent: 25 January 2010 16:46
To: solr-user@lucene.apache.org
Subject: RE: Solr vs. Compass


> >> Even if "commit" takes 20 minutes?
> I've never seen a commit take 20 minutes... (anything taking that long
> is broken, perhaps in concept)


"index merge" can take from few minutes to few hours. That's why nothing
can beat SOLR Master/Slave and sharding for huge datasets. And reopening
of IndexReader after each commit may take at least few seconds (although
depends on usage patterns).

"IndexReader or IndexSearcher will only see the index as of the "point
in time" that it was opened. Any changes committed to the index after
the reader was opened are not visible until the reader is re-opened."


I am wondering how Compass opens a new instance of IndexReader (after each
commit!) - is it really implemented? I can't believe it! It will probably
work fine for small datasets (fewer than 100k documents) and 1 TPD
(transaction-per-day)...
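
(For reference, reopening in Lucene 2.9 looks roughly like the sketch below;
reopen() hands back a new reader only when the index has actually changed,
which is what makes per-commit reopening expensive on large indexes:)

IndexReader newReader = reader.reopen();
if (newReader != reader) {   // the index changed since "reader" was opened
    reader.close();
    reader = newReader;      // searches now see the newly committed changes
}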
 

Very expensive and unnatural ACID...


-Fuad





Re: solr application for website crawling and indexing html, pdf, word, ... files

2010-01-25 Thread Markus Jelsma
Hello Frank,

Answers are inline:

Frank van Lingen said:
> I recently started working with solr and find it easy to setup and
> tinker with.
>
> I now want to scale up my setup and was wondering if there is an
> application/component that can do the following (I was not able to find
> documentation on this on the solr site):
>
> -Can I send solr an xml document with a url (html, pdf, word, ppt,
> etc..) and solr indexes it after analyzing (can it analyze pdf and other
> documents?). Solr would use some generic basic fields like
> header and content when analyzing the files.

Yes you can! Solr has an integration with Tika [1], yet another Apache
Lucene project. It can index many different formats. Please see the Solr
Cell wiki for more information [2].
>
> -Can I send solr a site url and it indexes the whole site?

No you can't. But there is yet another fine Apache Lucene project called
Nutch [3]. It offers a very convenient API and is very flexible. Since
version 1.0 Nutch can integrate more easily with a standby Solr index, and
together with Tika you can index almost anything you want with the
greatest ease.

You can find information on running Nutch with Solr [4]; also, our friends at
LucidImagination have written a very decent article on this subject [5].
You will find what you're looking for.

Cheers


>
> If the answer to the above is yes; are there some examples? If the
> answer is no; Is there a simple (basic) extractor for html, pdf, word,
> etc.. files that would translates this in a basic xml document (e.g.
> with field names, url, header and content) that solr can ingest, or
> preferably an application that does this for a whole site?
>
> The idea is to configure solr for generic indexing and search of a
> website.
>
> Frank.

[1]: http://lucene.apache.org/tika/index.html
[2]: http://wiki.apache.org/solr/ExtractingRequestHandler
[3]: http://lucene.apache.org/nutch/
[4]: http://wiki.apache.org/nutch/RunningNutchAndSolr
[5]: http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/




Re: Solr vs. Compass

2010-01-25 Thread Lukas Kahwe Smith

On 25.01.2010, at 22:16, Minutello, Nick wrote:

> Sorry, you have completely lost me :/
> 
> In simple terms, there are times when you want the primary storage
> (database) and the Lucene index to be in synch - and updated atomically.
> It all depends on the kind of application.


Sure. I guess Lucene doesn't support 2PhaseCommits yet?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





RE: Solr vs. Compass

2010-01-25 Thread Minutello, Nick
Sorry, you have completely lost me :/

In simple terms, there are times when you want the primary storage
(database) and the Lucene index to be in synch - and updated atomically.
It all depends on the kind of application.

-N




-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca] 
Sent: 25 January 2010 16:06
To: solr-user@lucene.apache.org
Subject: RE: Solr vs. Compass

> >> Why to embed "indexing" as a transaction dependency? Extremely 
> >> weird
> idea.
> There is nothing weird about different use cases requiring different 
> approaches
> 
> If you're just thinking documents and text search ... then it's less of
> an issue.
> If you have an online application where the indexing is being used to 
> drive certain features (not just search), then the transactionality is
> quite useful.


I mean:
- Primary Key Constraint in RDBMS is not the same as an index
- Index in RDBMS: data is still searchable, even if we don't have index

Are you sure that an index in an RDBMS is part of the transaction in current
implementations from Oracle, IBM, Sun? I never heard of such stuff; there are
no such requirements for transactions. I am talking about transactions and
referential integrity, not about an indexed non-tokenized single-valued field
"Social Insurance Number". It could be done asynchronously outside of the
transaction; I can't imagine a use case where it must be done inside the
transaction, failing the transaction when it can't be done.

"Primary Key Constraint" is a different use case; it is not necessarily
indexing of data. Especially for Hibernate, where we mostly use surrogate
auto-generated keys.

 
-Fuad





Re: solr application for website crawling and indexing html, pdf, word, ... files

2010-01-25 Thread mike anderson
I think you might be looking for Apache Tika.


On Mon, Jan 25, 2010 at 3:55 PM, Frank van Lingen wrote:

> I recently started working with solr and find it easy to setup and tinker
> with.
>
> I now want to scale up my setup and was wondering if there is an
> application/component that can do the following (I was not able to
> find documentation on this on the solr site):
>
> -Can I send solr an xml document with a url (html, pdf, word, ppt,
> etc..) and solr indexes it after analyzing (can it analyze pdf and
> other documents?). Solr would use some generic basic fields like
> header and content when analyzing the files.
>
> -Can I send solr a site url and it indexes the whole site?
>
> If the answer to the above is yes; are there some examples? If the
> answer is no; Is there a simple (basic) extractor for html, pdf, word,
> etc.. files that would translates this in a basic xml document (e.g.
> with field names, url, header and content) that solr can ingest, or
> preferably an application that does this for a whole site?
>
> The idea is to configure solr for generic indexing and search of a website.
>
> Frank.
>


Re: Lock problems: Lock obtain timed out

2010-01-25 Thread mike anderson
I am getting this exception as well, but disk space is not my problem. What
else can I do to debug this? The Solr log doesn't appear to lend any other
clues...

Jan 25, 2010 4:02:22 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=500 QTime=1990
Jan 25, 2010 4:02:22 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out: NativeFSLock@
/solr8984/index/lucene-98c1cb272eb9e828b1357f68112231e0-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


Should I consider changing the lock timeout settings (currently set to
defaults)? If so, I'm not sure what to base these values on.
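
For reference, those defaults live under indexDefaults in solrconfig.xml; a
sketch (the values here are illustrative, not recommendations):

<indexDefaults>
  <!-- how long (in ms) an IndexWriter waits to acquire the write lock -->
  <writeLockTimeout>1000</writeLockTimeout>
  <!-- the lock implementation; "native" produces the NativeFSLock seen in the trace above -->
  <lockType>native</lockType>
</indexDefaults>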

Thanks in advance,
mike


On Wed, Nov 4, 2009 at 8:27 PM, Lance Norskog  wrote:

> This will not ever work reliably. You should have 2x total disk space
> for the index. Optimize, for one, requires this.
>
> On Wed, Nov 4, 2009 at 6:37 AM, Jérôme Etévé 
> wrote:
> > Hi,
> >
> > It seems this situation is caused by some No space left on device
> exeptions:
> > SEVERE: java.io.IOException: No space left on device
> >at java.io.RandomAccessFile.writeBytes(Native Method)
> >at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
> >at
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
> >at
> org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
> >
> >
> > I'd better try to set my maxMergeDocs and mergeFactor to more
> > adequate values for my app (I'm indexing ~15 GB of data on a 20 GB
> > device, so I guess there's a problem when Solr tries to merge the
> > index bits being built).
> >
> > At the moment, they are set to 100 and 2147483647.
> >
> > Jerome.
> >
> > --
> > Jerome Eteve.
> > http://www.eteve.net
> > jer...@eteve.net
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Collating results from multiple indexes

2010-01-25 Thread Aaron McKee


Is there any somewhat convenient way to collate/integrate fields from 
separate indices during result writing, if the indices use the same 
unique keys? Basically, some sort of cross-index JOIN?


As a bit of background, I have a rather heavyweight dataset of every US 
business (~25m records, an on-disk index footprint of ~30g, and 5-10 
hours to fully index on a decent box). Given the size and relative 
stability of the dataset, I generally only update this monthly. However, 
I have separate advertising-related datasets that need to be updated 
either hourly or daily (e.g. today's coupon, click revenue remaining, 
etc.). These advertiser feeds reference the same keyspace that I use in 
the main index, but are otherwise significantly lighter weight. 
Importing and indexing them discretely only takes a couple minutes. 
Given that Solr/Lucene doesn't support updating a field without dropping 
and re-adding the entire document, it doesn't seem practical to 
integrate this data into the main index (the system would be under a 
constant state of churn if we did document re-inserts, and the 
performance impact would probably be debilitating). It would be nice if 
this data could participate in filtering (e.g. only show advertisers), 
but it doesn't need to participate in scoring/ranking.


I'm guessing that someone else has had a similar need, at some point?  I 
can have our front-end query the smaller indices separately, using the 
keys returned by the primary index, but would prefer to avoid the extra 
sequential roundtrips. I'm hoping to also avoid a coding solution, if 
only to avoid the maintenance overhead as we drop in new builds of Solr, 
but that's also feasible.


Thank you for your insight,
Aaron



solr application for website crawling and indexing html, pdf, word, ... files

2010-01-25 Thread Frank van Lingen
I recently started working with solr and find it easy to setup and tinker with.

I now want to scale up my setup and was wondering if there is an
application/component that can do the following (I was not able to
find documentation on this on the solr site):

-Can I send solr an xml document with a url (html, pdf, word, ppt,
etc..) and solr indexes it after analyzing (can it analyze pdf and
other documents?). Solr would use some generic basic fields like
header and content when analyzing the files.

-Can I send solr a site url and it indexes the whole site?

If the answer to the above is yes; are there some examples? If the
answer is no; Is there a simple (basic) extractor for html, pdf, word,
etc.. files that would translates this in a basic xml document (e.g.
with field names, url, header and content) that solr can ingest, or
preferably an application that does this for a whole site?

The idea is to configure solr for generic indexing and search of a website.

Frank.


Re: Huge Index - RAM usage?

2010-01-25 Thread Erick Erickson
How much memory are you allocating for the  JVM? And
are you sure the memory isn't just memory hanging
around available for GCing?

An interesting test would be to restrict the memory available
to the JVM and see if you bump up against that limit.
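
For instance (standard Sun JVM flags; the sizes are illustrative):

java -Xms2g -Xmx4g -verbose:gc -XX:+PrintGCDetails -jar start.jar

If usage plateaus at the cap and GC keeps up, the growth you're seeing is
likely just heap the JVM hasn't bothered to collect yet.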


Here's a blog post from Mark Miller about various tools:
http://www.lucidimagination.com/blog/2009/02/09/investigating-oom-and-other-jvm-issues/

Sorry I can't be more help today.

Erick

On Mon, Jan 25, 2010 at 11:42 AM, Antonio Lobato wrote:

> Just indexing.  If I shut down Solr, memory usage goes down to 200MB.  I've
> searched the mailing lists, but most situations are with people both
> searching and indexing.  I was under the impression that indexing shouldn't
> use up so much memory.  I'm trying to figure out where all the usage is
> coming from though.  Any ideas?
>
>
> On Jan 25, 2010, at 11:03 AM, Erick Erickson wrote:
>
>  Are you also searching on this machine or just indexing?
>>
>> I'll assume you're certain that it's SOLR that's eating memory,
>> as in you stop the process and your usage drops way down.
>>
>> But if you search the user list for memory, you'll see this
>> kind of thing discussed a bunch of times, along with
>> suggestions for tracking it down, whether it's just
>> postponed GCing, etc.
>>
>> HTH
>> Erick
>>
>> On Mon, Jan 25, 2010 at 10:47 AM, Antonio Lobato wrote:
>>
>>  Hello everyone!
>>>
>>> I have a question about indexing a large dataset in Solr and ram usage.
>>>  I
>>> am currently indexing about 160 gigabytes of data to a dedicated indexing
>>> server.  The data is constantly being fed to Solr, 24/7.  The index grows
>>> as
>>> I prune away old data that is not needed, so the index size stays in the
>>> 150-170 gigabyte range.  However, RAM usage on this machine is off the
>>> wall.
>>> The usage grows to about 27 gigabytes of RAM over 2 days or so.  Is this
>>> normal behavior for Solr?
>>>
>>> Thanks!
>>> -Antonio
>>>
>>>
>


Re: wildcard search and hierarchical faceting

2010-01-25 Thread Erik Hatcher

There are some approaches outlined here that might be of interest:

http://wiki.apache.org/solr/HierarchicalFaceting
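
One approach described there, roughly sketched (field name and values are
illustrative): index each ancestor path with a depth prefix and drill down
with facet.prefix instead of wildcards, which sidesteps the analysis issue
entirely because no wildcard query is ever run:

indexed values:   0/USA   1/USA/New York   2/USA/New York/New York City

after the user has picked USA -> New York:
...&fq=category_path:"1/USA/New York"
   &facet=true&facet.field=category_path
   &facet.prefix=2/USA/New York/

The facet.prefix pass returns only the children one level below the current
selection.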


On Jan 24, 2010, at 2:54 AM, Andy wrote:


I'd like to provide a hierarchical faceting functionality.

An example would be location drill down such as USA -> New York ->  
New York City -> SoHo


The number of levels can be arbitrary. One way to handle this could  
be to use a special character as separator, store values such as  
"USA|New York|New York City|SoHo" and use wildcard search. So if  
"USA" has been selected, the fq would be USA*


I read somewhere that when using wildcard search, no stemming or  
tokenization will be performed. So "USA" will not match "usa". Is  
there any way to work around that?


Or would you recommend a different way to handle hierarchical  
faceting?








Re: AW: Searching for empty fields possible?

2010-01-25 Thread Ahmet Arslan

> I'm not sure, theoretically fields with a null value
> (php-side) should end
> up not having the field. But then again i don't think it's
> relevant just
> yet. What bugs me is that if I add the -puid:[* TO *], all
> results for
> puid:[0 TO *] disappear, even though I am using "OR".

The - operator does not work with the OR operator the way you think. 
Your query can be re-written as (puid:[0 TO *] OR (*:* -puid:[* TO *]))

Does this new query satisfy your needs? And more importantly, does 
type="integer" support correct numeric range queries? In Solr 1.4.0, range 
queries work correctly with type="tint".


  


Re: Wildcard Search and Filter in Solr

2010-01-25 Thread Ahmet Arslan

> Hi , 
> I m trying to use wildcard keywords in my search term and
> filter term . but
> i didnt get any results.
> Searched a lot but could not find any lead .
> Can someone help me in this.
> i m using solr 1.2.0 and have few records indexed with
> vendorName value as
> Intel
> 
> In solr admin interface i m trying to do the search like
> this 
> 
> http://localhost:8983/solr/select?indent=on&version=2.2&q=intel&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=
> 
> and i m getting the result properly 
> 
> but when i use q=inte* no records are returned.
> 
> the same is the case for Filter Query on using
> &fq=VendorName:"Intel" i get
> my results.
> 
> but on using &fq=VendorName:"Inte*" no results are
> returned.
> 
> I can guess i doing mistake in few obvious things , but
> could not figure it
> out ..
> Can someone pls help me out :) :)

If &q=intel returns documents while q=inte* does not, it means that the 
fieldType of your defaultSearchField is reducing the token "intel" to 
something else.

Can you find out, using /admin/analysis.jsp, what happens to "Intel intel" 
at index and query time?

What is your defaultSearchField? Is it VendorName?

It is expected that &fq=VendorName:Intel returns results while 
&fq=VendorName:Inte* does not, because prefix queries are not analyzed.

But it is strange that q=inte* does not return anything. Maybe your index 
analyzer is reducing Intel to int or ıntel?

I am not 100% sure, but Solr 1.2.0 may use the default locale in its 
lowercase operation. What is your default locale?

It is better to see what happens to the word Intel using the analysis.jsp 
page.





DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-01-25 Thread Shah, Nirmal
Hi,

 

I am fairly new to Solr and would like to use the DIH to pull rich text
files (pdfs, etc) from BLOB fields in my database.

 

There was a suggestion made to use the FieldReaderDataSource with the
recently committed TikaEntityProcessor.  Has anyone accomplished this?

This is my configuration, and the resulting error - I'm not sure if I'm
using the FieldReaderDataSource correctly.  If anyone could shed light
on whether I am going the right direction or not, it would be
appreciated.

 

---Data-config.xml:

(the data-config.xml content was stripped by the mail archive)


-Debug error: 

(the response XML was stripped by the mail archive; recoverable values:
status=0, QTime=203, config file=testdb-data-config.xml, command=full-import,
mode=debug)

select id as name, attachment from testtable2
0:0:0.32
--- row #1-
java.math.BigDecimal:2
oracle.sql.BLOB:oracle.sql.b...@1c8e807
-





org.apache.solr.handler.dataimport.DataImportHandlerException: No
dataSource :f1 available for entity :253433571801723 Processing Document
# 1

at
org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(Da
taImporter.java:279)

at
org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl
.java:93)

at
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit
yProcessor.java:97)

at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity
ProcessorWrapper.java:237)

at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j
ava:357)

at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j
ava:383)

at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java
:242)

at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18
0)

at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte
r.java:331)

at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java
:389)

at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(D
ataImportHandler.java:203)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB
ase.java:131)

at
org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)

at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja
va:338)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j
ava:241)

at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan
dler.java:1089)

at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)

at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2
16)

at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)

at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)

at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)

at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandler
Collection.java:211)

at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.jav
a:114)

at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)

at org.mortbay.jetty.Server.handle(Server.java:285)

at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)

at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConne
ction.java:821)

at
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)

at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)

at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)

at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.jav
a:226)

at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.ja
va:442)
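
(A note on the error above: DIH is complaining that no dataSource named "f1"
has been declared. If the inner Tika entity references dataSource="f1", the
data-config would need a declaration along these lines - a sketch, not the
poster's actual file, with connection details elided:)

<dataConfig>
  <dataSource name="db" driver="oracle.jdbc.OracleDriver" url="..." user="..." password="..."/>
  <!-- FieldReaderDataSource reads a field's value from the parent entity's row -->
  <dataSource name="f1" type="FieldReaderDataSource"/>
  ...
</dataConfig>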

 

Thanks,

Nirmal



Master Read Timeout

2010-01-25 Thread Giovanni Fernandez-Kincade
I have a slave that is pulling multiple cores from one master, and I'm very 
frequently seeing cases where the slave is getting timeouts when fetching from 
the master:

2010-01-25 11:00:22,819 [pool-3-thread-1] ERROR 
org.apache.solr.handler.SnapPuller - Master at: 
http://shredder:8080/solr/FilingsCore1/replication is not available. Index 
fetch failed. Exception: Read timed out
But I don't see any errors in the master's log, in fact it seems like the 
command succeeds:

2010-01-25 11:00:33,673 [http-8080-Processor9] INFO  
org.apache.solr.core.SolrCore - [FilingsCore1] webapp=/solr path=/replication 
params={command=indexversion&wt=javabin} status=0 QTime=0
We are doing a fair amount of bulk indexing on the master, but the machine is 
not particularly taxed (CPU is hovering around 60%, Disk Queues are low).

Is there any way to increase the Slave's timeout value? Are there any settings 
that might improve fetchIndex performance during indexing on the master?
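
(For what it's worth, the slave section of the ReplicationHandler config
accepts connection and read timeouts - assumed available in Solr 1.4's
SnapPuller; the values below are illustrative:)

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://shredder:8080/solr/FilingsCore1/replication</str>
    <str name="pollInterval">00:00:60</str>
    <!-- timeouts in milliseconds for the HTTP fetch from the master -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">30000</str>
  </lst>
</requestHandler>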

Thanks in advance,
Gio.


RE: Solr vs. Compass

2010-01-25 Thread Fuad Efendi

> >> Even if "commit" takes 20 minutes?
> I've never seen a commit take 20 minutes... (anything taking that long
> is broken, perhaps in concept)


"index merge" can take from few minutes to few hours. That's why nothing can
beat SOLR Master/Slave and sharding for huge datasets. And reopening of
IndexReader after each commit may take at least few seconds (although
depends on usage patterns).

"IndexReader or IndexSearcher will only see the index as of the "point in
time" that it was opened. Any changes committed to the index after the
reader was opened are not visible until the reader is re-opened."


I am wondering how Compass opens new instance of IndexReader (after each
commit!) - is it really implemented? I can't believe! It will work probably
fine for small datasets (less than 100k), and 1 TPD (transaction-per-day)...
 

Very expensive and unnatural ACID...


-Fuad




Re: Huge Index - RAM usage?

2010-01-25 Thread Antonio Lobato
Just indexing.  If I shut down Solr, memory usage goes down to 200MB.   
I've searched the mailing lists, but most situations are with people  
both searching and indexing.  I was under the impression that indexing  
shouldn't use up so much memory.  I'm trying to figure out where all  
the usage is coming from though.  Any ideas?


On Jan 25, 2010, at 11:03 AM, Erick Erickson wrote:


Are you also searching on this machine or just indexing?

I'll assume you're certain that it's SOLR that's eating memory,
as in you stop the process and your usage drops way down.

But if you search the user list for memory, you'll see this
kind of thing discussed a bunch of times, along with
suggestions for tracking it down, whether it's just
postponed GCing, etc.

HTH
Erick

On Mon, Jan 25, 2010 at 10:47 AM, Antonio Lobato wrote:



Hello everyone!

I have a question about indexing a large dataset in Solr and ram  
usage.  I
am currently indexing about 160 gigabytes of data to a dedicated  
indexing
server.  The data is constantly being fed to Solr, 24/7.  The index  
grows as
I prune away old data that is not needed, so the index size stays  
in the
150-170 gigabyte range.  However, RAM usage on this machine is off  
the wall.
The usage grows to about 27 gigabytes of RAM over 2 days or so.  Is  
this

normal behavior for Solr?

Thanks!
-Antonio





Re: LucidGaze, No Data

2010-01-25 Thread Mark Miller
Markus Jelsma wrote:
> Hi,
>
>
> Is the list without a clue, or should I mail Lucid directly?
>
>
> Cheers,
>
>
>   
>> I have installed and reconfigured everything according to the readme
>> supplied with the recent LucidGaze release. Files have been written in the
>> gaze directory in SOLR_HOME but the *.log.x.y files are all empty! The rrd
>> directory does contain something that is about 24MiB.
>>
>> In the end, i see no errors in Tomcat's logs but also no results in the web
>> application, all the handler's charts tell me "No Data".
>>
>> Anyone with a clue on this?
>> 
>
>   
Hey Markus - I would say that emailing Lucid directly is the right
approach for help with LucidGaze.

You might include some more information if you could, as well. Are you
trying Solr 1.3 or 1.4? The standard release or Lucid's certified
release? When you say Tomcat logs, do you mean the Solr logs as well
(which go to stdout by default)? What version of Tomcat are you using?

What's the situation with the load you're trying to measure? Is this on a
live server? Are you simulating the requests? How many, how fast? Are
you sure you are hitting the handlers you want monitored?

The more info you supply, the easier it will be to help you out. Hard to
go off of "it's not working" ;)

Thanks,

-- 
- Mark

http://www.lucidimagination.com





RE: Solr vs. Compass

2010-01-25 Thread Fuad Efendi
> >> Why to embed "indexing" as a transaction dependency? Extremely weird
> idea.
> There is nothing weird about different use cases requiring different
> approaches
> 
> If you're just thinking documents and text search ... then its less of
> an issue.
> If you have an online application where the indexing is being used to
> drive certain features (not just search), then the transactionality is
> quite useful.


I mean:
- Primary Key Constraint in RDBMS is not the same as an index
- Index in RDBMS: data is still searchable, even if we don't have index

Are you sure that an index in an RDBMS is part of the transaction in current
implementations from Oracle, IBM, Sun? I never heard of such stuff; there are
no such requirements for transactions. I am talking about transactions and
referential integrity, not about an indexed non-tokenized single-valued field
"Social Insurance Number". It could be done asynchronously outside of the
transaction; I can't imagine a use case where it must be done inside the
transaction, failing the transaction when it can't be done.

"Primary Key Constraint" is a different use case; it is not necessarily
indexing of data. Especially for Hibernate, where we mostly use surrogate
auto-generated keys.

 
-Fuad




Re: Huge Index - RAM usage?

2010-01-25 Thread Erick Erickson
Are you also searching on this machine or just indexing?

I'll assume you're certain that it's SOLR that's eating memory,
as in you stop the process and your usage drops way down.

But if you search the user list for memory, you'll see this
kind of thing discussed a bunch of times, along with
suggestions for tracking it down, whether it's just
postponed GCing, etc.

HTH
Erick

On Mon, Jan 25, 2010 at 10:47 AM, Antonio Lobato wrote:

> Hello everyone!
>
> I have a question about indexing a large dataset in Solr and ram usage.  I
> am currently indexing about 160 gigabytes of data to a dedicated indexing
> server.  The data is constantly being fed to Solr, 24/7.  The index grows as
> I prune away old data that is not needed, so the index size stays in the
> 150-170 gigabyte range.  However, RAM usage on this machine is off the wall.
> The usage grows to about 27 gigabytes of RAM over 2 days or so.  Is this
> normal behavior for Solr?
>
> Thanks!
> -Antonio
>


RE: Solr configuration issue for sorting on title field

2010-01-25 Thread EL KASMI Hicham
I see! Thanks a lot Eric.

==
Hicham El Kasmi
Université Libre de Bruxelles - Libraries
Av. F.D. Roosevelt 50, CP 180
1050 BRUSSELS Belgium

Tel: + 32 2 650 25 30
Fax: + 32 2 650 23 91
== 


-Message d'origine-
De : Erick Erickson [mailto:erickerick...@gmail.com] 
Envoyé : lundi 25 janvier 2010 16:43
À : solr-user@lucene.apache.org
Objet : Re: Solr configuration issue for sorting on title field

Well, stop doing that.  The error message is a bit misleading, and
was probably in response to the more-frequent error of tokenizing a field
then trying to sort on it.

The problem is that sorting on something with more than one token is
indeterminate. In this case, with more than one title is a different
manifestation of the underlying issue. Say you have three values
in a field, "a", "b" and "c". What does sorting on that field mean?
Just use the first token? Second? Any answer is wrong so it's
best to fail loudly.

By extension, more than one value (which you're getting from
multiple titles for at least some documents) will produce
incorrect (or at least puzzling) results sometime.

So storing exactly one title/document (probably in another field)
should cure this problem...

HTH
Erick

On Mon, Jan 25, 2010 at 10:30 AM, EL KASMI Hicham  wrote:

> Hi Eric,
>
> Yes, we're indexing more than one title per document (document's title, the
> title of the series or of the journal in which the document was published,
> etc...).
> Normally, each time we modified the SOLR config, we drop the old index and
> we create a new one.
>
> Thanks,
>
> ==
> Hicham El Kasmi
> Université Libre de Bruxelles - Libraries
> Av. F.D. Roosevelt 50, CP 180
> 1050 BRUSSELS Belgium
>
> Tel: + 32 2 650 25 30
> Fax: + 32 2 650 23 91
> ==
>
>
> -Message d'origine-
> De : Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Envoyé : lundi 25 janvier 2010 15:03
> À : solr-user@lucene.apache.org
> Objet : Re: Solr configuration issue for sorting on title field
>
> Are you sending in more than one title per document, by chance?
>
> Have you changed your configuration without reindexing the entire
> collection, possibly?
>
>Erik
>
>
> On Jan 25, 2010, at 8:32 AM, EL KASMI Hicham wrote:
>
> > Thanks Otis,
> >
> > The long message was an attempt to be as detailed as possible, so
> > that one could understand our tests. I'm afraid the problem we
> > wanted to describe didn't really come through.
> >
> > These are the relevant entries from our config:
> >
> > ================================================================
> >  > termVectors="true" />
> > 
> >
> > 
> > ================================================================
> >
> > We want to sort on titleStr; but we end up with the error message:
> >
> > "HTTP Status 500 - there are more terms than documents in field
> > "titleStr", but it's impossible to sort on tokenized fields".
> >
> > We don't understand this message, or what is wrong in our config. We
> > tried several other configs, as described in my first message, but
> > no positive result.
> >
> > Thanks for any clarification you can provide us.
> >
> > Hicham.
> >
> > ==
> > Hicham El Kasmi
> > Université Libre de Bruxelles - Libraries
> > Av. F.D. Roosevelt 50, CP 180
> > 1050 BRUSSELS Belgium
> >
> > Tel: + 32 2 650 25 30
> > Fax: + 32 2 650 23 91
> > ==
> >
> >
> > -Message d'origine-
> > De : Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Envoyé : vendredi 22 janvier 2010 6:17
> > À : solr-user@lucene.apache.org
> > Objet : Re: Solr configuration issue for sorting on title field
> >
> > Hi,
> >
> > Long message.  I skimmed through your configs.  It looks like your
> > main question is how can changing the field type (or, really,
> > turning off "multiValued" on a field cause the number of document in
> > your index to decrease, right?  Well, it can't or shouldn't.  I am
> > guessing you simply did something wrong, like not index all docs, or
> > got errors while indexing that you didn't notice or some such.
> >
> >
> > If all you changed is a field's type, this alone should not cause
> > your index to have fewer documents.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
> >
> > - Original Message 
> >> From: EL KASMI Hicham 
> >> To: solr-user@lucene.apache.org
> >> Sent: Thu, January 21, 2010 5:14:22 AM
> >> Subject: Solr configuration issue for sorting on title field
> >>
> >> Hello again,
> >>
> >> We have a problem with sorting on title field in Solr instance of our
> >> production repository, we get the error message:
> >>

Huge Index - RAM usage?

2010-01-25 Thread Antonio Lobato

Hello everyone!

I have a question about indexing a large dataset in Solr and ram  
usage.  I am currently indexing about 160 gigabytes of data to a  
dedicated indexing server.  The data is constantly being fed to Solr,  
24/7.  The index grows as I prune away old data that is not needed, so  
the index size stays in the 150-170 gigabyte range.  However, RAM  
usage on this machine is off the wall. The usage grows to about 27  
gigabytes of RAM over 2 days or so.  Is this normal behavior for Solr?


Thanks!
-Antonio


Re: Solr configuration issue for sorting on title field

2010-01-25 Thread Erick Erickson
Well, stop doing that.  The error message is a bit misleading, and
was probably in response to the more-frequent error of tokenizing a field
then trying to sort on it.

The problem is that sorting on something with more than one token is
indeterminate. In this case, with more than one title is a different
manifestation of the underlying issue. Say you have three values
in a field, "a", "b" and "c". What does sorting on that field mean?
Just use the first token? Second? Any answer is wrong so it's
best to fail loudly.

By extension, more than one value (which you're getting from
multiple titles for at least some documents) will produce
incorrect (or at least puzzling) results sometime.

So storing exactly one title/document (probably in another field)
should cure this problem...

HTH
Erick

On Mon, Jan 25, 2010 at 10:30 AM, EL KASMI Hicham  wrote:

> Hi Eric,
>
> Yes, we're indexing more than one title per document (document's title, the
> title of the series or of the journal in which the document was published,
> etc...).
> Normally, each time we modified the SOLR config, we drop the old index and
> we create a new one.
>
> Thanks,
>
> ==
> Hicham El Kasmi
> Université Libre de Bruxelles - Libraries
> Av. F.D. Roosevelt 50, CP 180
> 1050 BRUSSELS Belgium
>
> Tel: + 32 2 650 25 30
> Fax: + 32 2 650 23 91
> ==
>
>
> -Message d'origine-
> De : Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Envoyé : lundi 25 janvier 2010 15:03
> À : solr-user@lucene.apache.org
> Objet : Re: Solr configuration issue for sorting on title field
>
> Are you sending in more than one title per document, by chance?
>
> Have you changed your configuration without reindexing the entire
> collection, possibly?
>
>Erik
>
>
> On Jan 25, 2010, at 8:32 AM, EL KASMI Hicham wrote:
>
> > Thanks Otis,
> >
> > The long message was an attempt to be as detailed as possible, so
> > that one could understand our tests. I'm afraid the problem we
> > wanted to describe didn't really come through.
> >
> > These are the relevant entries from our config:
> >
> > ================================================================
> >  > termVectors="true" />
> > 
> >
> > 
> > ================================================================
> >
> > We want to sort on titleStr; but we end up with the error message:
> >
> > "HTTP Status 500 - there are more terms than documents in field
> > "titleStr", but it's impossible to sort on tokenized fields".
> >
> > We don't understand this message, or what is wrong in our config. We
> > tried several other configs, as described in my first message, but
> > no positive result.
> >
> > Thanks for any clarification you can provide us.
> >
> > Hicham.
> >
> > ==
> > Hicham El Kasmi
> > Université Libre de Bruxelles - Libraries
> > Av. F.D. Roosevelt 50, CP 180
> > 1050 BRUSSELS Belgium
> >
> > Tel: + 32 2 650 25 30
> > Fax: + 32 2 650 23 91
> > ==
> >
> >
> > -Message d'origine-
> > De : Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Envoyé : vendredi 22 janvier 2010 6:17
> > À : solr-user@lucene.apache.org
> > Objet : Re: Solr configuration issue for sorting on title field
> >
> > Hi,
> >
> > Long message.  I skimmed through your configs.  It looks like your
> > main question is how can changing the field type (or, really,
> > turning off "multiValued" on a field cause the number of document in
> > your index to decrease, right?  Well, it can't or shouldn't.  I am
> > guessing you simply did something wrong, like not index all docs, or
> > got errors while indexing that you didn't notice or some such.
> >
> >
> > If all you changed is a field's type, this alone should not cause
> > your index to have fewer documents.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
> >
> > - Original Message 
> >> From: EL KASMI Hicham 
> >> To: solr-user@lucene.apache.org
> >> Sent: Thu, January 21, 2010 5:14:22 AM
> >> Subject: Solr configuration issue for sorting on title field
> >>
> >> Hello again,
> >>
> >> We have a problem with sorting on title field in Solr instance of our
> >> production repository, we get the error message:
> >>
> >> "HTTP Status 500 - there are more terms than documents in field
> >> "titleStr", but it's impossible to sort on tokenized fields".
> >>
> >> After some googling and searching in this listserv, we found that a
> >> sorting field has to be untokenized but our sorting field "titleStr"
> >> which is a copy of the "title" field has a string type.
> >>
> >> What we did as configs in our schema.xml file :
> >>
> >> 1st config
> >> ++
> >>
> >>
> >> sortMissingLast="true" omit

RE: Solr configuration issue for sorting on title field

2010-01-25 Thread EL KASMI Hicham
Hi Eric,

Yes, we're indexing more than one title per document (document's title, the 
title of the series or of the journal in which the document was published, 
etc...).
Normally, each time we modify the SOLR config, we drop the old index and 
create a new one.

Thanks,

==
Hicham El Kasmi
Université Libre de Bruxelles - Libraries
Av. F.D. Roosevelt 50, CP 180
1050 BRUSSELS Belgium

Tel: + 32 2 650 25 30
Fax: + 32 2 650 23 91
== 


-Message d'origine-
De : Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Envoyé : lundi 25 janvier 2010 15:03
À : solr-user@lucene.apache.org
Objet : Re: Solr configuration issue for sorting on title field

Are you sending in more than one title per document, by chance?

Have you changed your configuration without reindexing the entire  
collection, possibly?

Erik


On Jan 25, 2010, at 8:32 AM, EL KASMI Hicham wrote:

> Thanks Otis,
>
> The long message was an attempt to be as detailed as possible, so  
> that one could understand our tests. I'm afraid the problem we  
> wanted to describe didn't really come through.
>
> These are the relevant entries from our config:
>
> ================================================================
>  termVectors="true" />
> 
>
> 
> ================================================================
>
> We want to sort on titleStr; but we end up with the error message:
>
> "HTTP Status 500 - there are more terms than documents in field  
> "titleStr", but it's impossible to sort on tokenized fields".
>
> We don't understand this message, or what is wrong in our config. We  
> tried several other configs, as described in my first message, but  
> no positive result.
>
> Thanks for any clarification you can provide us.
>
> Hicham.
>
> ==
> Hicham El Kasmi
> Université Libre de Bruxelles - Libraries
> Av. F.D. Roosevelt 50, CP 180
> 1050 BRUSSELS Belgium
>
> Tel: + 32 2 650 25 30
> Fax: + 32 2 650 23 91
> ==
>
>
> -Message d'origine-
> De : Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Envoyé : vendredi 22 janvier 2010 6:17
> À : solr-user@lucene.apache.org
> Objet : Re: Solr configuration issue for sorting on title field
>
> Hi,
>
> Long message.  I skimmed through your configs.  It looks like your  
> main question is how can changing the field type (or, really,  
> turning off "multiValued" on a field cause the number of document in  
> your index to decrease, right?  Well, it can't or shouldn't.  I am  
> guessing you simply did something wrong, like not index all docs, or  
> got errors while indexing that you didn't notice or some such.
>
>
> If all you changed is a field's type, this alone should not cause  
> your index to have fewer documents.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> - Original Message 
>> From: EL KASMI Hicham 
>> To: solr-user@lucene.apache.org
>> Sent: Thu, January 21, 2010 5:14:22 AM
>> Subject: Solr configuration issue for sorting on title field
>>
>> Hello again,
>>
>> We have a problem with sorting on the title field in the Solr instance of
>> our production repository; we get the error message:
>>
>> "HTTP Status 500 - there are more terms than documents in field
>> "titleStr", but it's impossible to sort on tokenized fields".
>>
>> After some googling and searching in this listserv, we found that a
>> sorting field has to be untokenized but our sorting field "titleStr"
>> which is a copy of the "title" field has a string type.
>>
>> What we did as configs in our schema.xml file:
>>
>> 1st config
>> ++
>>
>> [schema.xml excerpt, XML markup stripped by the mail archive: a
>> "string" field type (solr.StrField, sortMissingLast="true",
>> omitNorms="true"); a "text" field type (whitespace tokenizer, ICU4J
>> Unicode normalization, stop words, word delimiter, lowercase, Porter
>> stemming with protwords.txt) for both index and query analyzers; the
>> "title" field with termVectors="true"; and a copyField from "title"
>> to "titleStr"]
>>
>> As you can see, the title field has the termVectors property set to
>> true; we drop it in the second attempt of our config.
>>
>> 2nd attempt
>> ++
>>
>> [second schema.xml attempt, markup stripped by the mail archive; the
>> quoted message is cut off here]

RE: determine which value produced a hit in multivalued field type

2010-01-25 Thread Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS]
If a simple "no" is the answer I'd be glad if anyone could confirm.

Thanks.

-Original Message-
From: Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] 
[mailto:timothy.j.har...@nasa.gov] 
Sent: Friday, January 22, 2010 2:53 PM
To: solr-user@lucene.apache.org
Subject: determine which value produced a hit in multivalued field type

Hi,
If I have a multiValued field type of text, and I put the values 
[cat,dog,green,blue] in it, is there a way to tell, when I execute a query 
against that field for dog, that it was in the 1st element position for that 
multiValued field?

Thanks!
Tim



Re: LucidGaze, No Data

2010-01-25 Thread Markus Jelsma
Hi,


Is the list without a clue, or should I mail Lucid directly?


Cheers,


>I have installed and reconfigured everything according to the readme
> supplied with the recent LucidGaze release. Files have been written in the
> gaze directory in SOLR_HOME but the *.log.x.y files are all empty! The rrd
> directory does contain something that is about 24MiB.
>
>In the end, i see no errors in Tomcat's logs but also no results in the web
>application, all the handler's charts tell me "No Data".
>
>Anyone with a clue on this?



RE: Delete by query

2010-01-25 Thread Noam G.

Hi David,

Thank you very much - that did the trick :-)

Noam.
-- 
View this message in context: 
http://old.nabble.com/Delete-by-query-tp27306968p27307336.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Delete by query

2010-01-25 Thread David.Dankwerth
solrServer.deleteByQuery("*:* AND
-version_uuid:e04534e2-28db-4420-a5f3-300477872c11"); 
Should do the trick.
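
[Archive note: the reason the pure-negative queries matched nothing is
that Lucene needs a positive clause to subtract the negation from;
prepending the match-all query *:* supplies one. A minimal SolrJ sketch
of the full sequence, using the UUID from the thread:

    solrServer.deleteByQuery(
        "*:* AND -version_uuid:e04534e2-28db-4420-a5f3-300477872c11");
    solrServer.commit(); // deletes become visible only after a commit
]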

-Original Message-
From: Noam G. [mailto:noam...@gmail.com] 
Sent: 25 January 2010 13:58
To: solr-user@lucene.apache.org
Subject: Delete by query


Hi,

I have an index with 3M docs.
Each doc has a field called version_uuid.

I want to delete all docs whose version_uuid is other than
'e04534e2-28db-4420-a5f3-300477872c11' (for example :-))

This is the query I submit to the SolrServer object using the
deleteByQuery
method:

"NOT version_uuid:e04534e2-28db-4420-a5f3-300477872c11"

I also tried:
"-version_uuid:e04534e2-28db-4420-a5f3-300477872c11"

But still all docs are there.

Any ideas? Am I doing something wrong?

Noam.

--
View this message in context:
http://old.nabble.com/Delete-by-query-tp27306968p27306968.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr configuration issue for sorting on title field

2010-01-25 Thread Erik Hatcher

Are you sending in more than one title per document, by chance?

Have you changed your configuration without reindexing the entire  
collection, possibly?


Erik


On Jan 25, 2010, at 8:32 AM, EL KASMI Hicham wrote:


Thanks Otis,

The long message was an attempt to be as detailed as possible, so  
that one could understand our tests. I'm afraid the problem we  
wanted to describe didn't really come through.


These are the relevant entries from our config:

==========================
[schema.xml excerpt, XML markup stripped by the mail archive: the
"title" field with termVectors="true", the "titleStr" string field, and
a copyField from title to titleStr]
==========================


We want to sort on titleStr; but we end up with the error message:

"HTTP Status 500 - there are more terms than documents in field  
"titleStr", but it's impossible to sort on tokenized fields".


We don't understand this message, or what is wrong in our config. We  
tried several other configs, as described in my first message, but  
no positive result.


Thanks for any clarification you can provide us.

Hicham.

==
Hicham El Kasmi
Université Libre de Bruxelles - Libraries
Av. F.D. Roosevelt 50, CP 180
1050 BRUSSELS Belgium

Tel: + 32 2 650 25 30
Fax: + 32 2 650 23 91
==


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Friday, January 22, 2010 6:17
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration issue for sorting on title field

Hi,

Long message.  I skimmed through your configs.  It looks like your
main question is: how can changing the field type (or, really,
turning off "multiValued" on a field) cause the number of documents in
your index to decrease, right?  Well, it can't or shouldn't.  I am
guessing you simply did something wrong, like not indexing all docs, or
getting errors while indexing that you didn't notice, or some such.



If all you changed is a field's type, this alone should not cause  
your index to have fewer documents.


Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



- Original Message 

From: EL KASMI Hicham 
To: solr-user@lucene.apache.org
Sent: Thu, January 21, 2010 5:14:22 AM
Subject: Solr configuration issue for sorting on title field

Hello again,

We have a problem with sorting on the title field in the Solr instance of our
production repository; we get the error message:

"HTTP Status 500 - there are more terms than documents in field
"titleStr", but it's impossible to sort on tokenized fields".

After some googling and searching in this listserv, we found that a
sorting field has to be untokenized but our sorting field "titleStr"
which is a copy of the "title" field has a string type.

What we did as configs in our schema.xml file:

1st config
++

[schema.xml excerpt, XML markup stripped by the mail archive: the
"string" and "text" field types (whitespace tokenizer, ICU4J Unicode
normalization, stop words, word delimiter, lowercase, Porter stemming)
for index and query analyzers, the "title" field with
termVectors="true", and a copyField from title to titleStr]

As you can see, the title field has the termVectors property set to
true; we drop it in the second attempt of our config.

2nd attempt
++

[second schema.xml attempt, markup stripped by the mail archive: the
same "text" analyzer chain, with the title field no longer declaring
termVectors]

3rd attempt
+++
Create a new field type named 'text_exact' which doesn't use the
"WhitespaceTokenizer" tokenizer but instead uses the "KeywordTokenizer"
tokenizer.

[third schema.xml attempt, markup stripped by the mail archive; the
message is cut off here]

Delete by query

2010-01-25 Thread Noam G.

Hi,

I have an index with 3M docs.
Each doc has a field called version_uuid.

I want to delete all docs whose version_uuid is other
than 'e04534e2-28db-4420-a5f3-300477872c11' (for example :-))

This is the query I submit to the SolrServer object using the deleteByQuery
method:

"NOT version_uuid:e04534e2-28db-4420-a5f3-300477872c11"

I also tried:
"-version_uuid:e04534e2-28db-4420-a5f3-300477872c11"

But still all docs are there.

Any ideas? Am I doing something wrong?

Noam.

-- 
View this message in context: 
http://old.nabble.com/Delete-by-query-tp27306968p27306968.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: understanding termVector output

2010-01-25 Thread Grant Ingersoll

On Jan 22, 2010, at 12:39 PM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] wrote:

> Hi,
> I'm trying to see if I can use termVectors for a use case I have.  
> Essentially, what I want to know is: where in the indexed value does the
> query hit occur?

Solr doesn't currently have support for Span Queries (which is what you really 
want here).  See https://issues.apache.org/jira/browse/SOLR-1337
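
[Archive note: one common workaround, assuming the field was indexed
with a positionIncrementGap (100 in the example schema), is to divide a
matching term's position by that gap to recover which of the field's
values it came from -- positions 0-99 fall in the first value, 100-199
in the second, and so on:

    // hypothetical helper; the gap must match the field's
    // positionIncrementGap in schema.xml
    int valueIndex = termPosition / 100;

tv.offsets, by contrast, are character offsets into the field's text,
so they are mostly useful for highlighting.]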

>  I think either tv.positions or tv.offsets would provide that info but I 
> don't really grok the result.  Below I've pasted the URL and part of the 
> result.  What is  
> http://localhost:8080/solr/select?q=idxPartition:CONNECTED_ASSETS%20AND%20srcSpan:CR1434&rows=1&indent=on&qt=tvrh&tv.offsets=true&fl=srcSpan
> 
> [XML response excerpt, markup stripped by the mail archive: the
> matching document (id CR1434-Occ1, srcSpan "abcCR1434 is a token for
> searching with WILDCI...", plus user and date metadata), its uniqueKey
> hash f57488c1d041a1de5bd6a70b09428d119ed1de29, and tv.offsets values
> 104, 106, 107, 109, 129, 131, 132, 134]

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Wildcard Search and Filter in Solr

2010-01-25 Thread ashokcz

Hi,
I'm trying to use wildcard keywords in my search term and filter term, but
I didn't get any results.
I searched a lot but could not find any lead.
Can someone help me with this?

This is my schema.xml entries:

[field definitions stripped by the mail archive]

and this is my field type definition for the text field:

[analyzer definitions for the "text" field type stripped by the mail
archive]

I'm using Solr 1.2.0 and have a few records indexed with a vendorName value
of Intel.

In the Solr admin interface I'm trying to do the search like this:

http://localhost:8983/solr/select?indent=on&version=2.2&q=intel&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

and I'm getting the result properly,

but when I use q=inte*, no records are returned.

The same is the case for filter queries: on using &fq=VendorName:"Intel" I
get my results,

but on using &fq=VendorName:"Inte*", no results are returned.

I can guess I'm making a mistake in a few obvious things, but I could not
figure it out.
Can someone please help me out :) :)

Thanks All
---
Ashok

-- 
View this message in context: 
http://old.nabble.com/Wildcard-Search-and-Filter-in-Solr-tp27306734p27306734.html
Sent from the Solr - User mailing list archive at Nabble.com.
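
[Archive note: two things likely bite here. Wildcard and prefix queries
in Lucene/Solr are not run through the field's analyzers, so if indexing
lowercases "Intel" to "intel", the query term must be lowercased by
hand. And putting the term in quotes ("Inte*") makes it a phrase query,
in which the * is a literal character, not a wildcard. Assuming the
analyzed text field above:

    q=VendorName:Intel     -> analyzed to "intel", matches
    q=VendorName:"Inte*"   -> phrase query, * is literal, no match
    q=VendorName:Inte*     -> prefix query, not lowercased, no match
    q=VendorName:inte*     -> prefix query on "inte", matches]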



RE: Solr configuration issue for sorting on title field

2010-01-25 Thread EL KASMI Hicham
Thanks Otis,

The long message was an attempt to be as detailed as possible, so that one 
could understand our tests. I'm afraid the problem we wanted to describe didn't 
really come through.

These are the relevant entries from our config:

==========================
[schema.xml excerpt, XML markup stripped by the mail archive: the
"title" field with termVectors="true", the "titleStr" string field, and
a copyField from title to titleStr]
==========================

We want to sort on titleStr; but we end up with the error message:

"HTTP Status 500 - there are more terms than documents in field "titleStr", but 
it's impossible to sort on tokenized fields".

We don't understand this message, or what is wrong in our config. We tried 
several other configs, as described in my first message, but no positive result.

Thanks for any clarification you can provide us.

Hicham.

==
Hicham El Kasmi
Université Libre de Bruxelles - Libraries
Av. F.D. Roosevelt 50, CP 180
1050 BRUSSELS Belgium

Tel: + 32 2 650 25 30
Fax: + 32 2 650 23 91
== 


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, January 22, 2010 6:17
To: solr-user@lucene.apache.org
Subject: Re: Solr configuration issue for sorting on title field

Hi,

Long message.  I skimmed through your configs.  It looks like your main 
question is: how can changing the field type (or, really, turning off 
"multiValued" on a field) cause the number of documents in your index to 
decrease, right?  Well, it can't or shouldn't.  I am guessing you simply did 
something wrong, like not indexing all docs, or getting errors while indexing 
that you didn't notice, or some such.


If all you changed is a field's type, this alone should not cause your index to 
have fewer documents.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



- Original Message 
> From: EL KASMI Hicham 
> To: solr-user@lucene.apache.org
> Sent: Thu, January 21, 2010 5:14:22 AM
> Subject: Solr configuration issue for sorting on title field
> 
> Hello again,
> 
> We have a problem with sorting on the title field in the Solr instance of
> our production repository; we get the error message: 
> 
> "HTTP Status 500 - there are more terms than documents in field
> "titleStr", but it's impossible to sort on tokenized fields".
> 
> After some googling and searching in this listserv, we found that a
> sorting field has to be untokenized but our sorting field "titleStr"
> which is a copy of the "title" field has a string type.
> 
> What we did as configs in our schema.xml file:
> 
> 1st config
> ++
>
> [schema.xml excerpt, XML markup stripped by the mail archive: the
> "string" and "text" field types (whitespace tokenizer, ICU4J Unicode
> normalization, stop words, word delimiter, lowercase, Porter stemming)
> for index and query analyzers, the "title" field with
> termVectors="true", and a copyField from title to titleStr]
>
> As you can see, the title field has the termVectors property set to true;
> we drop it in the second attempt of our config.
>
> 2nd attempt
> ++
>
> [second schema.xml attempt, markup stripped by the mail archive: the
> same "text" analyzer chain, with the title field no longer declaring
> termVectors]
>
> 
> 3rd attempt
> +++
> Create a new field type named 'text_exact' which doesn't use the
> "WhitespaceTokenizer" tokenizer but instead uses the "KeywordTokenizer"
> tokenizer.
>
> [third schema.xml attempt, markup stripped by the mail archive; the
> quoted message is cut off here]

AW: Searching for empty fields possible?

2010-01-25 Thread Jan-Simon Winkelmann
> Are you indexing an empty value?  Or not indexing a field at all?
> -field:[* TO *] will match documents that do not have the field at all.

I'm not sure; theoretically, fields with a null value (PHP-side) should end
up not having the field. But then again I don't think it's relevant just
yet. What bugs me is that if I add the -puid:[* TO *], all results for
puid:[0 TO *] disappear, even though I am using "OR".

Best,
Jan
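
[Archive note: this is expected with the standard query parser -- a
purely negative clause inside a boolean OR has no positive clause to
subtract from, so it matches nothing. A common workaround is to anchor
the negation to the match-all query:

    fq=(puid:[0 TO *] OR (*:* -puid:[* TO *]))
]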


> On Jan 25, 2010, at 7:02 AM, Jan-Simon Winkelmann wrote:

> > Hi there,
> >
> > i have a field defined in my schema as follows:
> >
> > [field definition for "puid" stripped by the mail archive; it ends
> > with required="false" />]
> >
> > Valid values for the field are (in theory) any unsigned integer
> > value. Also, the field can be empty.
> >
> > My problem is that if I have (puid:[0 TO *] OR -puid:[* TO *]) as a
> > filter query I get 0 results; without the "-puid:[* TO *]" part I get
> > about 6500 results. What am I doing wrong? I was under the impression
> > I could find empty fields with a "* TO *" range?
> >
> > Thanks very much in advance!
> >
> > Best,
> > Jan
> >




Re: Searching for empty fields possible?

2010-01-25 Thread Erik Hatcher

Are you indexing an empty value?  Or not indexing a field at all?

-field:[* TO *] will match documents that do not have the field at all.

Erik


On Jan 25, 2010, at 7:02 AM, Jan-Simon Winkelmann wrote:


Hi there,

I have a field defined in my schema as follows:

[field definition for "puid" stripped by the mail archive]

Valid values for the field are (in theory) any unsigned integer value.
Also, the field can be empty.

My problem is that if I have (puid:[0 TO *] OR -puid:[* TO *]) as a filter
query I get 0 results; without the "-puid:[* TO *]" part I get about 6500
results. What am I doing wrong? I was under the impression I could find
empty fields with a "* TO *" range?

Thanks very much in advance!

Best,
Jan





Searching for empty fields possible?

2010-01-25 Thread Jan-Simon Winkelmann
Hi there,

I have a field defined in my schema as follows:

[field definition for "puid" stripped by the mail archive]

Valid values for the field are (in theory) any unsigned integer value. Also,
the field can be empty.

My problem is that if I have (puid:[0 TO *] OR -puid:[* TO *]) as a filter
query I get 0 results; without the "-puid:[* TO *]" part I get about 6500
results. What am I doing wrong? I was under the impression I could find
empty fields with a "* TO *" range?

Thanks very much in advance!

Best,
Jan



Re: big index vs. lots of small ones

2010-01-25 Thread Thorsten Scherler
On Wed, 2010-01-20 at 08:38 -0800, Marc Sturlese wrote:
> Check out this patch, which solves the distributed IDF problem:
> https://issues.apache.org/jira/browse/SOLR-1632
> I think it fixes what you are explaining. The price you pay is that there
> are 2 requests per shard. If I am not wrong, the first is to get term
> frequencies and other needed info, and the second one is the proper search
> request. The patch also includes caching for terms in the first request.
> 

Nice!

Thank you very much, Mark.

How are things going in Barcelona?

salu2

> 
> Thorsten Scherler-3 wrote:
> > 
> > Hi all,
> > 
> > I have to do an analysis of the following use case.
> > 
> > I am working as a consultant in a public company. We are talking about
> > offering each public institution its own search server in the future,
> > (probably) based on Apache Solr. However, the user of our portal should
> > be able to search all indexes.
> > 
> > The problematic part for our customer is that a meta search on various
> > indexes, which then later merges the responses, will change the scoring.
> > 
> > Imagine you have the two indexes
> > - public health department (A)
> > - press relations department (B)
> > 
> > Now you have 300 documents in A and only one in B about "influenza A".
> > The B server will return the only document in its index with a very high
> > score, since being the only one it gets a very high "base" score,
> > correct?
> > 
> > On the other hand A may have much more important documents but they will
> > not get the same "base" score.
> > 
> > Meaning that on a merge, the document from server B will most likely be
> > at the top of the list.
> > 
> > To prevent this phenomenon we are looking into merging all the
> > standalone indexes into one big index, but that will lead us into other
> > problems because it will become pretty big pretty fast.
> > 
> > So here my questions:
> > 
> > - What are other people doing to solve this problem?
> > - What is the best way with Solr to solve the problem of the "base"
> > scoring?
> > - What is the best way to have multiple indexes in solr?
> > - Is it possible to get rid of the "base" scoring in solr?
> > 
> > TIA for any information.
> > 
> > salu2
> > -- 
> > Thorsten Scherler 
> > Open Source Java 
> > 
> > Sociedad Andaluza para el Desarrollo de la Sociedad 
> > de la Información, S.A.U. (SADESI)
> > 
> > 
> > 
> > 
> > 
> > 
> 
-- 
Thorsten Scherler 
Open Source Java 

Sociedad Andaluza para el Desarrollo de la Sociedad 
de la Información, S.A.U. (SADESI)
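
[Archive note: for reference, a standard Solr distributed search across
per-institution indexes looks like the sketch below; by default each
shard scores with its own local IDF, which is exactly the skew described
above, and SOLR-1632 adds a first per-shard round trip to gather global
term statistics. The host names are illustrative:

    http://portal.example.org/solr/select?q=influenza
        &shards=deptA.example.org/solr,deptB.example.org/solr
]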






RE: Solr vs. Compass

2010-01-25 Thread Minutello, Nick

>> Why to embed "indexing" as a transaction dependency? Extremely weird idea.
There is nothing weird about different use cases requiring different 
approaches.

If you're just thinking documents and text search ... then it's less of an issue.
If you have an online application where the indexing is being used to drive 
certain features (not just search), then the transactionality is quite useful.

>> Even if "commit" takes 20 minutes?
>> It's their "selling point" nothing more.
"they" are not "selling" anything. You will find its an open-source project, 
and the main guy is quite a smart guy.
I've never seen a commit take 20 minutes... (anything taking that long is 
broken, perhaps in concept)

>> Also, note that Compass (Hibernate) ((RDBMS)) use specific "business domain 
>> model" terms 
>> with relationships; huge overhead to convert "relational" into 
>> "object-oriented" (why for? Any advantages?)
Perhaps the pros and cons of Object Relational Mapping are for another forum?

There is naturally some overhead in Compass's OSEM - but it makes your life 
easier if you work with the same domain model irrespective of whether something 
comes from the database or comes from the index. Typically we serve search 
results from the index - and when clicking on one of the results, we load from 
db to get the master copy. Moreover, the OSEM does an excellent job of 
flattening & indexing whatever object hierarchy exists into the flat lucene 
document - this is great for google-style searches where you want to find, e.g. 
some product using _anything_ you can remember about it. Using Lucene or Solr, 
you have to write the code that constructs the lucene document from the object 
& its relationships. Not an issue if you have a small number of entity types, 
but rather a pita if you have dozens... Or you eventually write some 
reflection-based thing.. (i.e. you begin to write a poor-mans implementation of 
compass OSEM)

As mentioned before, they address different kinds of problems

-Nick

 

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca] 
Sent: 23 January 2010 05:01
To: solr-user@lucene.apache.org
Subject: RE: Solr vs. Compass

Of course, I understand what "transaction" means; have you guys thought about 
what may happen if we transfer $123.45 from one banking account to another, 
MySQL forgets to index the "decimal" during the transaction, or the DBA 
forgot to create an index? Absolutely nothing.

Why embed "indexing" as a transaction dependency? An extremely weird idea. But 
I understand some selling points...


SOLR: it is faster than Lucene. Filtered queries run faster than traditional 
"AND" queries! And this is a real selling point.



Thanks,

Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay

Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search


> -Original Message-
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: January-22-10 11:23 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr vs. Compass
> 
> Yes, "transactional", I tried it: do we really need "transactional"?
> Even if "commit" takes 20 minutes?
> It's their "selling point" nothing more.
> HBase is not transactional, and it has a specific use case; each tool 
> has a specific use case... in some cases Compass is the best!
> 
> Also, note that Compass (Hibernate) ((RDBMS)) use specific "business 
> domain model" terms with relationships; huge overhead to convert 
> "relational" into "object-oriented" (why for? Any advantages?)... 
> Lucene does it behind-the-scenes: you don't have to worry that field 
> "USA" (3
> characters) is repeated in few millions documents, and field "Canada" 
> (6
> characters) in another few; no any "relational", it's done 
> automatically without any Compass/Hibernate/Table(s)
> 
> 
> Don't think "relational".
> 
> I wrote this 2 years ago:
> http://www.theserverside.com/news/thread.tss?thread_id=50711#272351
> 
> 
> Fuad Efendi
> +1 416-993-2060
> http://www.tokenizer.ca/
> 
> 
> > -Original Message-
> > From: Uri Boness [mailto:ubon...@gmail.com]
> > Sent: January-21-10 11:35 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr vs. Compass
> >
> > In addition, the biggest appealing feature in Compass is that it's 
> > transactional and therefore integrates well with your infrastructure 
> > (Spring/EJB, Hibernate, JPA, etc...). This obviously is nice for 
> > some systems (not very large scale ones) and the programming model 
> > is
> clean.
> > On the other hand, Solr scales much better and provides a load of 
> > functionality that otherwise you'll have to custom build on top of 
> > Compass/Lucene.
> >
> > Lukáš Vlček wrote:
> > > Hi,
> > >
> > > I think that these products do not compete directly that much, 
> > > each
> > fit
> > > different business case. Can you tell us more about our specific
> > situation?
> > > What do you need to search and where your data is? (DB, 
> > > Filesystem,
> > Web
> 

Re: Index gets deleted after commit?

2010-01-25 Thread Sven Maurmann

DIH is the DataImportHandler. Please consult the two URLs

  http://wiki.apache.org/solr/DataImportHandler

and

  http://wiki.apache.org/solr/DataImportHandlerFaq

for further information.

Cheers,
Sven
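
[Archive note: with DIH, a full-import defaults to clean=true, which
deletes the existing index before importing. A hedged example of
suppressing that, assuming the handler is registered at /dataimport:

    http://localhost:8983/solr/dataimport?command=full-import&clean=false
]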

--On Monday, January 25, 2010 11:33:59 AM +0200 Bogdan Vatkov 
 wrote:



Hi Amit,

What is DIH? (I am a Solr newbie.)
In the meantime I resolved my issue - it was a very stupid one - one
of the files in my folder with XMLs (that I send to Solr with the
SimplePostTool), and actually the latest created one (so it got
executed last each time I ran the folder), contained
<delete><query>*:*</query></delete> :)

Best regards,
Bogdan

On Sun, Jan 24, 2010 at 6:25 AM, Amit Nithian 
wrote:


Are you using the DIH? If so, did you try setting clean=false in
the URL line? That prevents wiping out the index on load.

On Jan 23, 2010 4:06 PM, "Bogdan Vatkov" 
wrote:

After mass upload of docs in Solr I get some "REMOVING ALL
DOCUMENTS FROM INDEX" without any explanation.

I was running indexing w/ Solr for several weeks now and
everything was ok -
I indexed 22K+ docs using the SimplePostTool
I was first launching

<delete><query>*:*</query></delete>

then some 22K+ ...
with a finishing

<commit/>
But you can see from the log - right after the last commit I get
this strange REMOVING ALL...
I do not remember what I changed last but now I have this issue
that after the mass upload of docs the index gets completely
deleted.

Why is this happening?


log after the last commit:

INFO: start

commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDele
tes=false) Jan 24, 2010 1:48:24 AM
org.apache.solr.core.SolrDeletionPolicy onCommit INFO:
SolrDeletionPolicy.onCommit: commits:num=2

commit{dir=/store/dev/inst/apache-solr-1.4.0/example/solr/data/ind
ex,segFN=segments_fr,version=1260734716752,generation=567,filename
s=[segments_fr]

commit{dir=/store/dev/inst/apache-solr-1.4.0/example/solr/data/ind
ex,segFN=segments_fs,version=1260734716753,generation=568,filename
s=[_gv.nrm, segments_fs, _gv.fdx, _gw.nrm, _gv.tii, _gv.prx,
_gv.tvf, _gv.tis, _gv.tvd, _gv.fdt, _gw.fnm, _gw.tis, _gw.frq,
_gv.fnm, _gw.prx, _gv.tvx, _gw.tii, _gv.frq]
Jan 24, 2010 1:48:24 AM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1260734716753
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
 INFO: Opening searc...@de26e52 main
Jan 24, 2010 1:48:24 AM
org.apache.solr.update.DirectUpdateHandler2 commit INFO:
end_commit_flush
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming searc...@de26e52 main from
searc...@4e8deb8a main

fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions
=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumu
lative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming result for searc...@de26e52 main

fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions
=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumu
lative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming searc...@de26e52 main from
searc...@4e8deb8a main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,s
ize=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulati
ve_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Jan
24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@de26e52 main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,s
ize=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulati
ve_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Jan
24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,eviction
s=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cum
ulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming result for searc...@de26e52 main

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,eviction
s=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cum
ulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming searc...@de26e52 main from
searc...@4e8deb8a main

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0
,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumula
tive_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming result for searc...@de26e52 main

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0
,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumula
tive_hitratio=0.00,cumulative_in

Re: Index gets deleted after commit?

2010-01-25 Thread Bogdan Vatkov
Hi Amit,

What is DIH? (I am a Solr newbie.)
In the meantime I resolved my issue - it was a very stupid one - one of the
files in my folder with XMLs (that I send to Solr with the SimplePostTool),
and actually the latest created one (so it got executed last each time I ran
the folder), contained <delete><query>*:*</query></delete> :)

Best regards,
Bogdan

On Sun, Jan 24, 2010 at 6:25 AM, Amit Nithian  wrote:

> Are you using the DIH? If so, did you try setting clean=false in the URL
> line? That prevents wiping out the index on load.
>
> On Jan 23, 2010 4:06 PM, "Bogdan Vatkov"  wrote:
>
> After mass upload of docs in Solr I get some "REMOVING ALL DOCUMENTS FROM
> INDEX" without any explanation.
>
> I was running indexing w/ Solr for several weeks now and everything was ok
> -
> I indexed 22K+ docs using the SimplePostTool
> I was first launching
>
> <delete><query>*:*</query></delete>
>
> then some 22K+ ...
> with a finishing
>
> <commit/>
>
> But you can see from the log - right after the last commit I get this
> strange REMOVING ALL...
> I do not remember what I changed last but now I have this issue that after
> the mass upload of docs the index gets completely deleted.
>
> Why is this happening?
>
>
> log after the last commit:
>
> INFO: start
>
> commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
> Jan 24, 2010 1:48:24 AM org.apache.solr.core.SolrDeletionPolicy onCommit
> INFO: SolrDeletionPolicy.onCommit: commits:num=2
>
> commit{dir=/store/dev/inst/apache-solr-1.4.0/example/solr/data/index,segFN=segments_fr,version=1260734716752,generation=567,filenames=[segments_fr]
>
> commit{dir=/store/dev/inst/apache-solr-1.4.0/example/solr/data/index,segFN=segments_fs,version=1260734716753,generation=568,filenames=[_gv.nrm,
> segments_fs, _gv.fdx, _gw.nrm, _gv.tii, _gv.prx, _gv.tvf, _gv.tis, _gv.tvd,
> _gv.fdt, _gw.fnm, _gw.tis, _gw.frq, _gv.fnm, _gw.prx, _gv.tvx, _gw.tii,
> _gv.frq]
> Jan 24, 2010 1:48:24 AM org.apache.solr.core.SolrDeletionPolicy
> updateCommits
> INFO: newest commit = 1260734716753
> Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher 
> INFO: Opening searc...@de26e52 main
> Jan 24, 2010 1:48:24 AM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main
>
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for searc...@de26e52 main
>
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main
>
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for searc...@de26e52 main
>
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main
>
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for searc...@de26e52 main
>
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main
>
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for searc...@de26e52 main
>
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Jan 24, 2010 1:48:24 AM org.apache.solr.core.QuerySenderListener
> newSearcher
> INFO: QuerySenderListener sending requests to

Re: Beyond Basic Faceted Search (SOLR-236|SOLR-64|SOLR-792)

2010-01-25 Thread David MARTIN
Hi Kelly,

Did you succeed in using these patches? It seems I've got the same need as
you: being able to collapse all product variations (SKUs) under a single
line (the product).

As I'm beginning today with the field collapse patch, I'm still looking for
the best solution for this need.

Maybe someone here can give some tips to solve this (I suppose common)
need?

David
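
[Archive note: with the SOLR-236 field-collapsing patch, grouping SKU
documents under their parent product is typically done by collapsing on
a shared product-id field. A hedged sketch -- the parameter names below
are from the late-2009 patch revisions and varied between versions, and
the field name is illustrative:

    http://localhost:8983/solr/select?q=red+shoes
        &collapse.field=productId&collapse.type=normal
]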

On Thu, Jan 21, 2010 at 7:00 PM, Kelly Taylor  wrote:

>
> I'm currently using the latest SOLR-236 patch (12/24/2009) and
> field-collapsing seems to be giving me the desired results, but I'm
> wondering if I should focus more on a tree view of my catalog data instead,
> as described in "Beyond Basic Faceted Search"
>
> Is it possible that either or both of the patches for SOLR-792 or SOLR-64
> provide something like this? Below is a link to the paper, followed by an
> excerpt under the "CORRELATED FACETS" section.
>
> http://nadav.harel.org.il/papers/p33-ben-yitzhak.pdf
>
> Excerpt:
> "...model each product as a tree, in which the leaves represent specific
> instantiations, and where the attributes corresponding to each leaf are the
> union of attributes on the unique path from the root of the tree to the
> leaf. In other words, each node of the tree shares its attributes (text and
> associated metadata) with all its descendants. When we factor out common
> attributes of leaf nodes to intermediate nodes, this representation avoids
> significant duplication of text and metadata that are common to many
> variations of each product."
>
> -Kelly
> --
> View this message in context:
> http://old.nabble.com/Beyond-Basic-Faceted-Search-%28SOLR-236%7CSOLR-64%7CSOLR-792%29-tp27262017p27262017.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>