solr integration with jquery client page?

2011-08-17 Thread nagarjuna
hi everybody,
   I queried the following Solr URL:

http://localhost:8080/solr/db/select/?q=test&version=2.2&start=0&rows=10&indent=on&fl=keywords&omitHeader=true

to get all the keywords starting with the letter "t". Now I have a jQuery
page with a search field, and I need to fetch those keywords from my Solr URL
and show them in an autocomplete search field. Please help me do this.
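
For reference, a minimal sketch of one way to wire this up, assuming the
jQuery UI autocomplete widget, wt=json to select Solr's JSON response writer,
and json.wrf as the JSONP callback parameter so the browser can query Solr
directly. The URL and field name follow the question; everything else is
illustrative:

// Sketch: jQuery UI autocomplete backed by the Solr query above.
// Assumes jQuery + jQuery UI are loaded and an <input id="search"> exists.
$("#search").autocomplete({
  source: function (request, response) {
    $.ajax({
      url: "http://localhost:8080/solr/db/select/",
      dataType: "jsonp",
      jsonp: "json.wrf",   // Solr's JSONP callback parameter
      data: {
        q: "keywords:" + request.term + "*",
        wt: "json",        // JSON instead of the default XML
        rows: 10,
        fl: "keywords",
        omitHeader: true
      },
      success: function (data) {
        // Hand the widget one suggestion per returned doc.
        response($.map(data.response.docs, function (doc) {
          return doc.keywords;
        }));
      }
    });
  },
  minLength: 2
});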


Thanks in advance...



Re: ClassNotFoundException when trying to make spellcheck JaroWinkler working

2011-08-17 Thread Mike Mander

Solution found.
The original solrconfig.xml JaroWinkler definition had some line breaks.
If I write the definition on one line (no tabs, no line breaks), the server
starts without an exception:

<str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>

Thanks for helping me
Mike


Hi Mike, is your config like this?
Does queryAnalyzerFieldType match the type of the field you index?
Is the field correct?


<str name="queryAnalyzerFieldType">textSpell</str>

<lst name="spellchecker">
  <str name="name">jarowinkler</str>
  <str name="field">sear_spellterms</str>
  <str name="buildOnOptimize">false</str>
  <str name="buildOnCommit">true</str>
  <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  <str name="spellcheckIndexDir">./spellchecker_jarowinkler</str>
</lst>



2011/8/17 Mike Mander


Hello,

I get a ClassNotFoundException for JaroWinklerDistance when I start the
Solr example server.
I simply copied the example server and uncommented the spellchecker section
in example/solr/conf/solrconfig.xml.
I did nothing else.

I already googled but didn't find a hint. Can someone help me, please?

Thanks
Mike

Stacktrace:

C:\Users\m.mander\Desktop\temp\apache-solr-3.3.0\example>java -jar start.jar
2011-08-17 14:55:20.379:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2011-08-17 14:55:20.462:INFO::jetty-6.1-SNAPSHOT
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader <init>
INFO: Solr home set to 'solr/'
17.08.2011 14:55:20 org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
17.08.2011 14:55:20 org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: C:\Users\m.mander\Desktop\temp\apache-solr-3.3.0\example\solr\solr.xml
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
17.08.2011 14:55:20 org.apache.solr.core.CoreContainer <init>
INFO: New CoreContainer: solrHome=solr/ instance=22725577
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader <init>
INFO: Solr home set to 'solr/'
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader <init>
INFO: Solr home set to 'solr\.\'
17.08.2011 14:55:20 org.apache.solr.core.SolrConfig initLibs
INFO: Adding specified lib dirs to ClassLoader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-3.1.jar' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-LICENSE-BSD_LIKE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-NOTICE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-LICENSE-BSD_LIKE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-NOTICE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-LICENSE-BSD_LIKE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-NOTICE.txt' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/boilerpipe-1.1.0.jar' to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file

Re: is it possible to remove response header from the JSON format?

2011-08-17 Thread nagarjuna
Thank you very much, Filype Pereira. It works well, but I have another
problem: whenever I try to use the URL in my jQuery page to get the
"keywords" from the response, it complains that the response is not pure
JSON. I also tested that URL in a REST client and did not get well-formed
JSON. Please help me get a pure JSON response; only then can I use it in my
jQuery page.



Thanks in advance



Re: Stable Linux Release

2011-08-17 Thread Gora Mohanty
On Thu, Aug 18, 2011 at 10:43 AM, Cupbearer  wrote:
> I've now had several false starts on different versions of Linux.  openSUSE
> wouldn't load up on my older dell server box because it didn't like my raid
> controller and I've now been through CentOS 5.5, 5.6 and 6.0.
[...]

This is really the wrong forum for this kind of a question: You are
probably best off asking on a Linux-specific mailing list.

Having said that, we happily run Solr on Ubuntu and Debian
servers. You will likely find Ubuntu easy to set up, though we
prefer to use Debian in production. CentOS has also worked
for us. From your description of the issues, it might well be
that you need to become familiar with the packaging system
used by the OS, or find someone to help you with the same.

Regards,
Gora


Re: ClassNotFoundException when trying to make spellcheck JaroWinkler working

2011-08-17 Thread Mike Mander

Hello Alexei Martchenko,

my search component looks like this:





<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">textSpell</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>

  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">spellcheckerJaro</str>
  </lst>

</searchComponent>

I really only uncommented the JaroWinkler stuff. I found the class in
solr.war, so it should be on the classpath.

Thanks
Mike


Hi Mike, is your config like this?
Does queryAnalyzerFieldType match the type of the field you index?
Is the field correct?

<str name="queryAnalyzerFieldType">textSpell</str>

<lst name="spellchecker">
  <str name="name">jarowinkler</str>
  <str name="field">sear_spellterms</str>
  <str name="buildOnOptimize">false</str>
  <str name="buildOnCommit">true</str>
  <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  <str name="spellcheckIndexDir">./spellchecker_jarowinkler</str>
</lst>



2011/8/17 Mike Mander


Hello,

I get a ClassNotFoundException for JaroWinklerDistance when I start the
Solr example server.
I simply copied the example server and uncommented the spellchecker section
in example/solr/conf/solrconfig.xml.
I did nothing else.

I already googled but didn't find a hint. Can someone help me, please?

Thanks
Mike

Stacktrace: [...]

DataImportHandler using new connection on each query

2011-08-17 Thread Kevin Osborn
I have a DataImportHandler that imports data in full-import mode from SQL
Server. It has one main entity and three sub-entities. Against a fast database,
it appears to open 4 connections in total: one for the main query, while the
other 3 subqueries simply re-use their connections. This works well enough.

However, I tested this against a slower SQL Server and saw dramatically worse
results. Instead of re-using their connections, each of the sub-entities is
recreating its connection each time its query runs, which results in terrible
performance. My guess is that it is some sort of timeout: the
DataImportHandler is interpreting the slow connection as a dead connection and
re-creating it. The connection is slow (and does return data), but it is not
dead.

I tried to apply the SOLR-2233 patch to Solr 1.4.1, but that did not seem to
have much of an effect.
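
For reference: DIH's JdbcDataSource treats a connection that has sat idle past
a hard-coded timeout (10 seconds in the 1.4-era code, if memory serves) as
stale and reopens it, which would match the behavior described when each
subquery takes a long time. One commonly suggested workaround is
CachedSqlEntityProcessor, which runs each sub-entity query once and joins in
memory instead of issuing one subquery per parent row. A sketch, with table
and column names made up for illustration:

<entity name="item" dataSource="ds" query="SELECT id, name FROM item">
  <!-- Cached: the feature query runs once; rows are matched to parents in memory. -->
  <entity name="feature" dataSource="ds"
          processor="CachedSqlEntityProcessor"
          query="SELECT item_id, description FROM feature"
          where="item_id=item.id"/>
</entity>

Note this trades memory for connections: the whole sub-entity result set is
held in RAM for the duration of the import.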

Re: is it possible to remove response header from the JSON format?

2011-08-17 Thread Filype Pereira
I reckon you can use &omitHeader=true on the URL; hope that helps.
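
For reference: omitHeader=true only drops the responseHeader block; the body
is only JSON if the request also selects the JSON response writer (Solr's
default is XML). A likely fix for the follow-up question in this thread:

http://localhost:8080/solr/db/select/?q=test&wt=json&fl=keywords&omitHeader=true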

On Thu, Aug 18, 2011 at 5:21 PM, nagarjuna wrote:

> [...]


Re: is it possible to remove response header from the JSON format?

2011-08-17 Thread nagarjuna
Thank you very much for your reply, Erik Hatcher.
Actually, I want to use my Solr response in a jQuery page.

I have one sample URL:

http://api.geonames.org/postalCodeSearchJSON?postalcode=9011&maxRows=10&username=demo

which produces a JSON response, and I successfully got the data from that
URL, but I am unable to get the data from my Solr URL, which gives the
following response:

{
  "response":{"numFound":20,"start":0,"docs":[
      {"keywords":"test"},
      {"keywords":"test"},
      {"keywords":"test"},
      {"keywords":"Test"},
      {"keywords":"test"},
      {"keywords":"test"},
      {"keywords":"test"},
      {"keywords":"test"},
      {"keywords":"test"},
      {"keywords":"Test"}]
}}

So I need to format the above response as pure JSON, but I am unable to do
that. Please help me.



Stable Linux Release

2011-08-17 Thread Cupbearer
I've now had several false starts on different versions of Linux.  openSUSE
wouldn't load on my older Dell server box because it didn't like my RAID
controller, and I've now been through CentOS 5.5, 5.6, and 6.0.  I really
thought I was just about to get there: I get Solr to load up, but the
basic repositories for PHP aren't providing the right libraries to connect
to Solr.  I think it's php-mysqlnd that isn't getting loaded that my setup
needs, but I can't be sure.

I have everything working on a XAMPP Windows development box, but I'm coming
up with a big 0 on the Linux server.  Does anyone have any ideas on a stable,
simple release of Linux to use, or do I just need to learn how to compile my
own PHP, keep my fingers crossed, and hope I never have to upgrade?

Thanks for any input on this.  I've been working on this project for quite a
while, and I was so close just to have that part fail.

Jerry Craig
Cupbearer

-

Cupbearer 
Jerry E. Craig, Jr.



Re: solr keeps dying every few hours.

2011-08-17 Thread Fuad Efendi
I forgot to add: a company from the UK, something "log" related (have a
look at the recent LucidImagination-managed Solr Revolution conference blogs;
the company provides a "log analyzer" service: http://loggly.com/). They have
16,000 cores per Solr instance (multi-tenancy); of course they have at
least 100k fields per instance… and they don't have any problems outside
Amazon ;)))


-- 
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca








On 11-08-17 11:08 PM, "Fuad Efendi"  wrote:

>more investigation and I see that I have 100+ dynamic fields for my
>documents, not the 10 fields I quoted earlier.  I also sort against
>those
>




Re: solr keeps dying every few hours.

2011-08-17 Thread Fuad Efendi
I agree with Yonik, of course;
but…

You should see OOM errors in this case. In the case of "virtualization",
however, it is unpredictable… and the JVM may not even have a few bytes left
to write the OOM into the log file (because we are catching Throwable and
trying to generate an HTTP 500 instead!!! Freaky…)

Ok…

Sorry for not contributing a patch…


-Fuad (ZooKeeper)
http://www.OutsideIQ.com







On 11-08-17 6:01 PM, "Yonik Seeley"  wrote:

>On Wed, Aug 17, 2011 at 5:56 PM, Jason Toy  wrote:
>> I've only set the minimum memory and have not set maximum memory.  I'm
>>doing
>> more investigation and I see that I have 100+ dynamic fields for my
>> documents, not the 10 fields I quoted earlier.  I also sort against
>>those
>> dynamic fields often,  I'm reading that this potentially uses a lot of
>> memory.  Could this be the cause of my problems and if so what options
>>do I
>> have to deal with this?
>
>Yes, that's most likely the problem.
>Sorting on an integer field causes a FieldCache entry with an
>int[maxDoc] (i.e. 4 bytes per document in the index, regardless of if
>it has a value for that field or not).
>Sorting on a string field is 4 bytes per doc in the index (the ords)
>plus the memory to store the actual unique string values.
>
>-Yonik
>http://www.lucidimagination.com
>
> [...]




Re: solr keeps dying every few hours.

2011-08-17 Thread Fuad Efendi
EC2's 7.5 GB (Large instance, $0.68/hour) sucks. Unpredictably, there are
errors such as:

User time: 0 seconds
Kernel time: 0 seconds
Real time: 600 seconds

How can "clock time" be higher by such an extent? Only if _another_ user used
600 seconds of CPU: _virtualization_.

My client has had constant problems. We are moving to dedicated hardware
(25 times cheaper on average; Amazon sells 1 TB of EBS for $100/month,
plus additional costs for I/O).

> I have a large ec2 instance(7.5 gb ram), it dies every few hours with out
> of heap memory issues.  I started upping the min memory required,
> currently I use -Xms3072M .

A "Large" instance is "virtualization", and its behaviour is unpredictable.
Choose a "cluster" instance with explicit Intel Xeon CPUs (instead of
"CPU units") and compare behaviour; $1.60/hour. Please share results.

Thanks,





-- 
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca








On 11-08-17 5:56 PM, "Jason Toy"  wrote:

>I've only set the minimum memory and have not set maximum memory.  I'm
>doing
>more investigation and I see that I have 100+ dynamic fields for my
>documents, not the 10 fields I quoted earlier.  I also sort against those
>dynamic fields often,  I'm reading that this potentially uses a lot of
>memory.  Could this be the cause of my problems and if so what options do
>I
>have to deal with this?
>
> [...]




Re: hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters

2011-08-17 Thread Koji Sekiguchi

Alexei,

From the log, I think Solr couldn't find the "colored" fragmentsBuilder
defined in solrconfig.xml.
Can you check the <fragmentsBuilder name="colored" ...> setting
in your solrconfig.xml?

koji
--
Check out "Query Log Visualizer"
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/

(11/08/16 8:51), Alexei Martchenko wrote:

I'm having some trouble trying to upgrade my old highlighter from the old
<highlighting> format (1.4 version, default config from the Solr website) to
the new FastVectorHighlighter.

I'm using Solr 3.3.0 with LUCENE_33 as the luceneMatchVersion.

In my solrconfig.xml I added these lines

in the default request handler:

<str name="hl">true</str>
<str name="hl.useFastVectorHighlighter">true</str>
<str name="hl.usePhraseHighlighter">true</str>
<str name="hl.fragmentsBuilder">colored</str>

and

<fragmentsBuilder name="colored"
                  class="solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[<b style="background:yellow">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>


All I get is: ('grave' means severe)

15/08/2011 20:44:19 org.apache.solr.common.SolrException log
GRAVE: org.apache.solr.common.SolrException: Unknown fragmentsBuilder: colored
        at org.apache.solr.highlight.DefaultSolrHighlighter.getSolrFragmentsBuilder(DefaultSolrHighlighter.java:320)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:508)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:376)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:116)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

The docs at http://wiki.apache.org/solr/HighlightingParameters say:

hl.fragmentsBuilder

Specify the name of a SolrFragmentsBuilder. (Solr3.1) This parameter makes
sense for the FastVectorHighlighter only.

A SolrFragmentsBuilder respects the hl.tag.pre/post parameters:

<fragmentsBuilder name="colored"
                  class="solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[<b style="background:yellow">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>







Synonym and Whitespaces and optional TokenizerFactory

2011-08-17 Thread Will Milspec
Hi all,

This may be obvious. My question is about using the tokenizerFactory
parameter together with SynonymFilterFactory: which tokenizerFactory does one
use to treat "synonyms with spaces" as one token?

For example, these two entries are synonyms: "lms", "learning management system"

index time expansion would expand "lms" to these terms
   "lms"
   "learning management system"

i.e. not  like this:
   "lms"
   "learning"
   "management"
   "system"

Excerpt from the wiki article:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

The optional *tokenizerFactory* parameter names a tokenizer factory class to
analyze synonyms (see https://issues.apache.org/jira/browse/SOLR-319), which
can help with the synonym+stemming problem described in
http://search-lucene.com/m/hg9ri2mDvGk1 .
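
For reference, a sketch of the usual approach: passing
tokenizerFactory="solr.KeywordTokenizerFactory" makes each synonym file entry
be analyzed as a single token rather than split on whitespace. The field type
name and surrounding analyzer are made up for illustration; whether this fits
depends on the rest of your analysis chain:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

with a synonyms.txt line such as:

lms, learning management system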


thanks,

will


Solr Join in 3.3.x

2011-08-17 Thread Cameron Hurst

Hello all,

I was looking into finding a way to filter documents based on fields of
other documents in the index. In particular, I have one document that
updates very frequently and hundreds that change very rarely; however, the
rarely changing documents have a denormalized field, copied from the
frequently changing document, that changes often. The brute-force method I
have is to reindex all the documents every time that field changes, but this
is at times a huge load on my server at a critical moment, which I am trying
to avoid.


To avoid this hit I was trying to apply patch SOLR-2272. This opens
up a join feature to map fields of one document onto another (or so my
understanding is). This would allow me to update only that one document
and have the change applied to all others that rely on it. There are a
number of spots where this patch fails to apply, and I was wondering if
anyone has tried to use join in 3.3 or any other released version of
Solr, or if the only way to do it is to use 4.0 (as sketched below).
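
For reference, the join query parser from SOLR-2272 (on trunk, i.e. what
became Solr 4) is used roughly like this, with field names made up for
illustration:

q={!join from=parent_id to=id}type:frequentlyUpdated

It returns documents whose "to" field (id) matches the "from" field values
(parent_id) of documents matching the inner query, so the frequently
changing fields can live on one document and be joined in at query time.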


Also, while I found this patch, I am open to any other ideas that
people have on how to accomplish what I need; this just seemed like the
most direct method.


Thanks for the help,

Cameron


Re: Highlighting does not works with uniqueField set

2011-08-17 Thread Chris Hostetter

: I am new to solr. Am facing an issue wherein the highlighting of the 
: search results for matches is not working when I have set a unique field 
: as:
: 
: <uniqueKey>id</uniqueKey>
: 
: If this is commented out then highlighting starts working. I need to have a 
: unique field. Could someone please explain this erratic behaviour. I am 
: setting this field while posting the documents to be indexed.

there is no reason why having a uniqueKey should prevent highlighting from 
working.  But in order to try and help you, you need to provide a *lot* 
more info. for starters: explain what you mean by "not working" ... are 
you getting an error? are you getting results you don't expect? what do 
your configs look like? what do your requests look like? etc...

Please consult this wiki page and then repost your question as a reply...

https://wiki.apache.org/solr/UsingMailingLists



-Hoss


Re: CoreContainer from CommonsHttpSolrServer

2011-08-17 Thread Chris Hostetter
: I'm using Solr (with multiple cores) in a webapp and access the different
: cores using CommonsHttpSolrServer. As I would like to know which cores are
: configured and what their status is, I would like to get an instance of
: CoreContainer.

CoreContainer is an internal API inside of Solr -- so you can't get access 
to that when talking to Solr over HTTP

: The site http://wiki.apache.org/solr/CoreAdmin tells me how to interact with
: the CoreAdminHandler via my browser. But I would like to get the information
: provided by the STATUS action in my java application. As CoreContainer

all of those commands are accessible over HTTP from any client, including 
SolrJ -- take a look at the CoreAdminRequest class (or you can roll your 
own using SolrRequest and setting the request params directly)

https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/CoreAdminRequest.html
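
For reference, a minimal SolrJ sketch of the STATUS call (the URL is
illustrative; this targets the 3.x SolrJ API):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;
import org.apache.solr.common.util.NamedList;

public class CoreStatus {
  public static void main(String[] args) throws Exception {
    // Point at the container URL, not at an individual core.
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // A null core name requests the status of every core.
    CoreAdminResponse status = CoreAdminRequest.getStatus(null, server);
    NamedList<NamedList<Object>> cores = status.getCoreStatus();
    for (int i = 0; i < cores.size(); i++) {
      // Each entry maps a core name to its status details.
      System.out.println(cores.getName(i) + " -> " + cores.getVal(i));
    }
  }
}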

-Hoss


Re: return meta information with every search

2011-08-17 Thread Chris Hostetter
: we have a fairly complex taxonomy in our search system. I want to store the
: taxonomy revision that was used to build the Solr index. This revision

based on your wording, it sounds like this is an index that you don't ever 
update incrementally, and just rebuild and deploy completely new indexes 
periodically .. is that correct?

If that's the case, one very low-tech solution would be to update an 
"invariant" param on your requestHandlers every time you build a new 
version of the index -- the specific name of the param could be whatever 
you want, solr won't care about it, but then using echoParams=all it would 
be echoed back to the client (see the sketch below).

A slightly higher-tech way would be to write yourself a trivial little 
SearchComponent that could read the taxonomyId from somewhere on 
startup (a txt file in the data dir perhaps) and then add it directly to 
the response.
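
For reference, a sketch of the low-tech variant (the param name and value are
made up for illustration):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="taxonomyRevision">2011-08-17-r42</str>
  </lst>
</requestHandler>

A request with echoParams=all then reports taxonomyRevision alongside the
other parameters in the responseHeader.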

-Hoss


Re: solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
What can I do temporarily in this situation? It seems like I must eventually
move to a distributed setup. I am sorting on dynamic float fields.

On Wed, Aug 17, 2011 at 3:01 PM, Yonik Seeley wrote:

> On Wed, Aug 17, 2011 at 5:56 PM, Jason Toy  wrote:
> > I've only set the minimum memory and have not set maximum memory.  I'm
> doing
> > more investigation and I see that I have 100+ dynamic fields for my
> > documents, not the 10 fields I quoted earlier.  I also sort against those
> > dynamic fields often,  I'm reading that this potentially uses a lot of
> > memory.  Could this be the cause of my problems and if so what options do
> I
> > have to deal with this?
>
> Yes, that's most likely the problem.
> Sorting on an integer field causes a FieldCache entry with an
> int[maxDoc] (i.e. 4 bytes per document in the index, regardless of if
> it has a value for that field or not).
> Sorting on a string field is 4 bytes per doc in the index (the ords)
> plus the memory to store the actual unique string values.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> [...]



-- 
- sent from my mobile
6176064373


Re: Solr spellcheck and multiple collations

2011-08-17 Thread Alexei Martchenko
Thank you very much for this awesome config. I'm working on it as we speak.

2011/8/17 Herman Kiefus 

> [...]



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


RE: Solr spellcheck and multiple collations

2011-08-17 Thread Dyer, James
I quickly went through what you've got from your last 2 posts and do not see 
any problems.  You might want to double-check that your client is translating 
the constant variables you've got (for "spellcheck.maxCollationTries" etc.) 
into the query correctly, or, if you've got them in the request handler config, 
that they are spelled out right in there.

The other thing, obviously, is you'll only get 1 collation if there is only 1 
combination of the individual words it suggested that returns hits.  You may 
need to play with different test queries to find one that can generate more 
than 1 good collation.  Also, if you set spellcheck.maxCollationTries down to 
zero it will return all the possibilities (up to the spellcheck.maxCollations 
value), even the nonsensical ones.  That might be helpful for testing (see 
the example request below).
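
For reference, a request sketch with these parameters spelled out (handler,
query, and values are illustrative):

/select?q=paintain&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&spellcheck.collateExtendedResults=true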

Also, these params are in solr 3.x and higher.  So it won't work in 1.4 without 
the SOLR-2010 patch.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Herman Kiefus [mailto:herm...@angieslist.com] 
Sent: Wednesday, August 17, 2011 4:55 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr spellcheck and multiple collations

[...]


Re: 'Stable' 4.0 version

2011-08-17 Thread Tomás Fernández Löbbe
Version numbers are confusing, no doubt. 4.x is currently on trunk (and
has not been released yet). 3.x is a maintained branch. There have been three
releases from the 3.x branch: 3.1, 3.2 and 3.3.

Most of the spatial search stuff has been available since 3.1 (including
geofilt, geodist and the location field type).

If you decide to go with trunk (a.k.a. Solr 4), I would use a more recent
revision of it. Much has changed since December last year.

Tomás

On Wed, Aug 17, 2011 at 6:23 PM, Herman Kiefus wrote:

> [...]


Re: solr keeps dying every few hours.

2011-08-17 Thread Yonik Seeley
On Wed, Aug 17, 2011 at 5:56 PM, Jason Toy  wrote:
> I've only set the minimum memory and have not set maximum memory.  I'm doing
> more investigation and I see that I have 100+ dynamic fields for my
> documents, not the 10 fields I quoted earlier.  I also sort against those
> dynamic fields often,  I'm reading that this potentially uses a lot of
> memory.  Could this be the cause of my problems and if so what options do I
> have to deal with this?

Yes, that's most likely the problem.
Sorting on an integer field causes a FieldCache entry with an
int[maxDoc] (i.e. 4 bytes per document in the index, regardless of if
it has a value for that field or not).
Sorting on a string field is 4 bytes per doc in the index (the ords)
plus the memory to store the actual unique string values.
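
To put numbers on that: with the ~65 million documents mentioned in this
thread, each sorted int or float field pins roughly 65,000,000 x 4 bytes,
about 260 MB, in the FieldCache; if all 100+ dynamic fields get sorted on at
some point, that is on the order of 26 GB, far beyond a 7.5 GB box (and
string sorts cost more still).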

-Yonik
http://www.lucidimagination.com



> [...]


Re: solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
I've only set the minimum memory and have not set maximum memory.  I'm doing
more investigation, and I see that I have 100+ dynamic fields for my
documents, not the 10 fields I quoted earlier.  I also sort against those
dynamic fields often, and I'm reading that this potentially uses a lot of
memory.  Could this be the cause of my problems, and if so, what options do I
have to deal with this?

On Wed, Aug 17, 2011 at 2:46 PM, Markus Jelsma
wrote:

> [...]



-- 
- sent from my mobile
6176064373


RE: Solr spellcheck and multiple collations

2011-08-17 Thread Herman Kiefus
Thanks James, here are the settings that only yield the one collation:

static int count = 10;
static bool onlyMorePopular = true;
static bool extendedResults = true;
static bool collate = true;
static int maxCollations = 10;
static int maxCollationTries = 100;
static int maxCollationEvaluations = 1;
static bool collateExtendedResults = true;
static float accuracy = 0.7f;

-Original Message-
From: Dyer, James [mailto:james.d...@ingrambook.com] 
Sent: Wednesday, August 17, 2011 5:48 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr spellcheck and multiple collations

[...]


RE: Solr spellcheck and multiple collations

2011-08-17 Thread Herman Kiefus
If you only get one, best, collation then there is no point to my question; 
however, since you asked...

The relevant sections:

Solrconfig.xml -

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">textDictionary</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">TermsDictionary</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <float name="accuracy">0.0</float>
    <str name="comparatorClass">score</str>
  </lst>
</searchComponent>

Schema.xml - (the XML here was stripped by the mail archive; what survives
indicates two fieldType definitions with positionIncrementGap="100" and
omitNorms="true", index/query analyzers using words="stopwords.txt",
words="correctly_spelled_terms.txt" ignoreCase="true", and
synonyms="synonyms.txt" ignoreCase="true" expand="true" filters, plus field
declarations and several copyField entries with dest="CorrectlySpelledTerms".)
-Original Message-
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] 
Sent: Wednesday, August 17, 2011 5:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr spellcheck and multiple collations

[...]


RE: Solr spellcheck and multiple collations

2011-08-17 Thread Dyer, James
Herman,

- Specify "spellcheck.maxCollations" with something higher than one to get more 
than 1 collation.  

- If you also want the spellchecker to test whether or not a particular 
collation will return hits, also specify "spellcheck.maxCollationTries"

- If you also want to know how many hits each collation will return, also 
specify "spellcheck.collateExtendedResults=true"

- See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxCollations 
for more information

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Herman Kiefus [mailto:herm...@angieslist.com] 
Sent: Wednesday, August 17, 2011 4:31 PM
To: solr-user@lucene.apache.org
Subject: Solr spellcheck and multiple collations

After a bit of work, we have 'spellchecking' up and running, and we are happy with 
the suggestions.  I have not, however, ever been able to generate more than one 
collation query.  Is there something simple that I have overlooked?


Re: solr keeps dying every few hours.

2011-08-17 Thread Markus Jelsma
Keep in mind that a commit warms up another searcher, potentially doubling 
RAM consumption in the background due to cache-warming queries being executed 
(the newSearcher event). Also, where is your Xmx switch? I don't know how your JVM 
will behave if you set Xms > Xmx.

65m docs is quite a lot, but it should run fine with a 3GB heap allocation.

It's good practice to use a master for indexing, without any caches and warm-up 
queries, when you exceed a certain number of documents; it will bite.
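
For reference, a sketch of pinning both heap bounds on the example Jetty
start (values are illustrative):

java -Xms3072m -Xmx3072m -jar start.jar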

> [...]


Re: solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
I've just checked my index size: my data folder is 16GB.  So if my
server only has 7.5 GB of RAM, does that mean I can't reliably run Solr on
this one box, and it's useless to optimize the box?
If so, it looks like it's time to start using a cluster?

On Wed, Aug 17, 2011 at 2:28 PM, Herman Kiefus wrote:

> [...]
>



-- 
- sent from my mobile
6176064373


Re: Solr spellcheck and multiple collations

2011-08-17 Thread Alexei Martchenko
Can you show us your schema and config?

I believe that's how collation works: the best match, only one.

2011/8/17 Herman Kiefus 

> After a bit of work, we have 'spellchecking' up and running, and we are happy
> with the suggestions.  I have not, however, ever been able to generate more
> than one collation query.  Is there something simple that I have overlooked?
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Solr spellcheck and multiple collations

2011-08-17 Thread Herman Kiefus
After a bit of work, we have 'spellchecking' up and running, and we are happy with 
the suggestions.  I have not, however, ever been able to generate more than one 
collation query.  Is there something simple that I have overlooked?


RE: solr keeps dying every few hours.

2011-08-17 Thread Herman Kiefus
While I can't be as specific as others here will be, we encountered the 
same/similar problem.  We simply loaded up our servers with 48GB and life is 
good.  I too would like to be a bit more proactive on the provisioning front, 
and hopefully someone will come along and help us out.

FWIW, and I'm sure someone will correct me, but it seems as if the Java GC 
cannot keep up with cache allocation; in our case everything was fine until the 
nth query, and then the box would go TU.  But leave it to Solr: it would simply 
'restart' and start serving queries again.

-Original Message-
From: Jason Toy [mailto:jason...@gmail.com] 
Sent: Wednesday, August 17, 2011 5:15 PM
To: solr-user@lucene.apache.org
Subject: solr keeps dying every few hours.

[...]


RE: 'Stable' 4.0 version

2011-08-17 Thread Herman Kiefus
I should say I'm running: Solr Specification Version: 4.0.0.2010.12.10.08.54.56, 
and by the looks of the version number I'm running something from Dec 10 of 
last year.

Tomas: geofilt and geodist() are supported in 3.3?  Along with the location and 
point types?  Quite frankly, 1.3/1.4, 3.3, 4.0 all confuse me.  I just had our 
operations personnel install versions until I got the needed functionality.

-Original Message-
From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] 
Sent: Wednesday, August 17, 2011 5:12 PM
To: solr-user@lucene.apache.org
Subject: Re: 'Stable' 4.0 version

As far as I know, Solr's trunk is pretty stable, so you shouldn't have many 
problems with it if you test it correctly. Lucid's search platform is built 
upon the trunk ( 
http://www.lucidimagination.com/products/lucidworks-search-platform/enterprise
).
The one thing I would be concerned about is the index format. It might change in 
an incompatible way from one revision to the next one, so if rebuilding your 
index is complicated or takes too long this can be a problem.

If your version selection is based on the geospatial stuff, why don't you use 
the Solr 3.3 release? It already contains those features.

Tomás

On Wed, Aug 17, 2011 at 4:58 PM, Jaeger, Jay - DOT wrote:

> > geospatial requirements
>
> Looking at your email address, no surprise there.  8^)
>
> > What insight can you share (if any) regarding moving forward to a 
> > later
> nightly build?
>
> I used build 1271 (Solr 1.4.1, which seemed to be called Solr 4 at the
> time) during some testing, and it performed well -- but we were not 
> doing geospatial indexing with Solr.  Or are you referring to the 
> successor to Solr 3.3 at some future point in time (which I suppose 
> might also be called Solr 4 in the future -- won't that be confusing!)
>
> -Original Message-
> From: Herman Kiefus [mailto:herm...@angieslist.com]
> Sent: Wednesday, August 17, 2011 2:55 PM
> To: solr-user@lucene.apache.org
> Subject: 'Stable' 4.0 version
>
> My organization uses Solr 4 because of our geospatial requirements.  
> What insight can you share (if any) regarding moving forward to a 
> later nightly build?  Or, for those of you using 4.0 in a Production 
> setting, when is it that you move ahead?
>


solr keeps dying every few hours.

2011-08-17 Thread Jason Toy
I have a large EC2 instance (7.5 GB RAM); it dies every few hours with Java
heap out-of-memory errors.  I started upping the minimum memory required;
currently I use -Xms3072M.
I insert about 50k docs an hour and I currently have about 65 million docs
with about 10 fields each. Is this already too much data for one box? How do
I know when I've reached the limit of this server? I have no idea how to
keep control of this issue.  Am I just supposed to keep upping the minimum
RAM used for Solr? How do I know the right amount of RAM to use? Must I keep
adding more memory as the index size grows? I'd rather the query be a little
slower if I can use constant memory and have the search read from disk.


Re: 'Stable' 4.0 version

2011-08-17 Thread Tomás Fernández Löbbe
As far as I know, Solr's trunk is pretty stable, so you shouldn't have many
problems with it if you test it correctly. Lucid's search platform is built
upon the trunk (
http://www.lucidimagination.com/products/lucidworks-search-platform/enterprise
).
The one thing I would be concerned about is the index format. It might change
in an incompatible way from one revision to the next one, so if rebuilding
your index is complicated or takes too long this can be a problem.

If your version selection is based on the geospatial stuff, why don't you use
the Solr 3.3 release? It already contains those features.

Tomás

On Wed, Aug 17, 2011 at 4:58 PM, Jaeger, Jay - DOT wrote:

> > geospatial requirements
>
> Looking at your email address, no surprise there.  8^)
>
> > What insight can you share (if any) regarding moving forward to a later
> nightly build?
>
> I used build 1271 (Solr 1.4.1, which seemed to be called Solr 4 at the
> time) during some testing, and it performed well -- but we were not doing
> geospatial indexing with Solr.  Or are you referring to the successor to
> Solr 3.3 at some future point in time (which I suppose might also be called
> Solr 4 in the future -- won't that be confusing!)
>
> -Original Message-
> From: Herman Kiefus [mailto:herm...@angieslist.com]
> Sent: Wednesday, August 17, 2011 2:55 PM
> To: solr-user@lucene.apache.org
> Subject: 'Stable' 4.0 version
>
> My organization uses Solr 4 because of our geospatial requirements.  What
> insight can you share (if any) regarding moving forward to a later nightly
> build?  Or, for those of you using 4.0 in a Production setting, when is it
> that you move ahead?
>


Re: Spell Checker

2011-08-17 Thread Alexei Martchenko
Configure your XML properly, reload your core (or restart Solr), then commit.
This spellchecker is configured with buildOnCommit set to true: every time you
commit something, it will rebuild your dictionary based on the configuration
you selected.
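
For reference, the relevant piece of such a definition is just one flag inside the spellchecker block; a minimal sketch (the component name, field, and index dir are illustrative):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- rebuild the spelling index on every commit -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>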

2011/8/17 naeluh 

> so I add spellcheck.build=true to solrconfig.xml, just anywhere, and that
> will
> work?
>
> thanks very much for your help
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262744.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Spell Checker

2011-08-17 Thread Alexei Martchenko
No, if you are trying to build a suggester (which it seems to be), please read
the URL I sent you.

You'll need to create the suggester itself and the URL handler; in your case,
to make that URL work, just name the handler "/spell".
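
A sketch of that pair, along the lines of the wiki example (the field name, lookup implementation, and defaults are assumptions, not your actual config):

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- the field to build suggestions from -->
    <str name="field">name</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/spell">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Registering the handler under the name "/spell" is what makes the /solr/spell URL you tried resolve.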

2011/8/17 naeluh 

> so I add spellcheck.build=true to solrconfig.xml, just anywhere, and that
> will
> work?
>
> thanks very much for your help
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262744.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: suggester issues

2011-08-17 Thread Alexei Martchenko
I've been indexing and reindexing stuff here with Shingles. I don't believe
it's the best approach. Results are interesting, but I believe it's not what
the suggester is meant for.

I tried a shingle-based analyzer,
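along the lines of this sketch (the tokenizer choice and shingle parameters here are assumptions, not the exact configuration):

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit word n-grams up to 3 tokens long, plus the single words -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
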
but I got compound words in the suggestion itself.

If you query it like http://localhost:8983/solr/{mycore}/suggest/?q=dri I get
(responseHeader: status=0, QTime=1):

  "dri": numFound=6, startOffset=0, endOffset=3
  suggestions: drivers, drivers nvidia, drivers intel,
               drivers nvidia geforce, drive, driver
  collation: drivers

but when I enter the second word,
http://localhost:8983/solr/{mycore}/suggest/?q=drivers%20n
it scrambles everything (responseHeader: status=0, QTime=0):

  "drivers": numFound=4, startOffset=0, endOffset=7
  suggestions: drivers, drivers nvidia, drivers intel, drivers nvidia geforce

  "n": numFound=10, startOffset=8, endOffset=9
  suggestions: nvidia, net, nvidia geforce, network, new, n, ninja

  collation: drivers nvidia


Although the collation seems fine for this, it's not exactly what the suggester
is supposed to do.

Any thoughts?

2011/8/17 Alexei Martchenko 

> I have the very very very same problem. I could copy+paste your message as
> mine. I've discovered so far that bigger dictionaries work better for me,
> controlling the threshold is much better than avoiding indexing one or two
> fields. Of course I'm still polishing this.
>
> At this very moment I was looking into Shingles, are you using them?
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
>
> How are your fields?
>
> 2011/8/17 Kuba Krzemień 
>
>> Hello, I am working on creating an auto-complete functionality for my
>> platform which indexes large amounts of text (title + contents) - there is
>> too much data for a dictionary. I am using the latest version of Solr (3.3)
>> and I am trying to take advantage of the Suggester functionality.
>> Unfortunately so far the outcome isn't that great.
>>
>> The Suggester works only for single words or whole phrases (depends on the
>> tokenizer). When using the first option, I am unable to suggest any combined
>> queries. For example the suggestion for 'ne' will be 'new'. Suggestion for
>> 'new y' will be two separate lists, one for 'new' and one for 'y'. What's
>> worse, querying 'new AND y' gives the same results (also when using
>> collate), which means that the returned suggestion may give no results -
>> what makes sense separately often doesn't work combined. I need a way to
>> find only those suggestions that will return results when doing an AND query
>> (for example 'new AND york', 'new AND year', as long as they give results
>> upon querying - 'new AND yeti' shouldn't be returned as a suggestion).
>>
>> When I use the second tokenizer and the suggestions return phrases, for
>> 'ne' I will get 'new york' and 'new year', but for 'new y' I will get
>> nothing. Also, for 'y' I will get nothing, so the issue remains.
>>
>> If someone has some experience working with the Suggester, or if someone
>> has created a well working auto-suggester based on Solr, please help me.
>> I've been trying to find a solution for this for quite some time.
>>
>> Yours sincerely,
>> Jackob K
>>
>
>
>
> --
>
> *Alexei Martchenko* | *CEO* | Superdownloads
> ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
> 5083.1018/5080.3535/5080.3533
>
>


-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


RE: Solr Accent Insensitive and sensitive search

2011-08-17 Thread Michael Ryan
Are you using the same analyzer for both type="query" and type="index"? Can you 
show us the fieldType from your schema?

-Michael
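
For what it's worth, the usual fix is to apply the folding filter at both index and query time, so an accented query folds to the same indexed token; a minimal sketch (the type name and tokenizer are illustrative):

<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- fold accented characters to ASCII at index time... -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- ...and at query time, so "férias" and "ferias" both search as "ferias" -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>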


Re: Error deploying solr.war into Jboss7

2011-08-17 Thread Erik Hatcher
For the record, I'm starting work now on moving the Velocity response writer 
back to a contrib module so these dependencies won't be embedded in the WAR 
file (after I make the commit this week some time, most likely).

Erik

On Aug 17, 2011, at 15:35 , Chris Hostetter wrote:

> 
> : Caused by: org.jboss.as.server.deployment.DeploymentUnitProcessingException:
> : failed to process
> : 
> "/opt/jboss-as-web-7.0.0.Final/standalone/deployments/solr.war/WEB-INF/lib/velocity-tools-2.0.jar"
>   ...
> : Caused by: java.util.zip.ZipException: error in opening zip file
> :   at java.util.zip.ZipFile.open(Native Method) [:1.6.0_26]
> 
> That's weird.
> 
> : The only way I found that I could deploy the solr.war was to remove the
> : velocity-tools-2.0.jar from the solr.war/WEB-INF/lib directory.
> : I am not sure this is the right fix.  Should this be reported as a bug to
> : SOLR or to jBoss?
> : I have already posted this error to the jBoss community forum.
> 
> have you tried inspecting that velocity-tools-2.0.jar file with the "jar" 
> tool, or a regular zip program to see if there is in fact a problem with 
> it?  did it get corrupted somehow?  does it match what you find if you 
> manually extract that file directly from the solr.war?
> 
> PS: please send replies to the solr-user@lucene mailing list instead of 
> general@lucene ... it is more appropriate for this type of question.
> 
> 
> 
> -Hoss



Solr Accent Insensitive and sensitive search

2011-08-17 Thread Denis WSRosa
Hi all!

I have configured my schema to use the solr.ASCIIFoldingFilterFactory
filter; this way I'm able to search a word like "ferias" and get "férias",
but when I try to search the exact word "férias" I get nothing as a result.

Is there a way to configure both cases in the search?

Best Regards!

-- 
Denis Wilson Souza Rosa

Systems Architect
mobile: +55 11 8112 8284
email: deniswsr...@gmail.com / deniswsr...@hotmail.com


RE: 'Stable' 4.0 version

2011-08-17 Thread Jaeger, Jay - DOT
> geospatial requirements

Looking at your email address, no surprise there.  8^)

> What insight can you share (if any) regarding moving forward to a later 
> nightly build?  

I used build 1271 (Solr 1.4.1, which seemed to be called Solr 4 at the time) 
during some testing, and it performed well -- but we were not doing geospatial 
indexing with Solr.  Or are you referring to the successor to Solr 3.3 at some 
future point in time (which I suppose might also be called Solr 4 in the 
future -- won't that be confusing!)

-Original Message-
From: Herman Kiefus [mailto:herm...@angieslist.com] 
Sent: Wednesday, August 17, 2011 2:55 PM
To: solr-user@lucene.apache.org
Subject: 'Stable' 4.0 version

My organization uses Solr 4 because of our geospatial requirements.  What 
insight can you share (if any) regarding moving forward to a later nightly 
build?  Or, for those of you using 4.0 in a Production setting, when is it that 
you move ahead?


Re: suggester issues

2011-08-17 Thread Alexei Martchenko
I have the very very very same problem. I could copy+paste your message as
mine. I've discovered so far that bigger dictionaries work better for me,
controlling the threshold is much better than avoiding indexing one or two
fields. Of course I'm still polishing this.

At this very moment I was looking into Shingles, are you using them?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory

How are your fields?

2011/8/17 Kuba Krzemień 

> Hello, I am working on creating an auto-complete functionality for my
> platform which indexes large amounts of text (title + contents) - there is
> too much data for a dictionary. I am using the latest version of Solr (3.3)
> and I am trying to take advantage of the Suggester functionality.
> Unfortunately so far the outcome isn't that great.
>
> The Suggester works only for single words or whole phrases (depends on the
> tokenizer). When using the first option, I am unable to suggest any combined
> queries. For example the suggestion for 'ne' will be 'new'. Suggestion for
> 'new y' will be two separate lists, one for 'new' and one for 'y'. What's
> worse, querying 'new AND y' gives the same results (also when using
> collate), which means that the returned suggestion may give no results -
> what makes sense separately often doesn't work combined. I need a way to
> find only those suggestions that will return results when doing an AND query
> (for example 'new AND york', 'new AND year', as long as they give results
> upon querying - 'new AND yeti' shouldn't be returned as a suggestion).
>
> When I use the second tokenizer and the suggestions return phrases, for
> 'ne' I will get 'new york' and 'new year', but for 'new y' I will get
> nothing. Also, for 'y' I will get nothing, so the issue remains.
>
> If someone has some experience working with the Suggester, or if someone
> has created a well working auto-suggester based on Solr, please help me.
> I've been trying to find a solution for this for quite some time.
>
> Yours sincerely,
> Jackob K
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


'Stable' 4.0 version

2011-08-17 Thread Herman Kiefus
My organization uses Solr 4 because of our geospatial requirements.  What 
insight can you share (if any) regarding moving forward to a later nightly 
build?  Or, for those of you using 4.0 in a Production setting, when is it that 
you move ahead?


Re: Spell Checker

2011-08-17 Thread naeluh
so I add spellcheck.build=true to solrconfig.xml, just anywhere, and that will
work?

thanks very much for your help

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262744.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spell Checker

2011-08-17 Thread Alexei Martchenko
It's not a file, it's a request handler; you add those in solrconfig.xml.

Read here, please: http://wiki.apache.org/solr/Suggester

2011/8/17 naeluh 

> Hi Dan,
>
> I saw this command -
>
>
> http://localhost:8983/solr/spell?q=ANYTHINGHERE&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
>
> I tried to issue it and got a 404 error that I did not have the path
> /solr/spell.
> Should I add this file, and what type of file is it?
>
> I got to it via the post on Drupal - http://drupal.org/node/975132
>
> thanks !
>
> Nick
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262684.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


RE: Spell Checker

2011-08-17 Thread naeluh
Hi Dan, 

I saw this command -

http://localhost:8983/solr/spell?q=ANYTHINGHERE&spellcheck=true&spellcheck.collate=true&spellcheck.build=true

I tried to issue it and got a 404 error that I did not have the path
/solr/spell.
Should I add this file, and what type of file is it?

I got to it via the post on Drupal - http://drupal.org/node/975132

thanks !

Nick

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spell-Checker-tp1914336p3262684.html
Sent from the Solr - User mailing list archive at Nabble.com.


suggester issues

2011-08-17 Thread Kuba Krzemień
Hello, I am working on creating an auto-complete functionality for my platform 
which indexes large amounts of text (title + contents) - there is too much 
data for a dictionary. I am using the latest version of Solr (3.3) and I am 
trying to take advantage of the Suggester functionality. Unfortunately so far 
the outcome isn't that great. 

The Suggester works only for single words or whole phrases (depends on the 
tokenizer). When using the first option, I am unable to suggest any combined 
queries. For example the suggestion for 'ne' will be 'new'. Suggestion for 'new 
y' will be two separate lists, one for 'new' and one for 'y'. What's worse, 
querying 'new AND y' gives the same results (also when using collate), which 
means that the returned suggestion may give no results - what makes sense 
separately often doesn't work combined. I need a way to find only those 
suggestions that will return results when doing an AND query (for example 'new 
AND york', 'new AND year', as long as they give results upon querying - 'new 
AND yeti' shouldn't be returned as a suggestion). 

When I use the second tokenizer and the suggestions return phrases, for 'ne' I 
will get 'new york' and 'new year', but for 'new y' I will get nothing. Also, 
for 'y' I will get nothing, so the issue remains. 

If someone has some experience working with the Suggester, or if someone has 
created a well working auto-suggester based on Solr, please help me. I've been 
trying to find a solution for this for quite some time.

Yours sincerely,
Jackob K


RE: Return records based on aggregate functions?

2011-08-17 Thread Dyer, James
Yes:

solrquery.add("group.main", true);
solrquery.add("group.format", "simple");

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Daniel Skiles [mailto:daniel.ski...@docfinity.com] 
Sent: Wednesday, August 17, 2011 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Return records based on aggregate functions?

For response option 1, would I add the group.main=true and
group.format=simple parameters to the SolrQuery object?

On Wed, Aug 17, 2011 at 3:09 PM, Dyer, James wrote:

> For the request end, you can just use something like:
>
> solrquery.add("group", true);
> ..etc..
>
> For the response, you have 3 options:
>
> 1. specify "group.main=true&group.format=simple" .  (note: When I tested
> this on a nightly build from back in February I noticed a significant
> performance impact from using these params although I imagine the version
> that is committed to 3.3 does not have this problem.)
>
> This will return your 1-document-per-group as if it is a regular
> non-grouped query and the response will come back just like any other query.
> (see the wiki: http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr
>  and the javadocs: 
> http://lucene.apache.org/solr/api/overview-summary.htmlthen scroll to the 
> solrj section.)
>
> 2. Full SolrJ support was just added to the 3.x branch so you'll have to
> use a nightly build (which ought to be stable & production-quality).  See
> https://issues.apache.org/jira/browse/SOLR-2637 for more information.
>  After building the solrj documentation, look for classes that start with
> "Group"
>
> 3. See this posting on how to parse the response "by-hand".  This is for a
> slightly older version of Field Collapsing than what was committed so it
> might not be 100% accurate.
> http://www.lucidimagination.com/search/document/148ba23aec5ee2d8/solrquery_api_for_adding_group_filter
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Daniel Skiles [mailto:daniel.ski...@docfinity.com]
> Sent: Wednesday, August 17, 2011 1:32 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Return records based on aggregate functions?
>
> Woah.  That looks like exactly what I need.  Thank you very much.  Is
> there
> any documentation for how to do that using the SolrJ API?
>
> On Wed, Aug 17, 2011 at 2:26 PM, Dyer, James  >wrote:
>
> > Daniel,
> >
> > This looks like a good usecase for FieldCollapsing (see
> > http://wiki.apache.org/solr/FieldCollapsing).  Perhaps try something
> like:
> >
> > &group=true&group.field=documentId&group.limit=1&group.sort=version desc
> >
> > James Dyer
> > E-Commerce Systems
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -Original Message-
> > From: Daniel Skiles [mailto:daniel.ski...@docfinity.com]
> > Sent: Wednesday, August 17, 2011 1:20 PM
> > To: solr-user@lucene.apache.org
> > Subject: Return records based on aggregate functions?
> >
> > I've recently started using Solr and I'm stumped by a problem I'm
> currently
> > encountering.  Given that I can't really find anything close to what I'm
> > trying to do on Google or the mailing lists, I figured I'd ask if anyone
> > here had suggestions on how to do it.
> >
> > I currently have a schema that looks more or less like this:
> >
> > uniqueId (string) -- Unique identifier for a record
> > documentId (string) -- Id of document represented by this record
> > contents (string) -- contents of file represented by this record
> > version (float) -- Numeric representation of the version of this document
> >
> >
> > What I'd like to do is submit a query to the server that returns records
> > that match against contents, but only if the record has a version field
> > that
> > is the largest value for all records that share the same documentId.
> >
> > In other words, I'd like to be able to only search the most recent
> version
> > of a document in some scenarios.
> >
> > Is this possible with Solr?  I'm at an early enough phase that I'm also
> > able
> > to modify my solr schema if necessary.
> >
> > Thank you,
> > Daniel
> >
>


Re: ANTLR SOLR query/filter parser

2011-08-17 Thread Chris Hostetter

: I'm looking for an ANTLR parser that consumes solr queries and filters.  
: Before I write my own, thought I'd ask if anyone has one they are 
: willing to share or can point me to one?

I'm pretty sure that this will be impossible to do in the general case -- 
arbitrary QParser instances (that support arbitrary syntax) can be 
registered in the solrconfig.xml and specified using either localparams or 
defType.  So even if you did write a parser that understood all of the
rules of all of the default QParsers, and even if you made your parser 
smart enough to know how to look at other params (ie: defType, or 
variable substitution of "type") to understand which subset of parse rules 
to use, that still might give false positives or false failures if the 
user registered their own QParser using a new name (or changed the 
names used in registering existing parsers).

The main question I have is: why are you looking for an ANTLR parser to do 
this?  What is your goal?

https://people.apache.org/~hossman/#xyproblem
Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss


Re: Return records based on aggregate functions?

2011-08-17 Thread Daniel Skiles
For response option 1, would I add the group.main=true and
group.format=simple parameters to the SolrQuery object?

On Wed, Aug 17, 2011 at 3:09 PM, Dyer, James wrote:

> For the request end, you can just use something like:
>
> solrquery.add("group", true);
> ..etc..
>
> For the response, you have 3 options:
>
> 1. specify "group.main=true&group.format=simple" .  (note: When I tested
> this on a nightly build from back in February I noticed a significant
> performance impact from using these params although I imagine the version
> that is committed to 3.3 does not have this problem.)
>
> This will return your 1-document-per-group as if it is a regular
> non-grouped query and the response will come back just like any other query.
> (see the wiki: http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr
>  and the javadocs: 
> http://lucene.apache.org/solr/api/overview-summary.htmlthen scroll to the 
> solrj section.)
>
> 2. Full SolrJ support was just added to the 3.x branch so you'll have to
> use a nightly build (which ought to be stable & production-quality).  See
> https://issues.apache.org/jira/browse/SOLR-2637 for more information.
>  After building the solrj documentation, look for classes that start with
> "Group"
>
> 3. See this posting on how to parse the response "by-hand".  This is for a
> slightly older version of Field Collapsing than what was committed so it
> might not be 100% accurate.
> http://www.lucidimagination.com/search/document/148ba23aec5ee2d8/solrquery_api_for_adding_group_filter
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Daniel Skiles [mailto:daniel.ski...@docfinity.com]
> Sent: Wednesday, August 17, 2011 1:32 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Return records based on aggregate functions?
>
> Woah.  That looks like exactly what I need.  Thank you very much.  Is
> there
> any documentation for how to do that using the SolrJ API?
>
> On Wed, Aug 17, 2011 at 2:26 PM, Dyer, James  >wrote:
>
> > Daniel,
> >
> > This looks like a good usecase for FieldCollapsing (see
> > http://wiki.apache.org/solr/FieldCollapsing).  Perhaps try something
> like:
> >
> > &group=true&group.field=documentId&group.limit=1&group.sort=version desc
> >
> > James Dyer
> > E-Commerce Systems
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -Original Message-
> > From: Daniel Skiles [mailto:daniel.ski...@docfinity.com]
> > Sent: Wednesday, August 17, 2011 1:20 PM
> > To: solr-user@lucene.apache.org
> > Subject: Return records based on aggregate functions?
> >
> > I've recently started using Solr and I'm stumped by a problem I'm
> currently
> > encountering.  Given that I can't really find anything close to what I'm
> > trying to do on Google or the mailing lists, I figured I'd ask if anyone
> > here had suggestions on how to do it.
> >
> > I currently have a schema that looks more or less like this:
> >
> > uniqueId (string) -- Unique identifier for a record
> > documentId (string) -- Id of document represented by this record
> > contents (string) -- contents of file represented by this record
> > version (float) -- Numeric representation of the version of this document
> >
> >
> > What I'd like to do is submit a query to the server that returns records
> > that match against contents, but only if the record has a version field
> > that
> > is the largest value for all records that share the same documentId.
> >
> > In other words, I'd like to be able to only search the most recent
> version
> > of a document in some scenarios.
> >
> > Is this possible with Solr?  I'm at an early enough phase that I'm also
> > able
> > to modify my solr schema if necessary.
> >
> > Thank you,
> > Daniel
> >
>


RE: Most current Tika jar files that work with Solr 1.4.1

2011-08-17 Thread Jaeger, Jay - DOT
> What is the latest version of Tika that I can use with Solr 1.4.1?  It 
> comes packaged with 0.4.  I tried 0.8 and it didn't work.

When I was testing Tika last year, I used Solr build 1271 to get the most 
recent Tika I could get my hands on at the time.  That was before Solr 3.1, so 
I expect it was 1.4.1 - I downloaded it 4 days after I downloaded 1.4.1 on 
10/8/2010.   A look inside confirms that was Tika 0.8.

Although it was a nightly build, we found it perfectly stable in operation for 
what we were doing.

You might take the same approach.



RE: Return records based on aggregate functions?

2011-08-17 Thread Dyer, James
For the request end, you can just use something like:

solrquery.add("group", true);
..etc..

For the response, you have 3 options:

1. specify "group.main=true&group.format=simple" .  (note: When I tested this 
on a nightly build from back in February I noticed a significant performance 
impact from using these params although I imagine the version that is committed 
to 3.3 does not have this problem.)

This will return your 1-document-per-group as if it is a regular non-grouped 
query and the response will come back just like any other query.  
(see the wiki: http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr 
 and the javadocs: http://lucene.apache.org/solr/api/overview-summary.html then 
scroll to the solrj section.) 

2. Full SolrJ support was just added to the 3.x branch so you'll have to use a 
nightly build (which ought to be stable & production-quality).  See 
https://issues.apache.org/jira/browse/SOLR-2637 for more information.  After 
building the solrj documentation, look for classes that start with "Group"

3. See this posting on how to parse the response "by-hand".  This is for a 
slightly older version of Field Collapsing than what was committed so it might 
not be 100% accurate.  
http://www.lucidimagination.com/search/document/148ba23aec5ee2d8/solrquery_api_for_adding_group_filter
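
Putting option 1 together with the grouping parameters from earlier in this thread, a full request might look like this (the host and query are illustrative):

http://localhost:8983/solr/select?q=contents:foo&group=true&group.field=documentId&group.limit=1&group.sort=version+desc&group.main=true&group.format=simple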

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Daniel Skiles [mailto:daniel.ski...@docfinity.com] 
Sent: Wednesday, August 17, 2011 1:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Return records based on aggregate functions?

Woah.  That looks like exactly what I need.  Thank you very much.  Is there
any documentation for how to do that using the SolrJ API?

On Wed, Aug 17, 2011 at 2:26 PM, Dyer, James wrote:

> Daniel,
>
> This looks like a good usecase for FieldCollapsing (see
> http://wiki.apache.org/solr/FieldCollapsing).  Perhaps try something like:
>
> &group=true&group.field=documentId&group.limit=1&group.sort=version desc
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Daniel Skiles [mailto:daniel.ski...@docfinity.com]
> Sent: Wednesday, August 17, 2011 1:20 PM
> To: solr-user@lucene.apache.org
> Subject: Return records based on aggregate functions?
>
> I've recently started using Solr and I'm stumped by a problem I'm currently
> encountering.  Given that I can't really find anything close to what I'm
> trying to do on Google or the mailing lists, I figured I'd ask if anyone
> here had suggestions on how to do it.
>
> I currently have a schema that looks more or less like this:
>
> uniqueId (string) -- Unique identifier for a record
> documentId (string) -- Id of document represented by this record
> contents (string) -- contents of file represented by this record
> version (float) -- Numeric representation of the version of this document
>
>
> What I'd like to do is submit a query to the server that returns records
> that match against contents, but only if the record has a version field
> that
> is the largest value for all records that share the same documentId.
>
> In other words, I'd like to be able to only search the most recent version
> of a document in some scenarios.
>
> Is this possible with Solr?  I'm at an early enough phase that I'm also
> able
> to modify my solr schema if necessary.
>
> Thank you,
> Daniel
>


Re: Random + Boost?

2011-08-17 Thread Chris Hostetter

This is the type of problem that's fun to think about...

: As for RandomSortField + function queries... I'm not sure I understand how I
: can use that to achieve what I need :-/

the RandomSortField was designed for simple sorting, ie...

  sort=random_1234 desc

...but it can also be used as the input to a function, and (as of 
recently) you can sort on functions. So you could do something like...

  sort=product(price,random_3245) desc

  (https://wiki.apache.org/solr/FunctionQuery)

...which would cause the documents to be semi-randomly sorted, with higher 
priced products skewed to be more likely to appear higher up in the 
results.

So in your case, with your classifications of documents (A, B, C, etc...) 
if you can index a numeric value with each document indicating how much 
you want to "bias" the sort in favor of documents of that type (ie: the 
percentages you mentioned) you could use them that way.

but that would require you to index those biases in advance.

another strategy you could use is to take advantage of the "map" function 
... assign simple numeric ids to each of your classifications (A=1, B=2, 
etc..) and index those numeric ids as some field "code", and then at query 
time you can use the map function to translate them to your bias values...

  sort=product(map(map(map(code,1,1,50),2,2,30),3,3,40),random_3245) desc

...that would give "A" docs a bias of 50, "B" docs a bias of 30, "C" 
docs a bias of 40, etc...

With Solr 4.x, there will also be functions that let you get the docfreq 
of a term in the index, so you could use inverse functions to make the 
bias multipliers driven directly by how common a doc class is used ... but 
based on your description, it sounds like you want this to be more user 
driven anyway...

: > > was thinking I would essentially "boost" types B, C, D, E,
: > > F until all types
: > > are approximately evenly represented in the random
: > > assortment. (Or
: > > alternatively, if the user has an affinity for type B
: > > documents, further
: > > boost type B documents so that they're more likely to be
: > > represented than
: > > other types).

-Hoss


Most current Tika jar files that work with Solr 1.4.1

2011-08-17 Thread Tod
What is the latest version of Tika that I can use with Solr 1.4.1?  It 
comes packaged with 0.4.  I tried 0.8 and it didn't work.


Re: Terms + Query?

2011-08-17 Thread Darren Govoni

Thanks. I will try that.

As a future feature, it would be good to do filter queries on /terms if 
the Solr gods are listening! Hehe.



On 08/17/2011 02:25 PM, Tomás Fernández Löbbe wrote:

I think not, but you could get a similar result by using faceting on
the field and setting the parameter facet.mincount=1. It will be slower than the
TermsComponent.

On Wed, Aug 17, 2011 at 1:19 PM, Darren Govoni  wrote:


Hi,
  Is it possible to restrict the /terms component output to the results of a
query?

thanks,
Darren





Re: Return records based on aggregate functions?

2011-08-17 Thread Daniel Skiles
Woah.  That looks like exactly what I need.  Thank you very much.  Is there
any documentation for how to do that using the SolrJ API?

On Wed, Aug 17, 2011 at 2:26 PM, Dyer, James wrote:

> Daniel,
>
> This looks like a good usecase for FieldCollapsing (see
> http://wiki.apache.org/solr/FieldCollapsing).  Perhaps try something like:
>
> &group=true&group.field=documentId&group.limit=1&group.sort=version desc
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Daniel Skiles [mailto:daniel.ski...@docfinity.com]
> Sent: Wednesday, August 17, 2011 1:20 PM
> To: solr-user@lucene.apache.org
> Subject: Return records based on aggregate functions?
>
> I've recently started using Solr and I'm stumped by a problem I'm currently
> encountering.  Given that I can't really find anything close to what I'm
> trying to do on Google or the mailing lists, I figured I'd ask if anyone
> here had suggestions on how to do it.
>
> I currently have a schema that looks more or less like this:
>
> uniqueId (string) -- Unique identifier for a record
> documentId (string) -- Id of document represented by this record
> contents (string) -- contents of file represented by this record
> version (float) -- Numeric representation of the version of this document
>
>
> What I'd like to do is submit a query to the server that returns records
> that match against contents, but only if the record has a version field
> that
> is the largest value for all records that share the same documentId.
>
> In other words, I'd like to be able to only search the most recent version
> of a document in some scenarios.
>
> Is this possible with Solr?  I'm at an early enough phase that I'm also
> able
> to modify my solr schema if necessary.
>
> Thank you,
> Daniel
>


RE: Return records based on aggregate functions?

2011-08-17 Thread Dyer, James
Daniel,

This looks like a good usecase for FieldCollapsing (see 
http://wiki.apache.org/solr/FieldCollapsing).  Perhaps try something like:

&group=true&group.field=documentId&group.limit=1&group.sort=version desc

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Daniel Skiles [mailto:daniel.ski...@docfinity.com] 
Sent: Wednesday, August 17, 2011 1:20 PM
To: solr-user@lucene.apache.org
Subject: Return records based on aggregate functions?

I've recently started using Solr and I'm stumped by a problem I'm currently
encountering.  Given that I can't really find anything close to what I'm
trying to do on Google or the mailing lists, I figured I'd ask if anyone
here had suggestions on how to do it.

I currently have a schema that looks more or less like this:

uniqueId (string) -- Unique identifier for a record
documentId (string) -- Id of document represented by this record
contents (string) -- contents of file represented by this record
version (float) -- Numeric representation of the version of this document


What I'd like to do is submit a query to the server that returns records
that match against contents, but only if the record has a version field that
is the largest value for all records that share the same documentId.

In other words, I'd like to be able to only search the most recent version
of a document in some scenarios.

Is this possible with Solr?  I'm at an early enough phase that I'm also able
to modify my solr schema if necessary.

Thank you,
Daniel


Re: Terms + Query?

2011-08-17 Thread Tomás Fernández Löbbe
I think not, but you could get a similar result by using faceting on
the field and setting the parameter facet.mincount=1. It will be slower than the
TermsComponent.
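
For example (the host and field name are illustrative):

http://localhost:8983/solr/select?q=your+query&rows=0&facet=true&facet.field=myfield&facet.limit=-1&facet.mincount=1

The terms that come back with non-zero counts are then exactly the terms occurring in documents that match the query.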

On Wed, Aug 17, 2011 at 1:19 PM, Darren Govoni  wrote:

> Hi,
>  Is it possible to restrict the /terms component output to the results of a
> query?
>
> thanks,
> Darren
>


Return records based on aggregate functions?

2011-08-17 Thread Daniel Skiles
I've recently started using Solr and I'm stumped by a problem I'm currently
encountering.  Given that I can't really find anything close to what I'm
trying to do on Google or the mailing lists, I figured I'd ask if anyone
here had suggestions on how to do it.

I currently have a schema that looks more or less like this:

uniqueId (string) -- Unique identifier for a record
documentId (string) -- Id of document represented by this record
contents (string) -- contents of file represented by this record
version (float) -- Numeric representation of the version of this document


What I'd like to do is submit a query to the server that returns records
that match against contents, but only if the record has a version field that
is the largest value for all records that share the same documentId.

In other words, I'd like to be able to only search the most recent version
of a document in some scenarios.

Is this possible with Solr?  I'm at an early enough phase that I'm also able
to modify my solr schema if necessary.

Thank you,
Daniel


RE: SolR : eDismax does not always use the defaultOperator "AND"

2011-08-17 Thread Dyer, James
Valentin,

There is currently an open issue about this:  
https://issues.apache.org/jira/browse/SOLR-2649 .  I ran into this also and 
ended up telling all of the application developers to always insert "AND" 
between every user keyword.  I was using an older version of edismax but the 
person who opened the issue reported this for 3.3.
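
For example, a user entry of new york hotels would be rewritten to q=new AND york AND hotels before it reaches the handler (the query terms here are illustrative).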

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Valentin [mailto:igorlacro...@gmail.com] 
Sent: Wednesday, August 17, 2011 9:05 AM
To: solr-user@lucene.apache.org
Subject: Re: SolR : eDismax does not always use the defaultOperator "AND"

I had put mm at 4<75%, so if I understand it well, for 4 words or fewer, it
has to match all the words.

In my tests, I did it with 3 words, so I don't understand the results...


thanks for your answer,

Valentin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolR-eDismax-does-not-always-use-the-defaultOperator-AND-tp3261500p3261703.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr UIMA integration problem

2011-08-17 Thread solr nps
Thanks for the reply.

I changed it to 'coveredText' as you recommended, but that did not help; I
got the same error message.

solrconfig.xml now looks like the following


  fieldMappings:
    type: org.apache.uima.alchemy.ts.concept.ConceptFS
      feature: coveredText  ->  field: concept
    type: org.apache.uima.SentenceAnnotation
      feature: coveredText  ->  field: sentence


Let me know if you need anything more from me for debugging.

Thanks


On Wed, Aug 17, 2011 at 5:12 AM, Tommaso Teofili
wrote:

> At a first glance I think the problem is in the 'feature' element which is
> set to 'title'.
> The 'feature' element should contain a UIMA Feature of the type defined in
> element 'type'; for example for SentenceAnnotation [1] defined in HMM
> Tagger
> has 'only' the default features of a UIMA Annotation: begin, end and
> coveredText.
> So I think you should change the 'feature' elements' values to
> 'coveredText'
> which contains the text covered by the specified UIMA annotation.
> Hope this helps,
> Tommaso
>
>
> [1] :
>
> http://svn.apache.org/repos/asf/uima/addons/trunk/Tagger/src/main/java/org/apache/uima/SentenceAnnotation.java
>
> 2011/8/17 solr nps 
>
> > Hello,
> >
> > I am using Solr 3.3. I have been following instructions at
> >
> >
> https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/solr/contrib/uima/README.txt
> >
> > My setup looks like the following.
> >
> > solr lib directory contains the following jars
> >
> > apache-solr-uima-3.3.0.jar
> > commons-digester-2.0.jar
> > uima-an-alchemy-2.3.1-SNAPSHOT-r1062868.jar
> > uima-an-calais-2.3.1-SNAPSHOT-r1062868.jar
> > uima-an-tagger-2.3.1-SNAPSHOT-r1062868.jar
> > uima-an-wst-2.3.1-SNAPSHOT-r1076132.jar
> > uimaj-core-2.3.1.jar
> >
> >
> > solr_config.xml has the following changes.
> >
> >  
> > > class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
> >  
> >
> >  MY_KEY
> >  MY_KEY
> >  MY_KEY
> >  MY_KEY
> >  MY_KEY
> >  MY_SECOND_KEY
> >
> > >
> >
> name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml
> >false
> >title
> >
> >  false
> >*  *
> > *title*
> > *  *
> >
> >
> >  
> > > name="name">org.apache.uima.alchemy.ts.concept.ConceptFS
> >
> >  *title*
> > *  concept*
> >
> >  
> >  
> >org.apache.uima.SentenceAnnotation
> >
> > * title*
> > *  sentence*
> >
> >  
> >
> >  
> >
> >
> > and
> >
> > 
> >
> >  uima
> >
> >  
> >
> > I am trying to index a simple document which looks like the following
> >
> > 
> > 
> > 1456780001
> > Canon powershow camera 9000
> > 
> > 
> >
> >
> > I am using curl to post this document on the /update end point and I am
> > getting the following error
> >
> > *org.apache.solr.common.SolrException: processing error: null.*
> title=Canon
> > powershow camera 9000,  text="Canon powershow camera 9000..."
> > at
> >
> >
> org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:107)
> > at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
> > at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
> > at
> >
> >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
> > at
> >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
> > at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> > at
> >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> > at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> > at
> >
> >
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> > at
> >
> >
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
> > at
> >
> >
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> > at
> >
> >
> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
> > at
> >
> >
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
> > at
> >
> >
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
> > at
> > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
> > at
> >
> >
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> > at
> >
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java

magento-solr integration

2011-08-17 Thread Harsha Vardhan Muthyala
Hi,
 
Not sure if this is the right forum to ask a question regarding installation,
but the admin page on the Solr screen pointed to this address.
 
We have followed the installation procedure mentioned at 
http://www.summasolutions.net/blogposts/magento-apache-solr-set.
 
The solr server seems to work fine on its own but the “Test
Connection” functionality fails without any log output. Any help you could
provide that would help debug this issue is greatly appreciated.
 
Thanks,
Harsha.

Re: SolR : eDismax does not always use the defaultOperator "AND"

2011-08-17 Thread Valentin
I had put mm at 4<75%, so if I understand it well, for 4 words or fewer, it
has to match all the words.

In my tests, I did it with 3 words, so I don't understand the results...


thanks for your answer,

Valentin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolR-eDismax-does-not-always-use-the-defaultOperator-AND-tp3261500p3261703.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Faceted Search Patent Lawsuit

2011-08-17 Thread Gora Mohanty
Hi,

Sorry for the top-quote: on a mobile.

A discussion on the evils of patents aside, surely library catalogues are
prior art. I remember such systems giving lists of matches by category, if
maybe not counts.

Will look at the patent applications, but sheesh, what a waste of time and
resources.

Regards,
Gora
On 17-Aug-2011 8:36 PM, "Walter Underwood"  wrote:
> I have no plan to look at the patents, but there is some serious prior art
in faceted search. First, faceted classification for libraries was invented
by S. R. Ranganathan in 1933. Computer search for libraries dates from the
1960's, probably. Combining the two is obvious, even back then.
>
> wunder
>
> On Aug 17, 2011, at 7:55 AM, Matt Shields wrote:
>
>> On Wed, Aug 17, 2011 at 8:51 AM, LaMaze Johnson  wrote:
>>
>>>
>>> Paul Libbrecht-4 wrote:

 Robert,

 I believe, precisely, the objective of such a thread is to be helped by
 knowledgeable techies into being able to do what you say.

 If Johnson gave only 3 lines of details, such as claimed patent URLs or
 dates, we might easily be able to tell him the pointer of a publication
 that would discourage such patent-trolling.

 paul

>>>
>>> I'm sorry. I assumed the information I provided would lead any
resourceful
>>> techie to the details. At any rate here is some additional information:
>>>
>>> Patent Claim:
>>>
>>>
http://www.google.com/patents?id=oFwIEBAJ&printsec=frontcover&dq=6275821&hl=en&ei=57ZLTs7jHs3HsQKWvsjcCA&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCkQ6AEwAA
>>>
>>> Background:
>>> http://www.ndcalblog.com/2011/07/ebay-prevails-in-limiting-patent.html
>>>
>>> Ebay/Microsoft suit:
>>>
>>>
http://www.whda.com/blog/wp-content/uploads/2011/05/2011.5.12-MO-Ebay-v.-Parts-River.pdf
>>>
>>> Adobe suit:
>>> http://news.priorsmart.com/adobe-systems-v-kelora-systems-l4in/
>>>
>>> Consequently, they have been known to act on these threats. I don't
think
>>> it would be prudent to ignore them. At any rate, lawyers will be
involved
>>> and they aren't cheap. Until the suits have been played out with the
likes
>>> of eBay, Microsoft, and Adobe, potentially anyone who uses faceted
search
>>> systems could potentially be at risk.
>>>
>>> Take it for what it's worth. Don't shoot the messenger.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>>
http://lucene.472066.n3.nabble.com/Faceted-Search-Patent-Lawsuit-Please-Read-tp3259475p3261514.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>> I, for one, am grateful for you posting this information. Thank you
>>
>> Matthew Shields
>> Owner
>> BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
>> Managed Services
>> www.beantownhost.com
>> www.sysadminvalley.com
>> www.jeeprally.com
>


Terms + Query?

2011-08-17 Thread Darren Govoni

Hi,
  Is it possible to restrict the /terms component output to the results 
of a query?


thanks,
Darren


Re: is it possible to remove response header from the JSON format?

2011-08-17 Thread Erik Hatcher
What do you mean you don't want to "display" it?   Generally you'd just 
navigate to solr_response['response'] to ignore the header and just deal with 
the main body.

But, there is an omitHeader parameter - add omitHeader=true to the request URL.


Erik


On Aug 17, 2011, at 11:19 , nagarjuna wrote:

> Hi everybody,
>  I have the following response format
> 
>{
>  "responseHeader":{
>"status":0,
>"QTime":47,
>"params":{
>  "fl":"keywords",
>  "indent":"on",
>  "start":"0",
>  "q":"test",
>  "version":"2.2",
>  "rows":"30"}},
>  "response":{"numFound":9,"start":0,"docs":[
>{
>"keywords":"test"},
>  {
>"keywords":"test"},
>  {
>"keywords":"Test"},
>  {
>"keywords":"Test"},
>  {
>"keywords":"Test"},
>{
>"keywords":"test"},
>  {
>"keywords":"testing"},
>  {
>"keywords":"testing"},
>  {
>"keywords":"test iphone android"}]
>  }}
> 
> from the above response format I need to remove the responseHeader...
> i.e  {
>  "responseHeader":{
>"status":0,
>"QTime":47,
>"params":{
>  "fl":"keywords",
>  "indent":"on",
>  "start":"0",
>  "q":"test",
>  "version":"2.2",
>  "rows":"30"}},
> the above part I don't want to display... please help me to do this
> 
> Thanks in advance
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/is-it-possible-to-remove-response-header-from-the-JSON-fromat-tp3261957p3261957.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Faceted Search Patent Lawsuit

2011-08-17 Thread Darren Govoni

Patent rights only last 17 years; then it's public domain.

On 08/17/2011 11:05 AM, Walter Underwood wrote:

I have no plan to look at the patents, but there is some serious prior art in 
faceted search. First, faceted classification for libraries was invented by S. 
R. Ranganathan in 1933. Computer search for libraries dates from the 1960's, 
probably. Combining the two is obvious, even back then.

wunder

On Aug 17, 2011, at 7:55 AM, Matt Shields wrote:


On Wed, Aug 17, 2011 at 8:51 AM, LaMaze Johnson  wrote:


Paul Libbrecht-4 wrote:

Robert,

I believe, precisely, the objective of such a thread is to be helped by
knowledgeable techies into being able to do what you say.

If Johnson gave only 3 lines of details, such as claimed patent URLs or
dates, we might easily be able to tell him the pointer of a publication
that would discourage such patent-trolling.

paul


I'm sorry. I assumed the information I provided would lead any resourceful
techie to the details.  At any rate here is some additional information:

Patent Claim:

http://www.google.com/patents?id=oFwIEBAJ&printsec=frontcover&dq=6275821&hl=en&ei=57ZLTs7jHs3HsQKWvsjcCA&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCkQ6AEwAA

Background:
http://www.ndcalblog.com/2011/07/ebay-prevails-in-limiting-patent.html

Ebay/Microsoft suit:

http://www.whda.com/blog/wp-content/uploads/2011/05/2011.5.12-MO-Ebay-v.-Parts-River.pdf

Adobe suit:
http://news.priorsmart.com/adobe-systems-v-kelora-systems-l4in/

Consequently, they have been known to act on these threats.  I don't think
it would be prudent to ignore them.  At any rate, lawyers will be involved
and they aren't cheap.  Until the suits have been played out with the likes
of eBay, Microsoft, and Adobe, potentially anyone who uses faceted search
systems could potentially be at risk.

Take it for what it's worth.  Don't shoot the messenger.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Faceted-Search-Patent-Lawsuit-Please-Read-tp3259475p3261514.html
Sent from the Solr - User mailing list archive at Nabble.com.


I, for one, am grateful for you posting this information.  Thank you

Matthew Shields
Owner
BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
Managed Services
www.beantownhost.com
www.sysadminvalley.com
www.jeeprally.com




Re: Faceted Search Patent Lawsuit

2011-08-17 Thread Walter Underwood
I have no plan to look at the patents, but there is some serious prior art in 
faceted search. First, faceted classification for libraries was invented by S. 
R. Ranganathan in 1933. Computer search for libraries dates from the 1960's, 
probably. Combining the two is obvious, even back then.

wunder

On Aug 17, 2011, at 7:55 AM, Matt Shields wrote:

> On Wed, Aug 17, 2011 at 8:51 AM, LaMaze Johnson  wrote:
> 
>> 
>> Paul Libbrecht-4 wrote:
>>> 
>>> Robert,
>>> 
>>> I believe, precisely, the objective of such a thread is to be helped by
>>> knowledgeable techies into being able to do what you say.
>>> 
>>> If Johnson gave only 3 lines of details, such as claimed patent URLs or
>>> dates, we might easily be able to tell him the pointer of a publication
>>> that would discourage such patent-trolling.
>>> 
>>> paul
>>> 
>> 
>> I'm sorry. I assumed the information I provided would lead any resourceful
>> techie to the details.  At any rate here is some additional information:
>> 
>> Patent Claim:
>> 
>> http://www.google.com/patents?id=oFwIEBAJ&printsec=frontcover&dq=6275821&hl=en&ei=57ZLTs7jHs3HsQKWvsjcCA&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCkQ6AEwAA
>> 
>> Background:
>> http://www.ndcalblog.com/2011/07/ebay-prevails-in-limiting-patent.html
>> 
>> Ebay/Microsoft suit:
>> 
>> http://www.whda.com/blog/wp-content/uploads/2011/05/2011.5.12-MO-Ebay-v.-Parts-River.pdf
>> 
>> Adobe suit:
>> http://news.priorsmart.com/adobe-systems-v-kelora-systems-l4in/
>> 
>> Consequently, they have been known to act on these threats.  I don't think
>> it would be prudent to ignore them.  At any rate, lawyers will be involved
>> and they aren't cheap.  Until the suits have been played out with the likes
>> of eBay, Microsoft, and Adobe, potentially anyone who uses faceted search
>> systems could potentially be at risk.
>> 
>> Take it for what it's worth.  Don't shoot the messenger.
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Faceted-Search-Patent-Lawsuit-Please-Read-tp3259475p3261514.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
> 
> I, for one, am grateful for you posting this information.  Thank you
> 
> Matthew Shields
> Owner
> BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
> Managed Services
> www.beantownhost.com
> www.sysadminvalley.com
> www.jeeprally.com



Re: SIREn with Solr

2011-08-17 Thread Stéphane Campinas

On 17/08/11 11:53, marotosg wrote:

Does anyone have any experience with this plugin?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SIREn-with-Solr-tp3261260p3261260.html
Sent from the Solr - User mailing list archive at Nabble.com.

Hi,

What would you like to know?
I am working on it with the main developer, Renaud Delbru [1]

Best,

[1] renaud.del...@deri.org

--
Campinas Stéphane


Re: Faceted Search Patent Lawsuit

2011-08-17 Thread Matt Shields
On Wed, Aug 17, 2011 at 8:51 AM, LaMaze Johnson  wrote:

> [quoted message elided; LaMaze Johnson's message appears in full later in this digest]

I, for one, am grateful to you for posting this information.  Thank you

Matthew Shields
Owner
BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
Managed Services
www.beantownhost.com
www.sysadminvalley.com
www.jeeprally.com


RE: Exact matching on names?

2011-08-17 Thread Olson, Ron
Thank you Sujit and Rob for your help; I took the "easy" way and created a new 
field type that is identical to text, but with the stemmer removed. This seems, 
so far, to work exactly as needed.

To help anyone else who comes across this issue, this is the field type I used:

[field type definition stripped by the list archiver]
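For anyone reconstructing it: a minimal sketch of such a field type, modeled on the stock Solr 3.x example "text" type with only the stemming filter removed. The exact filter chain Ron used is not recoverable from the archive, so treat the filters below as assumptions:

<fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- the stock "text" type adds a stemmer (e.g. solr.SnowballPorterFilterFactory) here; it is deliberately omitted -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>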

-Original Message-
From: Sujit Pal [mailto:sujit@comcast.net]
Sent: Tuesday, August 16, 2011 12:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Exact matching on names?

Hi Ron,

There was a discussion about this some time back, which I implemented
(with great success btw) in my own code...basically you store both the
analyzed and non-analyzed versions (use string type) in the index, then
send in a query like this:

+name:clarke name_s:"clarke"^100

The name field is text so it will analyze down "clarke" to "clark" but
it will match both "clark" and "clarke" and the second clause would
boost the entry with "clarke" up to the top, which you then select with
rows=1.
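A sketch of the corresponding schema.xml, using the field names from this example (the copyField is an assumption about how name_s gets populated):

<field name="name" type="text" indexed="true" stored="true"/>
<field name="name_s" type="string" indexed="true" stored="false"/>
<copyField source="name" dest="name_s"/>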

-sujit

On Tue, 2011-08-16 at 10:20 -0500, Olson, Ron wrote:
> Hi all-
>
> I'm missing something fundamental yet I've been unable to find the definitive 
> answer for exact name matching. I'm indexing names using the standard "text" 
> field type and my search is for the name "clarke". My results include 
> "clark", which is incorrect, it needs to match clarke exactly (case 
> insensitive).
>
> I tried textType but that doesn't work because I believe it needs to be 
> *really* exact, whereas I'm looking for things like "clark oil", "bob, frank, 
> and clark", etc.
>
> Thanks for any help,
>
> Ron
>





Re: Solr 1.4.1 vs 3.3 (Speed)

2011-08-17 Thread Alexei Martchenko
I'm doing the exact same migration... what I've accomplished so far

   1. In solrconfig.xml I put <luceneMatchVersion>LUCENE_33</luceneMatchVersion>
   as the first element inside the <config> branch (see the snippet below).
   Warnings go like crazy if you don't do that.
   2. Highlighter shows a deprecated warning; I'm still working on that. It
   works, but I'd like to use the new FastVectorHighlighter, which I'm
   struggling with right now.
   3. All my speed measurements come out about the same: sometimes we lose
   60 ms, sometimes we gain 60 ms, so it averages out. I'll rebuild the index
   from scratch to see differences, maybe today or later this week.
   4. Since I had to turn on termVectors="true" termPositions="true"
   termOffsets="true" on 3 fields to use FastVectorHighlighter, I expect speed
   gains in highlighting.
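A sketch of the version pin from item 1 (standard Solr 3.x solrconfig.xml syntax):

<config>
  <luceneMatchVersion>LUCENE_33</luceneMatchVersion>
  <!-- ... rest of solrconfig.xml ... -->
</config>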


2011/8/17 Samarendra Pratap 

> [quoted message elided; Samarendra's original message appears in full later in this digest]



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


RE: Solr 1.4.1 vs 3.3 (Speed)

2011-08-17 Thread Jaeger, Jay - DOT
It would perhaps help if you reported what you mean by "noticeably less time".  
What were  your timings?  Did you run the tests multiple times?

One thing to watch for in testing:  Solr performance is greatly affected by the 
OS file system cache.  So make sure when testing that you use the same 
searches, and that you run your tests enough times (or not) so that the OS file 
system cache is populated (or not).

So if, for example, you ran your Solr 1.4 test against your production server 
(which would have the file system cache populated), but ran your Solr 3.3 test 
from  a "cold" start, you would indeed get very different search results.

That said, in my testing I have noticed that Solr 3.3 seems to be noticeably 
slower than Solr 3.2 which seems to be about the same as Solr 3.1 which was a 
little slower than Solr 1.4.  

So, I offer the test results below -- with the caveat that I didn't always 
record all the parameters of the test, and didn't always worry about having 
only one thing changing between tests -- my goal at the time was to confirm 
that performance was adequate for our particular need.  Also, in the WebSphere 
7 tests, I probably also had myEclipse running (that I used to build an EAR to 
feed to WAS 7), so there was less file system cache available to it.

(In the table below, each thread runs 100 queries.  One query term (using a 
last and first name) was set to the specified fuzz)

Fuzz   Threads   Time/request   Rate/Hour   Release   Container
0      4         0.16           90,000      1.4       Jetty
0      4         0.38           37,894      3.1       Jetty

???    4         0.67           21,492      3.1       WebSphere 7   [Didn't record the fuzz factor, but think it was 0.50]
???    4         0.66           21,818      3.2       WebSphere 7   [But used the same one here.]
???    4         1.68           14,693      3.3       WebSphere 7   [And, I believe, the same one here]

0.80   4         0.28           56,470      1.4       Jetty
0.80   4         0.34           42,354      3.1       Jetty



-Original Message-
From: Samarendra Pratap [mailto:samarz...@gmail.com] 
Sent: Wednesday, August 17, 2011 4:05 AM
To: solr-user@lucene.apache.org
Subject: Solr 1.4.1 vs 3.3 (Speed)

[quoted message elided; it appears in full as "Solr 1.4.1 vs 3.3 (Speed)" later in this digest]


RE: embeded solrj doesn't refresh index

2011-08-17 Thread mmg2
I'm having the same problem: I import my data using the DataImportHandler.
When the DIH runs, I see the changes in the index file. However, when I
query the index using SolrJ, the new results don't show up. I have to
restart my server to see the results using SolrJ. This is how I use solrj:

private static final SolrServer solrServer = initSolrServer();

private static SolrServer initSolrServer() {
    try {
        // Boot the core container from solr home and wrap it in an embedded server
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        coreContainer = initializer.initialize();
        return new EmbeddedSolrServer(coreContainer, "");
    } catch (Exception ex) {
        logger.log(Level.SEVERE, "Error initializing SOLR server", ex);
        return null;
    }
}


Is it wrong to declare the SolrServer as static final? Should I create a new
EmbeddedSolrServer for each query?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/embeded-solrj-doesn-t-refresh-index-tp3184321p3261772.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ClassNotFoundException when trying to make spellcheck JaraWinkler working

2011-08-17 Thread Alexei Martchenko
Hi Mike, is your config like this?
Is queryAnalyzerFieldType matching your type of field to be indexed?
Is the field correct?


[searchComponent definition stripped by the list archiver; the surviving values are: queryAnalyzerFieldType textSpell, spellchecker name jarowinkler, field sear_spellterms, the booleans false and true, distanceMeasure org.apache.lucene.search.spell.JaroWinklerDistance, and index dir ./spellchecker_jarowinkler]
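A sketch of what that component likely looked like, using standard SpellCheckComponent parameter names; mapping the two stripped booleans to buildOnCommit/buildOnOptimize is an assumption:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">sear_spellterms</str>
    <str name="buildOnCommit">false</str>   <!-- assumption -->
    <str name="buildOnOptimize">true</str>  <!-- assumption -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker_jarowinkler</str>
  </lst>
</searchComponent>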



2011/8/17 Mike Mander 

> Hello,
>
> I get a ClassNotFoundException for JaroWinklerDistance when I start the
> Solr example server.
> I simply copied the server and uncommented the spellchecker in
> example/conf/solrconfig.xml
> I did nothing else.
>
> I already googled but didn't get a hint. Can someone help me please.
>
> Thanks
> Mike
>
> Stacktrace:
>
> [startup log elided; it appears in full in Mike's original message later in this digest]

Re: Filtering results based on a set of values for a field

2011-08-17 Thread Tomas Zerolo
On Tue, Aug 16, 2011 at 07:56:51AM +, tomas.zer...@axelspringer.de wrote:
> Hello, Solrs
> 
> we are trying to filter out documents written by (one or more of) the authors 
> from
> a mediumish list (~2K). The document set itself is in the millions.

[...]

Sorry, forgot to say that we are still on Solr 1.4. But any pointers, even if
they are 3.x-only, are highly appreciated.

Regards
-- tomás


Re: SolR : eDismax does not always use the defaultOperator "AND"

2011-08-17 Thread Shawn Heisey

On 8/17/2011 6:46 AM, Valentin wrote:

[quoted message elided; Valentin's original message appears in full later in this digest]


The dismax and edismax parsers do not respect the default operator in 
quite the way you might expect.  They give you much more control than 
simply choosing one or the other.  The value of the mm parameter gives 
you very fine-grained control over how much of your query must match the 
document.  This is particularly useful if the user enters a lot of terms 
- you might want only 75% of them to be absolutely required.


http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29

If you simply want to duplicate "AND", set mm=100% either in your query 
URL or in your solrconfig.xml request handler definition.  According to 
the wiki, if you are using 3.1 or later, you could leave mm out entirely 
and it will respect the default operator you've defined in your config, 
but if you specify a value for mm, it will be used instead.
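For example, a sketch of pinning mm in the request handler defaults (the handler name is illustrative):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>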


Shawn



RE: master unreachable - attempting simple replication

2011-08-17 Thread Jaeger, Jay - DOT
I'd suggest looking at the logs of the master to see if the request is getting 
thru or not, or if there are any errors logged there.  If the master has a 
replication config error, it might show up there.

We just went thru some master/slave troubleshooting.  Here are some things that 
you might look at:

Does slave replication work when it is local to the master?  If not, debug that 
first.

The browser might be using an http proxy.  I don't know if Solr pays attention 
to the http_proxy environment variable.

Your master might require a user ID, password (and realm), but you haven't 
provided it in the slave config.
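If that's the case, a sketch of the slave-side settings (standard ReplicationHandler options; the host and credentials are placeholders):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>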

What container is your master running in?  Under some containers (e.g. 
WebSphere) you need to add some stuff to web.xml, or replication will fail with 
a 404 -- but work OK from a browser AFTER accessing the admin page, because the 
container does not know about /replication or /core-name/replication because 
there isn't anything for that in web.xml.  Once you authenticate, this problem 
disappears.  (This was what we ran into, solution documented earlier this week 
on the list).

-Original Message-
From: Martin Ostrovsky [mailto:martin.ostrov...@gmail.com] 
Sent: Tuesday, August 16, 2011 12:27 PM
To: solr-user@lucene.apache.org
Subject: master unreachable - attempting simple replication 

I've got a master set up on a public IP and I'm using my laptop as the slave, 
just trying to get a simple replication going. When I'm on my slave machine and 
I look at the replication tab of the admin, it says my master is unreachable, 
however, I can hit the master's replication handler using the public IP through 
a browser.

I thought it might be a DNS issue so instead of using the domain name, I 
switched to the raw IP, still no luck, says master is unreachable.

Definitely not firewall rules either.

Where can I look to see what's causing the failure?

Thanks.


ClassNotFoundException when trying to make spellcheck JaraWinkler working

2011-08-17 Thread Mike Mander

Hello,

I get a ClassNotFoundException for JaroWinklerDistance when I start the 
Solr example server.
I simply copied the server and uncommented the spellchecker in 
example/conf/solrconfig.xml

I did nothing else.

I already googled but didn't get a hint. Can someone help me please.

Thanks
Mike

Stacktrace:

C:\Users\m.mander\Desktop\temp\apache-solr-3.3.0\example>java -jar start.jar
2011-08-17 14:55:20.379:INFO::Logging to STDERR via 
org.mortbay.log.StdErrLog

2011-08-17 14:55:20.462:INFO::jetty-6.1-SNAPSHOT
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or 
JNDI)

17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to 'solr/'
17.08.2011 14:55:20 org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or 
JNDI)
17.08.2011 14:55:20 org.apache.solr.core.CoreContainer$Initializer 
initialize
INFO: looking for solr.xml: 
C:\Users\m.mander\Desktop\temp\apache-solr-3.3.0\example\solr\solr.xml

17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: solr home defaulted to 'solr/' (could not find system property or 
JNDI)

17.08.2011 14:55:20 org.apache.solr.core.CoreContainer 
INFO: New CoreContainer: solrHome=solr/ instance=22725577
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to 'solr/'
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to 'solr\.\'
17.08.2011 14:55:20 org.apache.solr.core.SolrConfig initLibs
INFO: Adding specified lib dirs to ClassLoader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-3.1.jar' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-LICENSE-BSD_LIKE.txt' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/asm-NOTICE.txt' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-jdk15-1.45.jar' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-LICENSE-BSD_LIKE.txt' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcmail-NOTICE.txt' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-jdk15-1.45.jar' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-LICENSE-BSD_LIKE.txt' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/bcprov-NOTICE.txt' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/boilerpipe-1.1.0.jar' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/boilerpipe-LICENSE-ASL.txt' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/boilerpipe-NOTICE.txt' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Adding 
'file:/C:/Users/m.mander/Desktop/temp/apache-solr-3.3.0/contrib/extraction/lib/commons-compress-1.1.jar' 
to classloader
17.08.2011 14:55:20 org.apache.solr.core.SolrResourceLoader 
replaceClassLoader
INFO: Add

Re: Faceted Search Patent Lawsuit

2011-08-17 Thread LaMaze Johnson

Paul Libbrecht-4 wrote:
> 
> Robert,
> 
> I believe, precisely, the objective of such a thread is to be helped by
> knowledgeable techies into being able to do what you say.
> 
> If Johnson gave only 3 lines of details, such as claimed patent URLs or
> dates, we might easily be able to tell him the pointer of a publication
> that would discourage such patent-trolling.
> 
> paul
> 

I'm sorry. I assumed the information I provided would lead any resourceful
techie to the details.  At any rate here is some additional information:

Patent Claim:
http://www.google.com/patents?id=oFwIEBAJ&printsec=frontcover&dq=6275821&hl=en&ei=57ZLTs7jHs3HsQKWvsjcCA&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCkQ6AEwAA

Background:
http://www.ndcalblog.com/2011/07/ebay-prevails-in-limiting-patent.html

Ebay/Microsoft suit:
http://www.whda.com/blog/wp-content/uploads/2011/05/2011.5.12-MO-Ebay-v.-Parts-River.pdf

Adobe suit: http://news.priorsmart.com/adobe-systems-v-kelora-systems-l4in/

Consequently, they have been known to act on these threats.  I don't think
it would be prudent to ignore them.  At any rate, lawyers will be involved
and they aren't cheap.  Until the suits have been played out with the likes
of eBay, Microsoft, and Adobe, anyone who uses faceted search
systems could potentially be at risk.

Take it for what it's worth.  Don't shoot the messenger.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceted-Search-Patent-Lawsuit-Please-Read-tp3259475p3261514.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr-ruby: Error undefined method `closed?' for nil:NilClass

2011-08-17 Thread Ian Connor
That is a good suggestion. At the very least I can catch this error and
create a new connection when I see this - thanks.
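A sketch of that catch-and-reconnect idea, built on the solr-ruby calls quoted below (hypothetical, untested code):

def query_with_fresh_connection(query, options)
  conn = Solr::Connection.new("http://#{LOCAL_SHARD}", :timeout => 1000, :autocommit => :on)
  conn.query(query, options)
rescue NoMethodError => e
  # the "undefined method `closed?' for nil:NilClass" case: rebuild the connection once and retry
  conn = Solr::Connection.new("http://#{LOCAL_SHARD}", :timeout => 1000, :autocommit => :on)
  conn.query(query, options)
end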

On Sun, Aug 14, 2011 at 3:46 PM, Erik Hatcher wrote:

> Does instantiating a Solr::Connection for each request make things better?
>
>Erik
>
> On Aug 14, 2011, at 11:34 , Ian Connor wrote:
>
> > It is nothing special - just like this:
> >
> > conn = Solr::Connection.new("http://#{LOCAL_SHARD}",
> >                             {:timeout => 1000, :autocommit => :on})
> > options[:shards] = HA_SHARDS
> > response = conn.query(query, options)
> >
> > Where LOCAL_SHARD points to a haproxy of a single shard and HA_SHARDS is
> an
> > array of 18 shards (via haproxy).
> >
> > Ian.
> >
> > On Mon, Aug 8, 2011 at 12:50 PM, Erik Hatcher wrote:
> >
> >> Ian -
> >>
> >> What does your solr-ruby using code look like?
> >>
> >> Solr::Connection is light-weight, so you could just construct a new one
> of
> >> those for each request.  Are you keeping an instance around?
> >>
> >> Erik
> >>
> >>
> >> On Aug 8, 2011, at 12:03 , Ian Connor wrote:
> >>
> >>> Hi,
> >>>
> >>> I have seen some of these errors come through from time to time. It
> looks
> >>> like:
> >>>
> >>> /usr/lib/ruby/1.8/net/http.rb:1060:in
> >>> `request'\n/usr/lib/ruby/1.8/net/http.rb:845:in `post'
> >>>
> >>>
> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:158:in
> >>> `post'
> >>>
> >>>
> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:151:in
> >>> `send'
> >>>
> >>>
> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:174:in
> >>> `create_and_send_query'
> >>>
> >>>
> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:92:in
> >>> `query'
> >>>
> >>> It is as if the http object has gone away. Would it be good to create a
> >> new
> >>> one inside of the connection or is something more serious going on?
> >>> ubuntu 10.04
> >>> passenger 3.0.8
> >>> rails 2.3.11
> >>>
> >>> --
> >>> Regards,
> >>>
> >>> Ian Connor
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > Ian Connor
> > 1 Leighton St #723
> > Cambridge, MA 02141
> > Call Center Phone: +1 (714) 239 3875 (24 hrs)
> > Fax: +1(770) 818 5697
> > Skype: ian.connor
>
>


-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor


Index directories on slaves

2011-08-17 Thread Ian Connor
Hi,

We have noticed that many index.* directories are appearing on slaves (some
more than others).

e.g. ls shows

index/index.20110101021510/ index.20110105030400/
index.20110106040701/ index.20110130031416/
index.20101222081713/ index.20110101034500/ index.20110105075100/
index.20110107085605/ index.20110812153349/
index.20101231011754/ index.20110105022600/ index.20110106024902/
index.20110108014100/ index.20110814204200/

Are these harmful? Should I clean them out? I see a command for backup
cleanup but am not sure of the best way to clean these up (apart from removing
all index* directories and getting a fresh replica).

We have also seen on the latest 3.4 build that replicas are getting 1000s of
files even though the masters have fewer than 100 each. It seems as though
they are not deleting files after some replications; we are not sure if this is
related. We are trying to monitor this to see if we can find out how to
reproduce it, or at least the conditions that tend to reproduce it.

-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor


SolR : eDismax does not always use the defaultOperator "AND"

2011-08-17 Thread Valentin
I set the defaultOperator to "AND" in schema.xml:

<solrQueryParser defaultOperator="AND"/>
I use the defType=eDismax in my query. It works very well, but when I want
to use "AND" or "OR" operators, it doesn't use "AND" as the default operator
for the blanks I left without operators.

Examples:

field1:a field2:b does the same thing as field1:a AND field2:b : OK

field1:a OR field2:b : OK, I have all the results that I want

but

field1:a (field2:b OR field2:c) does the same thing as field1:a OR
(field2:b OR field2:c) : that's not OK

How can I force it to use "AND" as the default operator even in that case?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolR-eDismax-does-not-always-use-the-defaultOperator-AND-tp3261500p3261500.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Unable to get multicore working

2011-08-17 Thread Jaeger, Jay - DOT
Glad to hear it.

BTW, I highly recommend the following documents on the web:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
The tutorial at http://lucene.apache.org/solr/tutorial.html
And, of course, the multi-core document at http://wiki.apache.org/solr/CoreAdmin

The book from PACKT Publishing "Solr 1.4 Enterprise Search Server" by David 
Smiley and Eric Pugh is also a nice handy reference to stuff you otherwise have 
to go looking here and there on the web site and Wiki for.

Solr and Lucene are deep subjects of which I have only scratched the surface.

Best of luck!



-Original Message-
From: David Sauve [mailto:dnsa...@gmail.com] 
Sent: Tuesday, August 16, 2011 5:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Unable to get multicore working

Ok. Fixed that too, now. The schema didn't define "long".

Looks like everything is a-okay, now. Thanks for the help. You guys saved me 
from the insane asylum. 

On Tuesday, 16 August, 2011 at 2:32 PM, Jaeger, Jay - DOT wrote:

>  That said, the logs are showing a different error now. Excellent! The site 
> schemas are loading!
> 
> Great!
> 
>  "SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'long' 
> specified on field area_id"
> 
> Go have a look at your conf/schema.xml. 
> 
> Is the following line present?? Does your field definition for area_id follow 
> it?
> 
> <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
> 
> Look at the file with an XML editor. Perhaps an edit to some earlier portion 
> of the schema is messing up this part of the schema?
> 
> 
> -Original Message-
> From: David Sauve [mailto:dnsa...@gmail.com] 
> Sent: Tuesday, August 16, 2011 4:24 PM
> To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
> Subject: Re: Unable to get multicore working
> 
> I updated my `solr.xml` as follows:
> 
> <solr ...>
>   <cores ...>
>     <core ... dataDir="/home/webteam/preview/data" />
>     <core ... dataDir="/home/webteam/staging/data" />
>     <core ... dataDir="/home/webteam/live/data" />
>   </cores>
> </solr>
> 
> 
> and I'm still seeing the same 404 when I try to view /solr/admin/ or 
> /solr/live/admin/
> 
> That said, the logs are showing a different error now. Excellent! The site 
> schemas are loading!
> 
> Looks like the site schemas have an issue:
> 
> "SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'long' 
> specified on field area_id"
> 
> Errr. Why would `long` be an invalid type? 
> 
> 
> On Tuesday, 16 August, 2011 at 2:06 PM, Jaeger, Jay - DOT wrote:
> 
> > Whoops: That was Solr 4.0 (which pre-dates 3.1).
> > 
> > I doubt very much that the release matters, though: I expect the behavior 
> > would be the same.
> > 
> > -Original Message-
> > From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] 
> > Sent: Tuesday, August 16, 2011 4:04 PM
> > To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org) 
> > (mailto:solr-user@lucene.apache.org)
> > Subject: RE: Unable to get multicore working
> > 
> > I tried on my own test environment -- pulling out the default core 
> > parameter out, under Solr 3.1 
> > 
> > I got exactly your symptom: an error 404. 
> > 
> >  HTTP ERROR 404
> >  Problem accessing /solr/admin/index.jsp. Reason: 
> > 
> >  missing core name in path
> > 
> > The log showed:
> > 
> > 2011-08-16 16:00:12.469:WARN::/solr/admin/
> > java.lang.IllegalStateException: STREAM
> >  at org.mortbay.jetty.Response.getWriter(Response.java:616)
> >  at org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:187)
> >  at 
> > org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:180)
> >  at 
> > org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:237)
> >  at 
> > org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:173)
> >  at 
> > org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:124)
> > 
> > (etc.)
> > 
> > Adding the defaultCoreName fixed it.
> > 
> > I expect this is indeed your problem.
> > 
> > -Original Message-
> > From: David Sauve [mailto:dnsa...@gmail.com] 
> > Sent: Tuesday, August 16, 2011 3:50 PM
> > To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org) 
> > (mailto:solr-user@lucene.apache.org)
> > Subject: Re: Unable to get multicore working
> > 
> > Nope. Only thing in the log:
> > 
> > 1 [main] INFO org.mortbay.log - Logging to 
> > org.slf4j.impl.SimpleLogger(org.mortbay.log) via org.mortbay.log.Slf4jLog
> > 173 [main] INFO org.mortbay.log - Redirecting stderr/stdout to 
> > /var/log/jetty/2011_08_16.stderrout.log
> > 
> > 
> > 
> > 
> > On Tuesday, 16 August, 2011 at 1:45 PM, Alexei Martchenko wrote:
> > 
> > > Is your solr.xml in usr/share/jetty/solr/solr.xml?
> > > 
> > > let's try this xml instead
> > > 
> > > [solr.xml example stripped by the list archiver]
> > > 
> > > Can you see the logs? You should see something like this
> > > 
> > > 16/08/2011 17:30:55 org.apache.solr.core.SolrResourceLoader
> > > INFO: Solr home set to 'solr/'
> > > 16/08/2011 17
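A sketch of a multicore solr.xml with the defaultCoreName attribute that resolved the 404 earlier in this thread (core names follow David's setup; the instanceDir values are assumptions):

<solr persistent="false">
  <cores adminPath="/admin/cores" defaultCoreName="live">
    <core name="preview" instanceDir="preview" dataDir="/home/webteam/preview/data" />
    <core name="staging" instanceDir="staging" dataDir="/home/webteam/staging/data" />
    <core name="live" instanceDir="live" dataDir="/home/webteam/live/data" />
  </cores>
</solr>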

Re: Solr UIMA integration problem

2011-08-17 Thread Tommaso Teofili
At first glance I think the problem is in the 'feature' element, which is
set to 'title'.
The 'feature' element should contain a UIMA Feature of the type defined in
the 'type' element; for example, SentenceAnnotation [1], defined in the HMM
Tagger, has 'only' the default features of a UIMA Annotation: begin, end and
coveredText.
So I think you should change the 'feature' elements' values to 'coveredText',
which contains the text covered by the specified UIMA annotation.
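A sketch of the corrected mapping for the sentence field, following the fieldMapping syntax from the Solr UIMA contrib README (treat the exact element layout as an approximation):

<lst name="fieldMapping">
  <lst name="type">
    <str name="name">org.apache.uima.SentenceAnnotation</str>
    <lst name="mapping">
      <str name="feature">coveredText</str>
      <str name="field">sentence</str>
    </lst>
  </lst>
</lst>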
Hope this helps,
Tommaso


[1] :
http://svn.apache.org/repos/asf/uima/addons/trunk/Tagger/src/main/java/org/apache/uima/SentenceAnnotation.java

2011/8/17 solr nps 

> Hello,
>
> I am using Solr 3.3. I have been following instructions at
>
> https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/solr/contrib/uima/README.txt
>
> My setup looks like the following.
>
> solr lib directory contains the following jars
>
> apache-solr-uima-3.3.0.jar
> commons-digester-2.0.jar
> uima-an-alchemy-2.3.1-SNAPSHOT-r1062868.jar
> uima-an-calais-2.3.1-SNAPSHOT-r1062868.jar
> uima-an-tagger-2.3.1-SNAPSHOT-r1062868.jar
> uima-an-wst-2.3.1-SNAPSHOT-r1076132.jar
> uimaj-core-2.3.1.jar
>
>
> solrconfig.xml has the following changes:
>
> [config excerpt stripped by the list archiver; it added an
> updateRequestProcessorChain using
> org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory, runtime
> parameters holding the AlchemyAPI/OpenCalais keys, analysisEngine
> /org/apache/uima/desc/OverridingParamsExtServicesAE.xml, and fieldMapping
> entries mapping the feature "title" of
> org.apache.uima.alchemy.ts.concept.ConceptFS to field "concept" and of
> org.apache.uima.SentenceAnnotation to field "sentence", plus a
> requestHandler wired to the "uima" update chain]
>
> I am trying to index a simple document which looks like the following
>
> <add>
>   <doc>
>     <field name="id">1456780001</field> <!-- field name assumed; only the value survived the archiver -->
>     <field name="title">Canon powershow camera 9000</field>
>   </doc>
> </add>
>
>
> I am using curl to post this document on the /update end point and I am
> getting the following error
>
> org.apache.solr.common.SolrException: processing error: null. title=Canon
> powershow camera 9000,  text="Canon powershow camera 9000..."
> at
>
> org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:107)
> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
> at
>
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
> at
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> at
>
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at
>
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> at
>
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
> at
>
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> at
>
> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
> at
>
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
> at
>
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
> at
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
> at
>
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:279)
> at
>
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
> at
>
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:300)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:680)
> Caused by: org.apache.solr.uima.processor.exception.FieldMappingException
> at
> org.apache.solr.uima.processor.UIMAToSolrMapper.map(UIMAToSolrMapper.java:83)
> at
> org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:85)
> ... 23 more
>
> What could be the problem?
>
> Thanks for your time
>


Re: Faceted Search Patent Lawsuit

2011-08-17 Thread Paul Libbrecht

On 17 August 2011 at 13:01, Robert Muir wrote:

>> On Tue, Aug 16, 2011 at 03:58:29PM -0400, Grant Ingersoll wrote:
>>> I know you mean well and are probably wondering what to do next [...]
>> 
>> Still, a short heads-up like Johnson's would seem OK?
>> After all, this is of concern to us all.
> 
> nothing to be concerned about, just a stupid patent troll, and I feel
> like this thread feeds him.
> 
> just dont give this "company" any money, do your homework first

Robert,

I believe, precisely, the objective of such a thread is to be helped by 
knowledgeable techies into being able to do what you say.

If Johnson gave only 3 lines of details, such as claimed patent URLs or dates, 
we might easily be able to tell him the pointer of a publication that would 
discourage such patent-trolling.

paul

Re: Faceted Search Patent Lawsuit - Please Read

2011-08-17 Thread Robert Muir
On Wed, Aug 17, 2011 at 3:12 AM, Tomas Zerolo
 wrote:
> On Tue, Aug 16, 2011 at 03:58:29PM -0400, Grant Ingersoll wrote:
>> I know you mean well and are probably wondering what to do next [...]
>
> Still, a short heads-up like Johnson's would seem OK?
>
> After all, this is of concern to us all.
>

nothing to be concerned about, just a stupid patent troll, and I feel
like this thread feeds him.

just don't give this "company" any money; do your homework first

-- 
lucidimagination.com


Re: exceeded limit of maxWarmingSearchers ERROR

2011-08-17 Thread Naveen Gupta
Hi Nagendra,

Thanks a lot. I will start working on NRT today; meanwhile, the old settings
(increased maxWarmingSearchers on the master) have not given me trouble so far.

NRT will be more suitable for us, though. I will work on it, analyze the
performance, and share the results with you.

Thanks
Naveen

2011/8/17 Nagendra Nagarajayya 

> Naveen:
>
> See below:
>
>> *NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a
>>
>> document to become searchable*. Any document that you add through update
>> becomes  immediately searchable. So no need to commit from within your
>> update client code.  Since there is no commit, the cache does not have to
>> be
>> cleared or the old searchers closed or  new searchers opened, and warmed
>> (error that you are facing).
>>
>>
>> Looking at the link you mentioned, it is clearly what we wanted. But the
>> real thing is that you have "RA does need a commit for a document to become
>> searchable" (please take a look at the bold sentence).
>>
>>
> Yes, as said earlier you do not need a commit. A document becomes
> searchable as soon as you add it. Below is an example of adding a document
> with curl (this is from the wiki at
> http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x):
>
> curl "http://localhost:8983/solr/update/csv?stream.file=/tmp/x1.csv&encapsulator=%1f"
>
>
> There is no commit included. The contents of the document become
> immediately searchable.
>
>
>  In future, for higher loads, can it cater to master/slave (replication)
>> etc. to scale and perform better? If yes, we would like to go for NRT, and
>> the performance described in the article is acceptable. We were
>> expecting the same real-time performance for a single user.
>>
>>
> There are no changes to the master/slave (replication) process. So any changes
> you have currently will work as before, and if you enable replication later,
> it should still work as it would without NRT.
>
>
>  What about multiple users: should we wait for 1-2 secs before calling the
>> curl request to make Solr perform better? Or will it internally handle
>> multiple requests (multithreaded etc.)?
>>
>
> Again, for updating documents you do not have to change your current
> process or code. Everything remains the same, except that if you were
> including commits, you no longer include them in your update statements. There
> is no change to the existing update process, so internally it will not queue
> or multi-thread updates. It is as in existing Solr functionality; there are no
> changes to the existing setup.
>
> Regarding performance, in the wiki paper every update through curl adds
> (streams) 500 documents, so you could take this approach (a batch size I
> chose arbitrarily for testing, but it seems to work well).
>
>
>  What would be the doc size (10,000 docs) to allow the JVM to perform better?
>> Have you done any kind of benchmarking in terms of multi-threaded,
>> multi-user use for NRT, and also JVM tuning in terms of Solr server
>> performance? Any kind of performance analysis would help us decide quickly
>> whether to switch over to NRT.
>>
>>
> The performance discussed in the wiki paper uses the MBArtists index. The
> MBArtists index is the index used as one of the examples in the book, Solr
> 1.4 Enterprise Search Server. You can download and build this index if you
> have the book or can also download the contents from musicbrainz.org.
>  Each doc is maybe about 100 bytes and has about 7 fields. With wikipedia's
> xml dump, commenting out the skipdoc field (i.e. including redirects) in
> dataconfig.xml [ dataimport handler ], the update performance is about
> 15000 docs / sec (100 million docs); with skipdoc enabled (skipping
> redirects), the performance is about 1350 docs / sec [ time spent mostly
> converting/validating xml rather than the actual update ] (about 11 million docs).
>  Documents in wikipedia can be quite big, at least avg size of about
> 2500-5000 bytes or more.
>
> I would suggest that you download and give NRT with Apache Solr 3.3 and
> RankingAlgorithm a try and get a feel of it as this would be the best way to
> see how your config works with it.
>
>
>  Questions in terms for switching over to NRT,
>>
>>
>> 1. Should we upgrade to Solr 4.x?
>>
>> 2. Any benchmarking (10,000 docs/sec)? The question here is more specific:
>> the details of an individual doc (fields, number of fields, field sizes,
>> parameters affecting performance with faceting or w/o faceting)
>>
>
> Please see the MBArtists index as discussed above.
>
>
>
>  3. What about multiple users ?
>>
>> A user in real time might have a large doc size of .1 million. How to
>> break this down and analyze which approach is better (though it is our task
>> to do). But still, any kind of breakdown will help us. Imagine a user inbox.
>>
>>
> You maybe able to

RE: Unable to get multicore working

2011-08-17 Thread Gaurav Shingala

Can you please try persistent="true" in the <solr> tag? To my knowledge it will
solve your 404 Not Found error.

Regards,
Gaurav

> Date: Tue, 16 Aug 2011 12:44:45 -0700
> From: dnsa...@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Unable to get multicore working
> 
>  I've been trying (unsuccessfully) to get multicore working for about a day 
> and a half now I'm nearly at wits end and unsure what to do anymore. **Any** 
> help would be appreciated. 
> 
> I've installed Solr using the solr-jetty packages on Ubuntu 10.04. The 
> default Solr install seems to work fine.
> 
> Now, I want to add three cores: live, staging, preview to be used for the 
> various states of the site.
> 
> I've created a `solr.xml` file as follows and symlinked it in to 
> /usr/share/solr: 
> 
> <solr ...>
>   <cores ...>
>     <core ... dataDir="/home/webteam/preview/data" />
>     <core ... dataDir="/home/webteam/staging/data" />
>     <core ... dataDir="/home/webteam/live/data" />
>   </cores>
> </solr>
> 
> Now, when I try to view any cores, I get a 404 - Not found. In fact, I can't 
> even view /solr/admin/ anymore after installing that `solr.xml` file.
> 
> Also, /solr/admin/cores returns an XML file, but it looks to me like there's 
> no cores listed. The output:
> 
> 
> 
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>   </lst>
>   <lst name="status"/>
> </response>
> Finally, looking through the logs produced by Jetty doesn't seem to reveal 
> any clues about what is wrong. There doesn't seem to be any errors in there, 
> except the 404s.
> 
> Long story short. I'm stuck. Any suggestions on where to go with this?
> 
> David 
> 
  

Solr 1.4.1 vs 3.3 (Speed)

2011-08-17 Thread Samarendra Pratap
Hi we are planning to migrate from solr 1.4.1 to solr 3.3 and I am doing a
manual performance comparison.

We have setup two different solr installations (1.4.1 and 3.3) on different
ports.
 1. Both have same index (old lucene format index) of around 20 GB with 10
million documents and 60 fields (40 fields with indexed="true").
 2. Both processes have  max 4GB memory allocated (-Xms2048m -Xmx4096m)
 3. Both installation are on same server (8 processor Intel(R) Core(TM) i7
CPU 930 @ 2.80GHz, 8GB RAM, 64 bit linux system)
 4. We are running solr 1.4.1 with collapsing patch
(SOLR-236-1_4_1.patch
).

 When I pass exactly similar query to both the servers one by one solr 1.4.1
is more efficient than solr 3.3.
 Before I convert the index into LUCENE_33 format I thought it would be good
to take the expert advice.

 Is there something which I should look into deeply? Or could this be effect
of old index format with new version and should be ignored?

 When I used "debugQuery=true", it clearly shows
that org.apache.solr.handler.component.CollapseComponent (solr 1.4.1)
noticeably taking less time
than org.apache.solr.handler.component.QueryComponent (solr 3.3).

 I am testing this against simple queries without any faceting,
highlighting, collapsing etc. (*
http://xxx.xxx:8983/solr/select/?q=Packaging%20Material,%20Supplies&qt=dismax&qf=category
^4.0&qf=keywords^2.0&qf=title^2.0&qf=smalldesc&qf=companyname&qf=usercategory&qf=usrpcatdesc&qf=city&qs=10&pf=category^4.0&pf=keywords^3&pf=title^3&pf=smalldesc^1.5&pf=companyname&pf=usercategory&pf=usrpcatdesc&pf=city&ps=0&bq=type:[149%20TO%201500]^3&start=0&rows=50&fl=title,smalldesc,id&debugQuery=true
*)

 Any insights by the experts would be greatly appreciated!

 Thanks in advance.

-- 
Regards,
Samar


Re: Solr 3.3: DIH configuration for Oracle

2011-08-17 Thread Alexey Serba
Why do you need to collect both primary keys T1_ID_RECORD and
T2_ID_RECORD in your delta query? Isn't the T2_ID_RECORD primary key value
enough to get all data from both tables? (You have a 1-N relation between
table1 and table2, right?)
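A sketch of the simplification being suggested here, reusing Eugeny's table and column names (hypothetical, untested SQL):

-- deltaQuery: collect only the child-table key for changed rows
select t2.ID_RECORD
  from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD
 where t1.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')
    or t2.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')

-- deltaImportQuery: re-read the full joined row by that single key
select * from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD
 where t2.ID_RECORD = '${dataimporter.delta.ID_RECORD}'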

On Thu, Aug 11, 2011 at 12:52 AM, Eugeny Balakhonov  wrote:
> Hello, all!
>
>
>
> I want to create a good DIH configuration for my Oracle database with delta
> support. Unfortunately I am not able to do it well, as DIH has some strange
> restrictions.
>
> I want to explain the problem with a simple example. In reality my database
> has a very complex structure.
>
>
>
> Initial conditions: Two tables with following easy structure:
>
>
>
> Table1
>
> -          ID_RECORD    (Primary key)
>
> -          DATA_FIELD1
>
> -          ..
>
> -          DATA_FIELD2
>
> -          LAST_CHANGE_TIME
>
> Table2
>
> -          ID_RECORD    (Primary key)
>
> -          PARENT_ID_RECORD (Foreign key to Table1.ID_RECORD)
>
> -          DATA_FIELD1
>
> -          ..
>
> -          DATA_FIELD2
>
> -          LAST_CHANGE_TIME
>
>
>
> For performance reasons it is necessary to select from both tables with one
> query (via an inner join).
>
>
>
> My db-data-config.xml file:
>
>
>
> <dataConfig>
>     <dataSource ... password=""/>
>     <document>
>         <entity ... pk="T1_ID_RECORD, T2_ID_RECORD"
>             query="select * from TABLE1 t1 inner join TABLE2 t2 on
> t1.ID_RECORD = t2.PARENT_ID_RECORD"
>             deltaQuery="select t1.ID_RECORD T1_ID_RECORD, t1.ID_RECORD T2_ID_RECORD
>                 from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD
>                 where TABLE1.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')
>                 or TABLE2.LAST_CHANGE_TIME > to_date('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS')"
>             deltaImportQuery="select * from TABLE1 t1 inner join TABLE2 t2 on t1.ID_RECORD = t2.PARENT_ID_RECORD
>                 where t1.ID_RECORD = ${dataimporter.delta.T1_ID_RECORD} and
> t2.ID_RECORD = ${dataimporter.delta.T2_ID_RECORD}"
>         />
>     </document>
> </dataConfig>
>
>
>
> In result I have following error:
>
>
>
> java.lang.IllegalArgumentException: deltaQuery has no column to resolve to
> declared primary key pk='T1_ID_RECORD, T2_ID_RECORD'
>
>
>
> I have analyzed the source code of DIH. I found that the DocBuilder class's
> collectDelta() method treats the value of the entity attribute "pk" as a
> simple string. But in my case this is a list with two values: T1_ID_RECORD,
> T2_ID_RECORD
>
>
>
> What am I doing wrong?
>
>
>
> Thanks,
>
> Eugeny
>
>
>
>


Unable to get multicore working

2011-08-17 Thread David Sauve
 I've been trying (unsuccessfully) to get multicore working for about a day and 
a half now, and I'm nearly at wit's end and unsure what to do anymore. **Any** 
help would be appreciated. 

I've installed Solr using the solr-jetty packages on Ubuntu 10.04. The default 
Solr install seems to work fine.

Now, I want to add three cores: live, staging, preview to be used for the 
various states of the site.

I've created a `solr.xml` file as follows and symlinked it into
/usr/share/solr:

<solr ...>
  <cores ...>
    <core ... dataDir="/home/webteam/preview/data" />
    <core ... dataDir="/home/webteam/staging/data" />
    <core ... dataDir="/home/webteam/live/data" />
  </cores>
</solr>

Now, when I try to view any cores, I get a 404 - Not found. In fact, I can't 
even view /solr/admin/ anymore after installing that `solr.xml` file.

Also, /solr/admin/cores returns an XML file, but it looks to me like there's no 
cores listed. The output:



<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="status"/>
</response>
Finally, looking through the logs produced by Jetty doesn't seem to reveal any 
clues about what is wrong. There don't seem to be any errors in there, except 
the 404s.

Long story short. I'm stuck. Any suggestions on where to go with this?

David 



master unreachable - attempting simple replication

2011-08-17 Thread Martin Ostrovsky
I've got a master set up on a public IP and I'm using my laptop as the slave, 
just trying to get a simple replication going. When I'm on my slave machine and 
I look at the replication tab of the admin, it says my master is unreachable, 
however, I can hit the master's replication handler using the public IP through 
a browser.

I thought it might be a DNS issue so instead of using the domain name, I 
switched to the raw IP, still no luck, says master is unreachable.

Definitely not firewall rules either.

Where can I look to see what's causing the failure?

Thanks.

Re: Periodic search in date field

2011-08-17 Thread slaava
Thanks for the quick reply!

Yes, this is my backup solution, but I would prefer a one-query approach -
there could be many results, so I want to use SolrQuery.start() and
SolrQuery.rows() and show persons in a table with paging.

Are you sure the mod() function is supported now? It isn't included in the
Math.* function list here: http://wiki.apache.org/solr/FunctionQuery#Math..2A
I'm not on my job computer now, so I couldn't test it. But if there really is
a modulus function, that will be great!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Periodic-search-in-date-field-tp3260793p3260896.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using Solr for indexing and searching files in a directory

2011-08-17 Thread Gora Mohanty
On Wed, Aug 17, 2011 at 12:20 PM, Jagdish Kumar
 wrote:
>
> Hi All
>
> I have this requirement of indexing and searching files (txt, doc,pdf) on my 
> disk using Solr Search which I have installed.
> I am unable to find a relevant tutorial for the same; I would be thankful if 
> anyone of you can actually help me out with the specific steps required.

One way is to use Nutch to crawl the filesystem, and the other
is to write a simple shell script yourself. Tika can be used to
digest .doc/.pdf documents.
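For the script route, a rough sketch using Solr's ExtractingRequestHandler (Solr Cell), assuming the /update/extract handler is enabled in solrconfig.xml; the paths here are illustrative:

# Post every .txt/.doc/.pdf under /path/to/docs to Solr Cell, using the path as the id
find /path/to/docs -type f \( -name '*.txt' -o -name '*.doc' -o -name '*.pdf' \) |
while read f; do
  curl "http://localhost:8983/solr/update/extract?literal.id=$f" -F "myfile=@$f"
done
# Commit once at the end
curl http://localhost:8983/solr/update --data-binary '<commit/>' -H 'Content-type: text/xml; charset=utf-8'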

Regards,
Gora


Re: Faceted Search Patent Lawsuit - Please Read

2011-08-17 Thread Tomas Zerolo
On Tue, Aug 16, 2011 at 03:58:29PM -0400, Grant Ingersoll wrote:
> I know you mean well and are probably wondering what to do next [...]

Still, a short heads-up like Johnson's would seem OK?

After all, this is of concern to us all.

Regards
-- tomás


Re: Problems generating war distribution using ant

2011-08-17 Thread arian487
Stupid me.  The output file was named something else.  I really need to make
a proper servlet mapping.  Works now :D

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problems-generating-war-distribution-using-ant-tp3260070p3260843.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Periodic search in date field

2011-08-17 Thread Gora Mohanty
On Wed, Aug 17, 2011 at 12:01 PM, slaava  wrote:
[...]
> My first idea was to index the date in yyyymmdd format (20110817) and use
> modulo:
> 1 year: indexed_date % 10000 = 0817
> 5 year: indexed_date % 50000 = 10817
> 10 year: indexed_date % 100000 = 10817
>
> but I didn't find something like a modulo function in solr...
[...]

Could you not index mmdd and yyyy separately? First do
a match on mmdd to get all persons (this is also the set
of all 1-year matches), and then use yyyy to get the 5-year
and 10-year matches.
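A sketch of those two queries for an anniversary check on 2011-08-17 (birth_mmdd and birth_year are illustrative field names; the qualifying years are enumerated by the client):

q=*:*&fq=birth_mmdd:0817
q=*:*&fq=birth_mmdd:0817&fq=birth_year:(2006 OR 2001 OR 1996 OR 1991)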

Regards,
Gora

P.S. If you want to do a modulus, please see function
   queries: http://wiki.apache.org/solr/FunctionQuery
   Not completely sure, but in Solr 3.1, mod() should
   be available from java.lang.Math