RE: fmap.content - copying to two fields possible?

2010-05-03 Thread Naga Darbha
I've tried it and it is working fine.  But I would still like to know 
whether I can specify two fields against fmap.content.

regards,
Naga


-Original Message-
From: Naga Darbha [mailto:ndar...@opentext.com] 
Sent: Tuesday, May 04, 2010 12:20 PM
To: solr-user@lucene.apache.org
Subject: fmap.content - copying to two fields possible?

Hi,

I want to copy the contents of a file (extracted using "ExtractingRequestHandler") 
to two fields, A and B.  Currently I have configured it with:

  <requestHandler name="/update/extract"
      class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.content">A</str>
    </lst>
  </requestHandler>

If I want to copy the contents of the file to field B as well, what is the option? 
Can I specify two fields against fmap.content?

regards,
Naga
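A commonly suggested approach for this (an editorial sketch, not taken from this thread) is to map the extracted content to a single field and have the schema copy it into the second via a copyField rule, assuming both fields A and B are declared in schema.xml:

```xml
<!-- schema.xml (sketch): whatever is indexed into A is also copied into B -->
<copyField source="A" dest="B"/>
```

With this in place, fmap.content only needs to name field A.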


fmap.content - copying to two fields possible?

2010-05-03 Thread Naga Darbha
Hi,

I want to copy the contents of a file (extracted using "ExtractingRequestHandler") 
to two fields, A and B.  Currently I have configured it with:

  <requestHandler name="/update/extract"
      class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.content">A</str>
    </lst>
  </requestHandler>

If I want to copy the contents of the file to field B as well, what is the option? 
Can I specify two fields against fmap.content?

regards,
Naga


RE: Facets vs TermV's

2010-05-03 Thread Villemos, Gert
... I meant: the terms component is faster than using facets. Both of course 
provide the autocomplete.
 



From: Villemos, Gert [mailto:gert.ville...@logica.com]
Sent: Tue 5/4/2010 8:30 AM
To: solr-user@lucene.apache.org
Subject: RE: Facets vs TermV's



I found a thread once (sorry, can't remember where) which stated that the issue 
is performance; the terms component is faster than the autocomplete.

I'm no expert, but I guess it's a question of when the autocomplete index gets 
built. Whereas the terms component likely builds it at storage time, the facet 
component builds it at retrieval time.

Methinks...

G.
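For reference, the two approaches being compared can be sketched as request URLs (illustrative host, port, and field name; the /terms handler must be enabled in solrconfig.xml):

```
# TermsComponent: raw indexed terms with document frequencies
http://localhost:8983/solr/terms?terms.fl=title&terms.prefix=cod

# Facet component: counts computed against the current result set
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=title&facet.prefix=cod
```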



From: Darren Govoni [mailto:dar...@ontrenet.com]
Sent: Tue 5/4/2010 3:53 AM
To: solr-user@lucene.apache.org
Subject: Facets vs TermV's



Hi,
  I spent a lot of time on the Wiki and am working with facets and tv's,
but I'm still confused about something.

Basically, what is the difference between issuing a facet field query
that returns facets with counts,
and a query with term vectors that also returns document frequency
counts for terms in a field?

They seem almost similar, but I think I'm missing something, i.e. when
is it best to use one over the other?
I know this is Solr 101, but just want to understand it fully.

Thanks for any quick tips again.

thanks.




Please help Logica to respect the environment by not printing this email  / 
Pour contribuer comme Logica au respect de l'environnement, merci de ne pas 
imprimer ce mail /  Bitte drucken Sie diese Nachricht nicht aus und helfen Sie 
so Logica dabei, die Umwelt zu schützen. /  Por favor ajude a Logica a 
respeitar o ambiente nao imprimindo este correio electronico.



This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. It may contain proprietary material, confidential 
information and/or be subject to legal privilege. It should not be copied, 
disclosed to, retained or used by, any other party. If you are not an intended 
recipient then please promptly delete this e-mail and any attachment and all 
copies and inform the sender. Thank you.








RE: Facets vs TermV's

2010-05-03 Thread Villemos, Gert
I found a thread once (sorry, can't remember where) which stated that the issue 
is performance; the terms component is faster than the autocomplete.
 
I'm no expert, but I guess it's a question of when the autocomplete index gets 
built. Whereas the terms component likely builds it at storage time, the facet 
component builds it at retrieval time.
 
Methinks...
 
G.



From: Darren Govoni [mailto:dar...@ontrenet.com]
Sent: Tue 5/4/2010 3:53 AM
To: solr-user@lucene.apache.org
Subject: Facets vs TermV's



Hi,
  I spent a lot of time on the Wiki and am working with facets and tv's,
but I'm still confused about something.

Basically, what is the difference between issuing a facet field query
that returns facets with counts,
and a query with term vectors that also returns document frequency
counts for terms in a field?

They seem almost similar, but I think I'm missing something, i.e. when
is it best to use one over the other?
I know this is Solr 101, but just want to understand it fully.

Thanks for any quick tips again.

thanks.







Re: solrDynamicMbeans access

2010-05-03 Thread Chris Hostetter

: i need to access the solr mbeans displayed in jconsole to access the
: attributes of solr using codes( java)
...
:   MBeanServerConnection mbs = conn.getMBeanServerConnection();
...
: now how do i create a solrMbean object to check on its attributes.

I'm not overly familiar with using programmatic JMX to monitor 
remote applications, but I don't believe you want to "create" any MBeans 
... I believe you want to "query" that MBeanServerConnection for MBeans or 
"getMBeanInfo" for a given object name.



-Hoss
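Hoss's suggestion (query the connection rather than create beans) might look like the sketch below. It runs against the local platform MBean server so it is self-contained; against a remote Solr you would use the MBeanServerConnection from your JMX connector, and the Solr object-name pattern is an assumption — check the names shown in jconsole:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class MBeanQuery {

    // Ask the connection for all MBeans matching a name pattern
    // (for Solr something like "solr/*:*" -- an assumption, verify in jconsole).
    static Set<ObjectName> findBeans(MBeanServerConnection mbs, String pattern)
            throws Exception {
        return mbs.queryNames(new ObjectName(pattern), null);
    }

    public static void main(String[] args) throws Exception {
        // Local platform server used here for illustration; remotely you
        // would call connector.getMBeanServerConnection() instead.
        MBeanServerConnection mbs = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : findBeans(mbs, "java.lang:type=Runtime")) {
            MBeanInfo info = mbs.getMBeanInfo(name);
            System.out.println(name + ": " + info.getAttributes().length + " attributes");
        }
    }
}
```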



Re: Solr Dismax query - prefix matching

2010-05-03 Thread Chris Hostetter

: example: If I have a field called 'booktitle' with the actual values as
: 'Code Complete', 'Coding standard 101', then I'd like to search for the
: query string 'cod' and have the dismax match against both the book
: titles since 'cod' is a prefix match for 'code' and 'coding'. 

it doesn't sound like you really want prefix queries ... it sounds like 
you want stemming.  It's hard to tell because you only gave one example, 
so consider whether you want the book "codependents of agony" to match a 
search for "code" ... if the answer is "yes" then what you are looking for 
is prefix matching; if the answer is "no" then you should probably read up 
on stemming (which can work with the dismax parsing, by configuring it in 
the analyzer for your fields)


-Hoss
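A stemming setup of the kind Hoss describes is configured in schema.xml. A sketch with a hypothetical field type name, using analysis factories that ship with Solr:

```xml
<!-- schema.xml (sketch): terms are lowercased and stemmed at index and
     query time, so "Coding" and "code" both end up indexed/searched as "code" -->
<fieldType name="text_stemmed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Fields of this type would match "code" against "Coding standard 101" via stemming, without any prefix query.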



Re: run on reboot on windows

2010-05-03 Thread Lance Norskog
There are a few programs that wrap any java app as a service.

http://en.wikipedia.org/wiki/Service_wrapper

On Mon, May 3, 2010 at 6:58 AM, Dave Searle  wrote:
> I don't think jetty can be installed as a service. You'd need to
> create a bat file and put that in the win startup registry.
>
> Sent from my iPhone
>
> On 3 May 2010, at 11:26, "Frederico Azeiteiro" wrote:
>
>> Hi Ahmed,
>>
>> I need to achieve that also. Do you manage to install it as service
>> and
>> start solr with Jetty?
>> After installing and start jetty as service how do you start solr?
>>
>> Thanks,
>> Frederico
>>
>> -Original Message-
>> From: S Ahmed [mailto:sahmed1...@gmail.com]
>> Sent: segunda-feira, 3 de Maio de 2010 01:05
>> To: solr-user@lucene.apache.org
>> Subject: Re: run on reboot on windows
>>
>> Thanks, for some reason I was looking for a solution outside of
>> jetty/tomcat, when that was the obvious way to get things restarted :)
>>
>> On Sun, May 2, 2010 at 7:53 PM, Dave Searle
>> wrote:
>>
>>> Tomcat is installed as a service on windows. Just go into service
>>> control panel and set startup type to automatic
>>>
>>> Sent from my iPhone
>>>
>>> On 3 May 2010, at 00:43, "S Ahmed"  wrote:
>>>
 it's not tomcat/jetty that's the issue, it's how to get things to
 re-start on
 a windows server (tomcat and jetty don't run as native windows
 services) so
 I am a little confused.. thanks.

 On Sun, May 2, 2010 at 7:37 PM, caman
 wrote:

>
> Ahmed,
>
>
>
> Best is if you take a look at the documentation of jetty or tomcat.
> SOLR
> can
> run on any web container, it's up to you how you configure your
>> web
> container to run
>
>
>
> Thanks
>
> Aboxy
>
>
>
>
>
>
>
>
>
>
>
> From: S Ahmed [via Lucene]
> Sent: Sunday, May 02, 2010 4:33 PM
> To: caman
> Subject: Re: run on reboot on windows
>
>
>
> By default it uses Jetty, so your saying Tomcat on windows server
> 2008/
> IIS7
>
> runs as a native windows service?
>
> On Sun, May 2, 2010 at 12:46 AM, Dave Searle <[hidden email]>wrote:
>
>
>> Set tomcat6 service to auto start on boot (if running tomat)
>>
>> Sent from my iPhone
>>
>> On 2 May 2010, at 02:31, "S Ahmed" <[hidden email]> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to get Solr to run on windows, such that if it reboots
>>> the Solr
>>> service will be running.
>>>
>>> How can I do this?
>>
>
>
>
> _
>
> View message @
>
>
>>>
>> http://lucene.472066.n3.nabble.com/run-on-reboot-on-windows-tp770892p772174.html
>
>
>
>
> --
> View this message in context:
>
>> http://lucene.472066.n3.nabble.com/run-on-reboot-on-windows-tp770892p772178.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>>>
>



-- 
Lance Norskog
goks...@gmail.com


Facets vs TermV's

2010-05-03 Thread Darren Govoni
Hi,
  I spent a lot of time on the Wiki and am working with facets and tv's,
but I'm still confused about something.

Basically, what is the difference between issuing a facet field query
that returns facets with counts,
and a query with term vectors that also returns document frequency
counts for terms in a field? 

They seem almost similar, but I think I'm missing something, i.e. when
is it best to use one over the other?
I know this is Solr 101, but just want to understand it fully.

Thanks for any quick tips again. 

thanks.


Re: Solr commit issue

2010-05-03 Thread Lance Norskog
This could be caused by HTTP caching. Solr's example solrconfig.xml
comes with HTTP caching turned on, and this causes lots of beginners
to have problems. The code to turn it off is commented in
solrconfig.xml. Note that the default is to have caching on, so to
turn it off you must include the XML that disables it.
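The relevant solrconfig.xml stanza looks roughly like this (a sketch; check the comments in the example solrconfig.xml for the exact form shipped with your version):

```xml
<!-- solrconfig.xml (sketch): never send 304 Not Modified, i.e. disable
     HTTP caching so clients always receive freshly computed results -->
<requestDispatcher handleSelect="true">
  <httpCaching never304="true"/>
</requestDispatcher>
```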


On Sat, May 1, 2010 at 8:01 PM, Indika Tantrigoda  wrote:
> Thanks for the reply.
> Here is another thread I found similar to this
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg28236.html
>
> From what I understand the IndexReaders get reopened after a commit.
>
> Regards,
> Indika
>
> On 2 May 2010 00:29, Erick Erickson  wrote:
>
>> The underlying IndexReader must be reopened. If you're
>> searching for a document with a searcher that was opened
>> before the document was indexed, it won't show up on the
>> search results.
>>
>> I'm guessing that your statement that when you search
>> for it with some test is coincidence, but that's just a guess.
>>
>> HTH
>> Erick
>>
>> On Sat, May 1, 2010 at 1:07 PM, Indika Tantrigoda > >wrote:
>>
>> > Hi all,
>> >
>> > I've been working with Solr for a few weeks and have gotten SolrJ
>> > to connect to it, index, search documents.
>> >
>> > However I am having an issue when a document is committed.
>> > When a document is committed it does not show in the search results if I
>> do
>> > a *:* search,
>> > but if I search for it with some text then it is shown in the results.
>> > Only when another document is committed, the previous document is found
>> > when
>> > I do a *:* search
>> >
>> > Is this because of the SolrJ client or do I have to pass additional
>> > parameters to commit() ?
>> >
>> > Thanks in advance.
>> >
>> > Regards,
>> > Indika
>> >
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Auto-commit does not work

2010-05-03 Thread Darren Govoni
I think his point was: _what_ determines if it's a misconfiguration? It
can't be Solr because, like he said, a plugin may require it.
If there is no such plugin, then what shall handle it
properly? Nothing; ergo it's ignored.



On Mon, 2010-05-03 at 19:34 +0200, Andreas Jung wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Chris Hostetter wrote:
> > : right - and Solr should not swallow errors in the configuration :-)
> > 
> > If you have an error in a *known* config declaration, solr will complain 
> > about it -- but solr can't complain just because you declare extra stuff 
> > in your config files that it doesn't know anything about -- some other 
> > plugin might care about it (or it might be there because you wanted 
> > special syntax for your own documentation purposes)
> 
> I don't care about if this config is a core configuration or a
> configuration of some plugin. Such kind of misconfiguration should be
> handled properly - this is the minimum one can expect.
> 
> - -aj
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.10 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iEYEARECAAYFAkvfCQ4ACgkQCJIWIbr9KYz22QCfeVNIwJt0f5+XfnV1qvsZ0HJm
> He0AoL5lIoEiUyhUINXpA2rcDB8bgsAy
> =skdf
> -END PGP SIGNATURE-




Re: Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-03 Thread Mark Miller

On 5/3/10 9:06 AM, Markus Fischer wrote:

Hi,

we recently began having trouble with our Solr 1.4 instance. We've about
850k documents in the index which is about 1.2GB in size; the JVM which
runs tomcat/solr (no other apps are deployed) has been given 2GB.

We've a forum and run a process every minute which indexes the new
messages. The number of messages updated ranges from 0 to 20 messages on
average. The commit takes about one or two minutes, but usually a few
seconds after it finishes the next batch of documents is processed
and the story starts again.

So actually it's like Solr is running commits all day long and CPU usage
ranges from 80% to 120%.

This continuous CPU usage caused ill effects on other services running
on the same machine.

Our environment is being provided by a company purely using VMware
infrastructure; the Solr index itself is on an NFS mount for which we get some
33MB/s throughput.

So, an easy solution would be to just put more resources into it, e.g. a
separate machine. But before I make the decision I'd like to find out
whether the app behaves properly under these circumstances or if it's
possible to shorten the commit time down to a few seconds so the CPU is
not drained that long.

thanks for any pointers,

- Markus



That is certainly not a normal commit time for an index of that size.

Note that Solr 1.4 can have issues when working on NFS, but I don't know 
that it would have anything to do with this.


Are you using the simple lock factory rather than the default native 
lock factory? (as you should do when running on NFS)
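For reference, the lock factory Mark asks about is selected in solrconfig.xml. A sketch in 1.4-era syntax (the same setting can also appear under the mainIndex section):

```xml
<!-- solrconfig.xml (sketch): native locking misbehaves on NFS,
     so switch to the simple lock factory -->
<indexDefaults>
  <lockType>simple</lockType>
</indexDefaults>
```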


--
- Mark

http://www.lucidimagination.com


Re: Problem with pdf, upgrading Cell

2010-05-03 Thread Grant Ingersoll
Little more info... Seems to be a classloading issue.  The tests pass, but they 
aren't loading the Tika libraries via the Solr ResourceLoader, whereas the 
example is.  Marc, one thing to try is to unjar the Solr WAR file and put the 
Tika libs in there, as I bet it will then work.  Note, however, I haven't tried 
this.

On May 3, 2010, at 6:24 PM, Grant Ingersoll wrote:

> I've opened https://issues.apache.org/jira/browse/SOLR-1902 to track this.  
> It is indeed a bug somewhere (still investigating).  It seems that Tika is 
> now picking an EmptyParser implementation when trying to determine which 
> parser to use, despite the fact that it properly identifies the MIME Type.
> 
> -Grant
> 
> On May 3, 2010, at 5:36 PM, Grant Ingersoll wrote:
> 
>> I'm investigating.
>> 
>> On May 3, 2010, at 5:17 AM, Marc Ghorayeb wrote:
>> 
>>> 
>>> Hi,
>>> Grant, i confirm what Praveen has said, any PDF i try does not work with 
>>> the new Tika and SVN versions. :(
>>> Marc
>>> 
 From: sagar...@opentext.com
 To: solr-user@lucene.apache.org
 Date: Mon, 3 May 2010 13:05:24 +0530
 Subject: RE: Problem with pdf, upgrading Cell
 
 Hello,
 
 Please let me know if anybody figured out a way out of this issue. 
 
 Thanks,
 Sandhya
 
 -Original Message-
 From: Praveen Agrawal [mailto:pkal...@gmail.com] 
 Sent: Friday, April 30, 2010 11:14 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Problem with pdf, upgrading Cell
 
 Grant,
 You can try any of the sample pdfs that come in /docs folder of Solr 1.4
 dist'n. I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf' etc. Only
 metadata i.e. stream_size, content_type apart from my own literals are
 indexed, and content is missing..
 
 
 On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll 
 wrote:
 
> Praveen and Marc,
> 
> Can you share the PDF (feel free to email my private email) that fails in
> Solr?
> 
> Thanks,
> Grant
> 
> 
> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
> 
>> 
>> Hi
>> Nope i didn't get it to work... Just like you, command line version of
> tika extracts correctly the content, but once included in Solr, no content
> is extracted.
>> What i tried until now is:- Updating the tika libraries inside Solr 1.4
> public version, no luck there.- Downloading the latest SVN version, 
> compiled
> it, and started from a simple schema, still no luck.- Getting other 
> versions
> compiled on hudson (nightly builds), and testing them also, still no
> extraction.
>> I sent a mail on the developpers mailing list but they told me i should
> just mail here, hope some developper reads this because it's quite an
> important feature of Solr and somehow it got broke between the 1.4 
> release,
> and the last version on the svn.
>> Marc
>> _
>> Consultez gratuitement vos emails Orange, Gmail, Free, ... directement
> dans HOTMAIL !
>> http://www.windowslive.fr/hotmail/agregation/
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
>>>   
>>> _
>>> Hotmail et MSN dans la poche? HOTMAIL et MSN sont dispo gratuitement sur 
>>> votre téléphone!
>>> http://www.messengersurvotremobile.com/?d=Hotmail
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem using Solr/Lucene: 
>> http://www.lucidimagination.com/search
>> 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem using Solr/Lucene: 
> http://www.lucidimagination.com/search
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Problem with pdf, upgrading Cell

2010-05-03 Thread Grant Ingersoll
I've opened https://issues.apache.org/jira/browse/SOLR-1902 to track this.  It 
is indeed a bug somewhere (still investigating).  It seems that Tika is now 
picking an EmptyParser implementation when trying to determine which parser to 
use, despite the fact that it properly identifies the MIME Type.

-Grant

On May 3, 2010, at 5:36 PM, Grant Ingersoll wrote:

> I'm investigating.
> 
> On May 3, 2010, at 5:17 AM, Marc Ghorayeb wrote:
> 
>> 
>> Hi,
>> Grant, i confirm what Praveen has said, any PDF i try does not work with the 
>> new Tika and SVN versions. :(
>> Marc
>> 
>>> From: sagar...@opentext.com
>>> To: solr-user@lucene.apache.org
>>> Date: Mon, 3 May 2010 13:05:24 +0530
>>> Subject: RE: Problem with pdf, upgrading Cell
>>> 
>>> Hello,
>>> 
>>> Please let me know if anybody figured out a way out of this issue. 
>>> 
>>> Thanks,
>>> Sandhya
>>> 
>>> -Original Message-
>>> From: Praveen Agrawal [mailto:pkal...@gmail.com] 
>>> Sent: Friday, April 30, 2010 11:14 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Problem with pdf, upgrading Cell
>>> 
>>> Grant,
>>> You can try any of the sample pdfs that come in /docs folder of Solr 1.4
>>> dist'n. I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf' etc. Only
>>> metadata i.e. stream_size, content_type apart from my own literals are
>>> indexed, and content is missing..
>>> 
>>> 
>>> On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll wrote:
>>> 
 Praveen and Marc,
 
 Can you share the PDF (feel free to email my private email) that fails in
 Solr?
 
 Thanks,
 Grant
 
 
 On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
 
> 
> Hi
> Nope i didn't get it to work... Just like you, command line version of
 tika extracts correctly the content, but once included in Solr, no content
 is extracted.
> What i tried until now is:- Updating the tika libraries inside Solr 1.4
 public version, no luck there.- Downloading the latest SVN version, 
 compiled
 it, and started from a simple schema, still no luck.- Getting other 
 versions
 compiled on hudson (nightly builds), and testing them also, still no
 extraction.
> I sent a mail on the developpers mailing list but they told me i should
 just mail here, hope some developper reads this because it's quite an
 important feature of Solr and somehow it got broke between the 1.4 release,
 and the last version on the svn.
> Marc
> _
> Consultez gratuitement vos emails Orange, Gmail, Free, ... directement
 dans HOTMAIL !
> http://www.windowslive.fr/hotmail/agregation/
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
>>
>> _
>> Hotmail et MSN dans la poche? HOTMAIL et MSN sont dispo gratuitement sur 
>> votre téléphone!
>> http://www.messengersurvotremobile.com/?d=Hotmail
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem using Solr/Lucene: 
> http://www.lucidimagination.com/search
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Problem with pdf, upgrading Cell

2010-05-03 Thread Grant Ingersoll
I'm investigating.

On May 3, 2010, at 5:17 AM, Marc Ghorayeb wrote:

> 
> Hi,
> Grant, i confirm what Praveen has said, any PDF i try does not work with the 
> new Tika and SVN versions. :(
> Marc
> 
>> From: sagar...@opentext.com
>> To: solr-user@lucene.apache.org
>> Date: Mon, 3 May 2010 13:05:24 +0530
>> Subject: RE: Problem with pdf, upgrading Cell
>> 
>> Hello,
>> 
>> Please let me know if anybody figured out a way out of this issue. 
>> 
>> Thanks,
>> Sandhya
>> 
>> -Original Message-
>> From: Praveen Agrawal [mailto:pkal...@gmail.com] 
>> Sent: Friday, April 30, 2010 11:14 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Problem with pdf, upgrading Cell
>> 
>> Grant,
>> You can try any of the sample pdfs that come in /docs folder of Solr 1.4
>> dist'n. I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf' etc. Only
>> metadata i.e. stream_size, content_type apart from my own literals are
>> indexed, and content is missing..
>> 
>> 
>> On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll wrote:
>> 
>>> Praveen and Marc,
>>> 
>>> Can you share the PDF (feel free to email my private email) that fails in
>>> Solr?
>>> 
>>> Thanks,
>>> Grant
>>> 
>>> 
>>> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
>>> 
 
 Hi
 Nope i didn't get it to work... Just like you, command line version of
>>> tika extracts correctly the content, but once included in Solr, no content
>>> is extracted.
 What i tried until now is:- Updating the tika libraries inside Solr 1.4
>>> public version, no luck there.- Downloading the latest SVN version, compiled
>>> it, and started from a simple schema, still no luck.- Getting other versions
>>> compiled on hudson (nightly builds), and testing them also, still no
>>> extraction.
 I sent a mail on the developpers mailing list but they told me i should
>>> just mail here, hope some developper reads this because it's quite an
>>> important feature of Solr and somehow it got broke between the 1.4 release,
>>> and the last version on the svn.
 Marc
 _
 Consultez gratuitement vos emails Orange, Gmail, Free, ... directement
>>> dans HOTMAIL !
 http://www.windowslive.fr/hotmail/agregation/
>>> 
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>> 
>>> Search the Lucene ecosystem using Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>> 
>>> 
> 
> _
> Hotmail et MSN dans la poche? HOTMAIL et MSN sont dispo gratuitement sur 
> votre téléphone!
> http://www.messengersurvotremobile.com/?d=Hotmail

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Score cutoff

2010-05-03 Thread Satish Kumar
Hi,

Can someone give clues on how to implement this feature? This is a very
important requirement for us, so any help is greatly appreciated.


thanks!

On Tue, Apr 27, 2010 at 5:54 PM, Satish Kumar <
satish.kumar.just.d...@gmail.com> wrote:

> Hi,
>
> For some of our queries, the top xx (five or so) results are of very high
> quality and results after xx are very poor. The difference in score for the
> high quality and poor quality results is high. For example, 3.5 for high
> quality and 0.8 for poor quality. We want to exclude results with score
> value that is less than 60% or so of the first result. Is there a filter
> that does this? If not, can someone please give some hints on how to
> implement this (we want to do this as part of solr relevance ranking so that
> the facet counts, etc will be correct).
>
>
> Thanks,
> Satish
>
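Solr has no built-in relative-score cutoff, but the client-side half of the idea can be sketched as below (an editorial sketch, not from this thread; note it does not fix facet counts, which is the harder part of Satish's requirement, since those would still reflect the full result set):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScoreCutoff {

    // Keep only result ids whose score is at least `fraction` of the top score.
    static List<String> filterByScore(Map<String, Double> idToScore, double fraction) {
        double top = 0.0;
        for (double s : idToScore.values()) {
            top = Math.max(top, s);
        }
        double threshold = fraction * top;
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Double> e : idToScore.entrySet()) {
            if (e.getValue() >= threshold) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Scores taken from Satish's example: 3.5 is high quality, 0.8 poor.
        Map<String, Double> results = new LinkedHashMap<>();
        results.put("doc1", 3.5);
        results.put("doc2", 3.1);
        results.put("doc3", 0.8);
        System.out.println(filterByScore(results, 0.6)); // prints [doc1, doc2]
    }
}
```

With a 0.6 fraction and a top score of 3.5, anything below 2.1 is dropped, matching the 60% cutoff described in the question.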


Re: cores and SWAP

2010-05-03 Thread Tim Heckman
I have 2 cores: core1 and core2.

Load the same data set into each and commit. Verify that searches
return the same for each core.

Delete a document (call it docA) from core2 but not from core1.

Commit and verify search results (docA disappears from core2's search
results. core1 continues to return the docA)

Swap cores.

Core2 should now return docA, but it doesn't until I reload core2.


thanks,
Tim


On Mon, May 3, 2010 at 1:41 PM, Shalin Shekhar Mangar
 wrote:
> On Mon, May 3, 2010 at 10:27 PM, Tim Heckman  wrote:
>
>> Hi, I'm trying to figure out whether I need to reload a core (or both
>> cores?) after performing a swap.
>>
>> When I perform a swap in my sandbox (non-production) environment, I am
>> seeing that one of the cores needs to be reloaded following a swap and
>> the other does not, but I haven't been able to find a pattern to which
>> one it will be.
>>
>>
> No, you should not need to reload any core after a swap. What is the
> behavior that you see?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Overlapping onDeckSearchers=2

2010-05-03 Thread Chris Hostetter
: When i run 2 -3 commits parallely  to diff instances or same instance I get
: this error
: 
: PERFORMANCE WARNING: Overlapping onDeckSearchers=2
: 
: What is the Best approach to solve this

http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F



-Hoss



Re: Overlapping onDeckSearchers=2

2010-05-03 Thread Shalin Shekhar Mangar
On Mon, May 3, 2010 at 11:24 AM, revas  wrote:

> Hello,
>
> We have a server with many solr  instances running  (around 40-50) .
>
> We are committing documents  ,sometimes one or sometimes around 200
> documents at  a time .to only one instance at a time
>
> When i run 2 -3 commits parallely  to diff instances or same instance I get
> this error
>
> PERFORMANCE WARNING: Overlapping onDeckSearchers=2
>
> What is the Best approach to solve this
>
>
You should see that warning only when you run multiple commits within a
short period of time to the same Solr instance. You will never see this
warning when performing commit on different instances. So, do you really
need to commit on the same instance in such short time? Can you batch
commits?

-- 
Regards,
Shalin Shekhar Mangar.


Re: cores and SWAP

2010-05-03 Thread Shalin Shekhar Mangar
On Mon, May 3, 2010 at 10:27 PM, Tim Heckman  wrote:

> Hi, I'm trying to figure out whether I need to reload a core (or both
> cores?) after performing a swap.
>
> When I perform a swap in my sandbox (non-production) environment, I am
> seeing that one of the cores needs to be reloaded following a swap and
> the other does not, but I haven't been able to find a pattern to which
> one it will be.
>
>
No, you should not need to reload any core after a swap. What is the
behavior that you see?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Auto-commit does not work

2010-05-03 Thread Andreas Jung
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Chris Hostetter wrote:
> : right - and Solr should not swallow errors in the configuration :-)
> 
> If you have an error in a *known* config declaration, solr will complain 
> about it -- but solr can't complain just because you declare extra stuff 
> in your config files that it doesn't know anything about -- some other 
> plugin might care about it (or it might be there because you wanted 
> special syntax for your own documentation purposes)

I don't care about if this config is a core configuration or a
configuration of some plugin. Such kind of misconfiguration should be
handled properly - this is the minimum one can expect.

- -aj
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvfCQ4ACgkQCJIWIbr9KYz22QCfeVNIwJt0f5+XfnV1qvsZ0HJm
He0AoL5lIoEiUyhUINXpA2rcDB8bgsAy
=skdf
-END PGP SIGNATURE-


Re: Auto-commit does not work

2010-05-03 Thread Chris Hostetter

: right - and Solr should not swallow errors in the configuration :-)

If you have an error in a *known* config declaration, solr will complain 
about it -- but solr can't complain just because you declare extra stuff 
in your config files that it doesn't know anything about -- some other 
plugin might care about it (or it might be there because you wanted 
special syntax for your own documentation purposes)



-Hoss



Re: Auto-commit does not work

2010-05-03 Thread Andreas Jung
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Ahmet Arslan wrote:

> 
> I just realized that there is a typo in your autoCommit definition. The 
> letter C should be capital.
> 
>  <autoCommit>
>    <maxDocs>1</maxDocs>
>    <maxTime>1000</maxTime>
>  </autoCommit>
>

right - and Solr should not swallow errors in the configuration :-)

Andreas
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvfB2wACgkQCJIWIbr9KYy+6ACZAXSFy8wHPWa8R+wzEUHhftEp
qw0An1GV6PAAdd4ezTKn4OF9WwROAUHP
=FeBy
-END PGP SIGNATURE-


Re: Auto-commit does not work

2010-05-03 Thread Ahmet Arslan


> commits : 135
> autocommits : 0
> optimizes : 0
> rollbacks : 0
> expungeDeletes : 0
> docsPending : 8842
> adds : 8842
> deletesById : 0
> deletesByQuery : 0
> errors : 0
> cumulative_adds : 8842
> cumulative_deletesById : 20390
> cumulative_deletesByQuery : 0
> cumulative_errors : 0

I just realized that there is a typo in your autoCommit definition. The letter 
C should be capital.

 
  1
  1000 
 
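The archive software stripped the XML tags out of the configuration snippets in this thread. For reference, a correctly capitalized autoCommit block in solrconfig.xml looks roughly like the sketch below; the numeric values are illustrative, not the ones from the original mails:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- "autocommit" (lowercase c) is silently ignored, as discussed above -->
  <autoCommit>
    <maxDocs>10000</maxDocs> <!-- commit after this many pending documents -->
    <maxTime>60000</maxTime> <!-- or once the oldest pending doc is this many ms old -->
  </autoCommit>
</updateHandler>
```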


  


cores and SWAP

2010-05-03 Thread Tim Heckman
Hi, I'm trying to figure out whether I need to reload a core (or both
cores?) after performing a swap.

When I perform a swap in my sandbox (non-production) environment, I am
seeing that one of the cores needs to be reloaded following a swap and
the other does not, but I haven't been able to find a pattern to which
one it will be.

http://wiki.apache.org/solr/CoreAdmin doesn't seem to cover this. Or,
maybe I'm doing something wrong.

Thanks for any help,
Tim
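A swap is a single CoreAdmin request, and a RELOAD can be issued the same way if it turns out to be necessary. As a sketch (core names and host are made up; this only shows the request format, it does not settle which core needs the reload):

```python
from urllib.parse import urlencode

def core_admin_url(base, action, **params):
    """Build a Solr CoreAdmin request URL (Solr 1.4 style)."""
    query = urlencode({"action": action, **params})
    return base + "/admin/cores?" + query

base = "http://localhost:8983/solr"
# Swap two cores, then reload the one that is now serving queries.
swap_url = core_admin_url(base, "SWAP", core="live", other="ondeck")
reload_url = core_admin_url(base, "RELOAD", core="live")
print(swap_url)
print(reload_url)
```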


Re: Retrieving indexed field data

2010-05-03 Thread Praveen Agrawal
Maybe backing up and restoring the data directory is a workaround for you!


On Mon, May 3, 2010 at 7:47 PM, Erick Erickson wrote:

> Ahhh, Nope, I'm clueless. This strikes me as a pretty hairy thing to
> do, but there's no built-in support that I know of for anything
> similar.
>
> Sorry I can't be more help
> Erick
>
> 2010/5/3 Licinio Fernández Maurelo 
>
> > Thanks for your response, Erik.
> >
> > Just want to "copy" indexing related info for fields indexed but not
> stored
> >  , *don't want to reconstruct the original field(s) value. *
> > *
> > *
> > Any help?* *
> > *
> > *
> > *
> > *
> > *
> > *
> > *
> > *
> > *
> > *
> >
> >
> >
> > 2010/5/3 Erick Erickson 
> >
> > > If you're asking if indexed but NOT stored data can be retrieved,
> > > i.e. if you can reconstruct the original field(s) from the indexed
> > > data alone, the answer is no. Or, rather, you can, kinda, but it's
> > > a lossy process.
> > >
> > > Consider stemming. If you indexed "running" using stemming,
> > > the term "run" is indexed. Lucene/SOLR has no record
> > > of the original term. Similarly with stopwords.
> > >
> > > But if you *store* the data, then the original can be retrieved.
> > >
> > > HTH
> > > Erick
> > >
> > > 2010/5/3 Licinio Fernández Maurelo 
> > >
> > > > Hi folks,
> > > >
> > > > i'm wondering if there is a way to retrieve the indexed data. The
> > reason
> > > is
> > > > that i'm working on a solrj-based tool that copies one index data
> into
> > > > other
> > > > (allowing you to perform changes in docs ). I know i can't perform
> any
> > > > change in an indexed field, just want to "copy" the chunk of bytes ..
> > > >
> > > > Am i missing something? Indexing generated data can't be retrieved
> > > anyway?
> > > >
> > > > Thanks in advance .
> > > >
> > > > --
> > > > Lici
> > > > ~Java Developer~
> > > >
> > >
> >
> >
> >
> > --
> > Lici
> > ~Java Developer~
> >
>


Re: Auto-commit does not work

2010-05-03 Thread Andreas Jung

Ahmet Arslan wrote:

>>
>> I inserted 10k documents through a Python script (w/ solrpy
>> bindings)
>> without explicit commit. However I do not see that the
>> "numDocs"
>> increased meanwhile...is there any way to hunt this down?
> 
> What does solr/admin/stats.jsp#update page says about autocommits and 
> docsPending?


commits : 135
autocommits : 0
optimizes : 0
rollbacks : 0
expungeDeletes : 0
docsPending : 8842
adds : 8842
deletesById : 0
deletesByQuery : 0
errors : 0
cumulative_adds : 8842
cumulative_deletesById : 20390
cumulative_deletesByQuery : 0
cumulative_errors : 0

Andreas


Re: Auto-commit does not work

2010-05-03 Thread Ahmet Arslan

> Running Solr 1.4 with
> 
> 
> ?
> 
> ?
> 
> 1000
> 6
> 
> 
> 
> I inserted 10k documents through a Python script (w/ solrpy
> bindings)
> without explicit commit. However I do not see that the
> "numDocs"
> increased meanwhile...is there any way to hunt this down?

What does solr/admin/stats.jsp#update page says about autocommits and 
docsPending?





Auto-commit does not work

2010-05-03 Thread Andreas Jung

Running Solr 1.4 with


?

?

1000
6



I inserted 10k documents through a Python script (w/ solrpy bindings)
without an explicit commit. However, I do not see that "numDocs" has
increased meanwhile... is there any way to hunt this down?

Andreas



Re: SpellChecking

2010-05-03 Thread Michael Kuhlmann
On 03.05.2010 16:43, Jan Kammer wrote:
> Hi,
> 
> It worked fine with a normal field. There must be something wrong with
> copyField - otherwise why does the DataImportHandler no longer add/update documents?

Did you define your destination field as multiValued?

-Michael


Re: SpellChecking

2010-05-03 Thread Jan Kammer

Hi,

i build the index with ...&spellcheck.build=true
It worked fine with a normal field. There must be something wrong with 
copyField - otherwise why does the DataImportHandler no longer add/update documents?


Can somebody paste the code for copyField with many fields?

Greetz, Jan



On 03.05.2010 16:36, Villemos, Gert wrote:

We are using copy fields for 40+ fields to do spelling, and it works
fine.

Are you sure that you actually built the spell index before you try to
do spelling? You need to either configure Solr to build the spell index on
commit, or manually issue a spell index build request.

Regards,
Gert.





-Original Message-
From: Jan Kammer [mailto:jan.kam...@mni.fh-giessen.de]
Sent: Montag, 3. Mai 2010 16:26
To: solr-user@lucene.apache.org
Subject: Re: SpellChecking

Hi,

if I define one of my normal fields from schema.xml in solrconfig.xml
for spellchecking all works fine:

...

That didn't work, because nothing was in "spell" after that.

Next try was to copy each field in a line to "spell":



...
This works with up to 3 of them; if I define more, the count of failed
documents in the DataImportHandler gets higher and higher the more I copy
into "spell".
16444

So my question is whether this is the right way to use the spellchecker with
many fields, or is there another, better way...

thanks.

greetz, Jan

On 03.05.2010 16:08, Erick Erickson wrote:
It would help a lot to see your actual config file, and if you provided a
bit more detail about what failure looks like

Best
Erick

On Mon, May 3, 2010 at 9:43 AM, Jan Kammer wrote:

Hi there,

I want to enable spellchecking, but I have many fields.

I tried around with copyField to copy all fields with "*" into one field, but
that didn't work.
Next try was to copy some fields, specified each by name, into one field named
"spell", but that worked only for 2 or 3 fields, not for 10 or more...

My question is, what is the best practice to enable spellchecking on many
fields.

thanks.

greetz, Jan



Please help Logica to respect the environment by not printing this email  / 
Pour contribuer comme Logica au respect de l'environnement, merci de ne pas 
imprimer ce mail /  Bitte drucken Sie diese Nachricht nicht aus und helfen Sie 
so Logica dabei, die Umwelt zu schützen. /  Por favor ajude a Logica a 
respeitar o ambiente nao imprimindo este correio electronico.



This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. It may contain proprietary material, confidential 
information and/or be subject to legal privilege. It should not be copied, 
disclosed to, retained or used by, any other party. If you are not an intended 
recipient then please promptly delete this e-mail and any attachment and all 
copies and inform the sender. Thank you.


   




RE: SpellChecking

2010-05-03 Thread Villemos, Gert
We are using copy fields for 40+ fields to do spelling, and it works
fine.

Are you sure that you actually built the spell index before you try to
do spelling? You need to either configure Solr to build the spell index on
commit, or manually issue a spell index build request.

Regards,
Gert.





-Original Message-
From: Jan Kammer [mailto:jan.kam...@mni.fh-giessen.de] 
Sent: Montag, 3. Mai 2010 16:26
To: solr-user@lucene.apache.org
Subject: Re: SpellChecking

Hi,

if I define one of my normal fields from schema.xml in solrconfig.xml 
for spellchecking all works fine:

...

That didn't work, because nothing was in "spell" after that.

Next try was to copy each field in a line to "spell":



...
This works with up to 3 of them; if I define more, the count of failed
documents in the DataImportHandler gets higher and higher the more I copy
into "spell".
16444

So my question is whether this is the right way to use the spellchecker with
many fields, or is there another, better way...

thanks.

greetz, Jan

On 03.05.2010 16:08, Erick Erickson wrote:
> It would help a lot to see your actual config file, and if you
provided a
> bit more
> detail about what failure looks like
>
> Best
> Erick
>
> On Mon, May 3, 2010 at 9:43 AM, Jan
Kammerwrote:
>
>
>> Hi there,
>>
>> I want to enable spellchecking, but I have many fields.
>>
>> I tried around with copyField to copy all with "*" in one field, but that
>> didn't work.
>> Next try was to copy some fields specified each by name in one field named
>> "spell", but that worked only for 2 or 3 fields, not for 10 or more...
>>
>> My question is, what is the best practice to enable spellchecking on many
>> fields.
>>
>> thanks.
>>
>> greetz, Jan
>>
>>  
>







Re: SpellChecking

2010-05-03 Thread Jan Kammer

Hi,

if I define one of my normal fields from schema.xml in solrconfig.xml 
for spellchecking all works fine:


...



That didn't work, because nothing was in "spell" after that.

Next try was to copy each field in a line to "spell":



...
This works with up to 3 of them; if I define more, the count of failed 
documents in the DataImportHandler gets higher and higher the more I copy 
into "spell".

16444

So my question is whether this is the right way to use the spellchecker with 
many fields, or is there another, better way...


thanks.

greetz, Jan

On 03.05.2010 16:08, Erick Erickson wrote:

It would help a lot to see your actual config file, and if you provided a
bit more
detail about what failure looks like

Best
Erick

On Mon, May 3, 2010 at 9:43 AM, Jan Kammer wrote:

   

Hi there,

I want to enable spellchecking, but I have many fields.

I tried around with copyField to copy all with "*" in one field, but that
didn't work.
Next try was to copy some fields specified each by name in one field named
"spell", but that worked only for 2 or 3 fields, not for 10 or more...

My question is, what is the best practice to enable spellchecking on many
fields.

thanks.

greetz, Jan

 
   




Re: Retrieving indexed field data

2010-05-03 Thread Erick Erickson
Ahhh, Nope, I'm clueless. This strikes me as a pretty hairy thing to
do, but there's no built-in support that I know of for anything
similar.

Sorry I can't be more help
Erick

2010/5/3 Licinio Fernández Maurelo 

> Thanks for your response, Erik.
>
> Just want to "copy" indexing related info for fields indexed but not stored
>  , *don't want to reconstruct the original field(s) value. *
> *
> *
> Any help?* *
> *
> *
> *
> *
> *
> *
> *
> *
> *
> *
>
>
>
> 2010/5/3 Erick Erickson 
>
> > If you're asking if indexed but NOT stored data can be retrieved,
> > i.e. if you can reconstruct the original field(s) from the indexed
> > data alone, the answer is no. Or, rather, you can, kinda, but it's
> > a lossy process.
> >
> > Consider stemming. If you indexed "running" using stemming,
> > the term "run" is indexed. Lucene/SOLR has no record
> > of the original term. Similarly with stopwords.
> >
> > But if you *store* the data, then the original can be retrieved.
> >
> > HTH
> > Erick
> >
> > 2010/5/3 Licinio Fernández Maurelo 
> >
> > > Hi folks,
> > >
> > > i'm wondering if there is a way to retrieve the indexed data. The
> reason
> > is
> > > that i'm working on a solrj-based tool that copies one index data into
> > > other
> > > (allowing you to perform changes in docs ). I know i can't perform any
> > > change in an indexed field, just want to "copy" the chunk of bytes ..
> > >
> > > Am i missing something? Indexing generated data can't be retrieved
> > anyway?
> > >
> > > Thanks in advance .
> > >
> > > --
> > > Lici
> > > ~Java Developer~
> > >
> >
>
>
>
> --
> Lici
> ~Java Developer~
>


Re: Retrieving indexed field data

2010-05-03 Thread Licinio Fernández Maurelo
Thanks for your response, Erik.

Just want to "copy" indexing related info for fields indexed but not stored
 , *don't want to reconstruct the original field(s) value. *
*
*
Any help?* *
*
*
*
*
*
*
*
*
*
*



2010/5/3 Erick Erickson 

> If you're asking if indexed but NOT stored data can be retrieved,
> i.e. if you can reconstruct the original field(s) from the indexed
> data alone, the answer is no. Or, rather, you can, kinda, but it's
> a lossy process.
>
> Consider stemming. If you indexed "running" using stemming,
> the term "run" is indexed. Lucene/SOLR has no record
> of the original term. Similarly with stopwords.
>
> But if you *store* the data, then the original can be retrieved.
>
> HTH
> Erick
>
> 2010/5/3 Licinio Fernández Maurelo 
>
> > Hi folks,
> >
> > i'm wondering if there is a way to retrieve the indexed data. The reason
> is
> > that i'm working on a solrj-based tool that copies one index data into
> > other
> > (allowing you to perform changes in docs ). I know i can't perform any
> > change in an indexed field, just want to "copy" the chunk of bytes ..
> >
> > Am i missing something? Indexing generated data can't be retrieved
> anyway?
> >
> > Thanks in advance .
> >
> > --
> > Lici
> > ~Java Developer~
> >
>



-- 
Lici
~Java Developer~


Re: SpellChecking

2010-05-03 Thread Erick Erickson
It would help a lot to see your actual config file, and if you provided a
bit more
detail about what failure looks like

Best
Erick

On Mon, May 3, 2010 at 9:43 AM, Jan Kammer wrote:

> Hi there,
>
> I want to enable spellchecking, but I have many fields.
>
> I tried around with copyField to copy all with "*" in one field, but that
> didn't work.
> Next try was to copy some fields specified each by name in one field named
> "spell", but that worked only for 2 or 3 fields, not for 10 or more...
>
> My question is, what is the best practice to enable spellchecking on many
> fields.
>
> thanks.
>
> greetz, Jan
>


Re: run on reboot on windows

2010-05-03 Thread Dave Searle
I don't think Jetty can be installed as a service. You'd need to
create a .bat file and register it to run at Windows startup.

Sent from my iPhone

On 3 May 2010, at 11:26, "Frederico Azeiteiro"  wrote:

> Hi Ahmed,
>
> I need to achieve that also. Did you manage to install it as a service
> and start Solr with Jetty?
> After installing and starting Jetty as a service, how do you start Solr?
>
> Thanks,
> Frederico
>
> -Original Message-
> From: S Ahmed [mailto:sahmed1...@gmail.com]
> Sent: segunda-feira, 3 de Maio de 2010 01:05
> To: solr-user@lucene.apache.org
> Subject: Re: run on reboot on windows
>
> Thanks, for some reason I was looking for a solution outside of
> jetty/tomcat, when that was the obvious way to get things restarted :)
>
> On Sun, May 2, 2010 at 7:53 PM, Dave Searle
> wrote:
>
>> Tomcat is installed as a service on windows. Just go into service
>> control panel and set startup type to automatic
>>
>> Sent from my iPhone
>>
>> On 3 May 2010, at 00:43, "S Ahmed"  wrote:
>>
>>> it's not tomcat/jetty that's the issue, it's how to get things to
>>> restart on a windows server (tomcat and jetty don't run as native
>>> windows services) so I am a little confused.. thanks.
>>>
>>> On Sun, May 2, 2010 at 7:37 PM, caman
>>> wrote:
>>>

 Ahmed,



 Best is if you take a look at the documentation of jetty or tomcat.
 SOLR
 can
 run on any web container, it's up to you how you  configure your
> web
 container to run



 Thanks

 Aboxy











 From: S Ahmed [via Lucene]

 Sent: Sunday, May 02, 2010 4:33 PM
 To: caman
 Subject: Re: run on reboot on windows



 By default it uses Jetty, so you're saying Tomcat on Windows Server 2008 /
 IIS7 runs as a native windows service?

 On Sun, May 2, 2010 at 12:46 AM, Dave Searle <[hidden email]>wrote:


> Set tomcat6 service to auto start on boot (if running tomat)
>
> Sent from my iPhone
>
> On 2 May 2010, at 02:31, "S Ahmed" <[hidden email]> wrote:
>
>> Hi,
>>
>> I'm trying to get Solr to run on windows, such that if it reboots
>> the Solr
>> service will be running.
>>
>> How can I do this?
>







 --
 View this message in context:

>>
> http://lucene.472066.n3.nabble.com/run-on-reboot-on-windows-tp770892p772
> 178.html
 Sent from the Solr - User mailing list archive at Nabble.com.

>>


Re: Retrieving indexed field data

2010-05-03 Thread Erick Erickson
If you're asking if indexed but NOT stored data can be retrieved,
i.e. if you can reconstruct the original field(s) from the indexed
data alone, the answer is no. Or, rather, you can, kinda, but it's
a lossy process.

Consider stemming. If you indexed "running" using stemming,
the term "run" is indexed. Lucene/SOLR has no record
of the original term. Similarly with stopwords.

But if you *store* the data, then the original can be retrieved.

HTH
Erick

2010/5/3 Licinio Fernández Maurelo 

> Hi folks,
>
> i'm wondering if there is a way to retrieve the indexed data. The reason is
> that i'm working on a solrj-based tool that copies one index data into
> other
> (allowing you to perform changes in docs ). I know i can't perform any
> change in an indexed field, just want to "copy" the chunk of bytes ..
>
> Am i missing something? Indexing generated data can't be retrieved anyway?
>
> Thanks in advance .
>
> --
> Lici
> ~Java Developer~
>


Re: Skipping duplicates in DataImportHandler based on uniqueKey

2010-05-03 Thread Andrew Clegg


Marc Sturlese wrote:
> 
> You can use deduplication to do that. Create the signature based on the
> unique field or any field you want.
> 

Cool, thanks, I hadn't thought of that.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Skipping-duplicates-in-DataImportHandler-based-on-uniqueKey-tp771559p773268.html
Sent from the Solr - User mailing list archive at Nabble.com.


SpellChecking

2010-05-03 Thread Jan Kammer

Hi there,

I want to enable spellchecking, but I have many fields.

I tried around with copyField to copy all with "*" in one field, but 
that didn't work.
Next try was to copy some fields specified each by name in one field 
named "spell", but that worked only for 2 or 3 fields, not for 10 or 
more...

My question is, what is the best practice to enable spellchecking on 
many fields.


thanks.

greetz, Jan
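Since the archive stripped the XML out of the replies in this thread, here is what a multi-source copyField setup for spellchecking generally looks like in schema.xml. Field and type names are invented for illustration; the important detail is that a destination fed by several sources should be declared multiValued:

```xml
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>

<copyField source="title"  dest="spell"/>
<copyField source="body"   dest="spell"/>
<copyField source="author" dest="spell"/>
```

The spellcheck component in solrconfig.xml then points at the destination field, e.g. `<str name="field">spell</str>`, which is the pattern Gert describes using for 40+ fields.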


Re: OutOfMemoryError when using query with sort

2010-05-03 Thread Erick Erickson
How many unique terms are in your sort field?

On Sun, May 2, 2010 at 11:48 PM, Hamid Vahedi  wrote:

> I installed 64-bit Windows and my problem is solved. I am also using shard
> mode (100M docs per machine with one Solr instance).
> Is there a better solution? Because I insert at least 5M docs per day.
>
>
>
>
> 
> From: Koji Sekiguchi 
> To: solr-user@lucene.apache.org
> Sent: Sun, May 2, 2010 9:08:42 PM
> Subject: Re: OutOfMemoryError when using query with sort
>
> Hamid Vahedi wrote:
> > Hi, i using solr that running on windows server 2008 32-bit.
> > I add about 100 million article into solr without set store attribute.
> (only store document id) (index file size about 164 GB)
> > when try to get query without sort , it's return doc ids in some ms, but
> when add sort command, i get below error:
> >
> > TTP Status 500 - Java heap space java.lang.OutOfMemoryError: Java heap
> space at
> Since sort uses FieldCache and it consumes memory, you got OOM.
> I think a 100M-doc / 164GB index is considerably large for a 32-bit machine.
> Why don't you use distributed search?
>
> Koji
>
> -- http://www.rondhuit.com/en/
>
>
>
>
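The distributed search Koji suggests is driven by the shards request parameter: each listed shard executes the query and the coordinating node merges the sorted results, so the FieldCache needed for sorting is split across machines. A sketch of building such a request (host names are placeholders):

```python
from urllib.parse import urlencode

# One logical query fanned out over two shards; Solr merges the sorted results.
params = {
    "q": "text:solr",
    "sort": "date desc",
    "shards": "shard1:8983/solr,shard2:8983/solr",
}
url = "http://shard1:8983/solr/select?" + urlencode(params)
print(url)
```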


Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-03 Thread Markus Fischer

Hi,

we recently began having trouble with our Solr 1.4 instance. We have about 
850k documents in the index, which is about 1.2GB in size; the JVM that 
runs tomcat/solr (no other apps are deployed) has been given 2GB.


We have a forum and run a process every minute which indexes the new 
messages. The number of messages updated ranges from 0 to 20 on average. 
The commit takes about one or two minutes, but usually a few seconds after 
it finishes the next batch of documents is processed and the story starts 
again.


So actually it's like Solr is running commits all day long and CPU usage 
ranges from 80% to 120%.


This continuous CPU usage caused ill effects on other services running 
on the same machine.


Our environment is provided by a company using purely VMware 
infrastructure; the Solr index itself is on an NFS share for which we get 
some 33MB/s throughput.


So, an easy solution would be to just put more resources into it, e.g. a 
separate machine. But before I make that decision I'd like to find out 
whether the app behaves properly under these circumstances, or if it's 
possible to shorten the commit time down to a few seconds so the CPU is 
not drained that long.


thanks for any pointers,

- Markus
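One thing worth checking (an assumption on my part, not something confirmed in this thread): a large share of commit time usually goes into warming the new searcher, i.e. autowarming the caches and replaying any newSearcher listener queries. Turning those down in solrconfig.xml shortens commits at the price of colder caches after each commit; the values here are only illustrative:

```xml
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```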



Retrieving indexed field data

2010-05-03 Thread Licinio Fernández Maurelo
Hi folks,

I'm wondering if there is a way to retrieve the indexed data. The reason is
that I'm working on a solrj-based tool that copies one index's data into
another (allowing you to perform changes in docs). I know I can't perform any
change in an indexed field; I just want to "copy" the chunk of bytes ..

Am I missing something? Can index-time generated data not be retrieved anyway?

Thanks in advance .

-- 
Lici
~Java Developer~


Re: synonym filter problem for string or phrase

2010-05-03 Thread MitchK

Just for clear terminology: you mean field, not fieldType. A fieldType is the
definition of tokenizers, filters, etc.
You apply a fieldType to a field. And you query against a field, not against
a whole fieldType. :-)

Kind regards
- Mitch


Marco Martinez-2 wrote:
> 
> Hi Ranveer,
> 
> If you don't specify a field type in the q parameter, the search will be
> done searching in your default search field defined in the solrconfig.xml,
> its your default field a text_sync field?
> 
> Regards,
> 
> Marco Martínez Bautista
> http://www.paradigmatecnologico.com
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón
> Tel.: 91 352 59 42
> 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/synonym-filter-problem-for-string-or-phrase-tp765242p773083.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: run on reboot on windows

2010-05-03 Thread Frederico Azeiteiro
Hi Ahmed,

I need to achieve that also. Did you manage to install it as a service and
start Solr with Jetty?
After installing and starting Jetty as a service, how do you start Solr?

Thanks,
Frederico

-Original Message-
From: S Ahmed [mailto:sahmed1...@gmail.com] 
Sent: segunda-feira, 3 de Maio de 2010 01:05
To: solr-user@lucene.apache.org
Subject: Re: run on reboot on windows

Thanks, for some reason I was looking for a solution outside of
jetty/tomcat, when that was the obvious way to get things restarted :)

On Sun, May 2, 2010 at 7:53 PM, Dave Searle
wrote:

> Tomcat is installed as a service on windows. Just go into service
> control panel and set startup type to automatic
>
> Sent from my iPhone
>
> On 3 May 2010, at 00:43, "S Ahmed"  wrote:
>
> > it's not tomcat/jetty that's the issue, it's how to get things to
> > restart on a windows server (tomcat and jetty don't run as native
> > windows services) so I am a little confused.. thanks.
> >
> > On Sun, May 2, 2010 at 7:37 PM, caman
> > wrote:
> >
> >>
> >> Ahmed,
> >>
> >>
> >>
> >> Best is if you take a look at the documentation of jetty or tomcat.
> >> SOLR
> >> can
> >> run on any web container, it's up to you how you  configure your
web
> >> container to run
> >>
> >>
> >>
> >> Thanks
> >>
> >> Aboxy
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> From: S Ahmed [via Lucene]
> >>
> >> Sent: Sunday, May 02, 2010 4:33 PM
> >> To: caman
> >> Subject: Re: run on reboot on windows
> >>
> >>
> >>
> >> By default it uses Jetty, so you're saying Tomcat on Windows Server 2008 /
> >> IIS7 runs as a native windows service?
> >>
> >> On Sun, May 2, 2010 at 12:46 AM, Dave Searle <[hidden email]>wrote:
> >>
> >>
> >>> Set tomcat6 service to auto start on boot (if running tomat)
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On 2 May 2010, at 02:31, "S Ahmed" <[hidden email]> wrote:
> >>>
>  Hi,
> 
>  I'm trying to get Solr to run on windows, such that if it reboots
>  the Solr
>  service will be running.
> 
>  How can I do this?
> >>>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
>
http://lucene.472066.n3.nabble.com/run-on-reboot-on-windows-tp770892p772
178.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
>


RE: Problem with pdf, upgrading Cell

2010-05-03 Thread Marc Ghorayeb

Hi,
Grant, I confirm what Praveen has said: any PDF I try does not work with the 
new Tika and SVN versions. :(
Marc

> From: sagar...@opentext.com
> To: solr-user@lucene.apache.org
> Date: Mon, 3 May 2010 13:05:24 +0530
> Subject: RE: Problem with pdf, upgrading Cell
> 
> Hello,
> 
> Please let me know if anybody figured out a way out of this issue. 
> 
> Thanks,
> Sandhya
> 
> -Original Message-
> From: Praveen Agrawal [mailto:pkal...@gmail.com] 
> Sent: Friday, April 30, 2010 11:14 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Problem with pdf, upgrading Cell
> 
> Grant,
> You can try any of the sample pdfs that come in /docs folder of Solr 1.4
> dist'n. I had tried 'Installing Solr in Tomcat.pdf', 'index.pdf' etc. Only
> metadata i.e. stream_size, content_type apart from my own literals are
> indexed, and content is missing..
> 
> 
> On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll wrote:
> 
> > Praveen and Marc,
> >
> > Can you share the PDF (feel free to email my private email) that fails in
> > Solr?
> >
> > Thanks,
> > Grant
> >
> >
> > On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
> >
> > >
> > > Hi
> > > Nope i didn't get it to work... Just like you, command line version of
> > tika extracts correctly the content, but once included in Solr, no content
> > is extracted.
> > > What I tried until now is:
> > > - Updating the Tika libraries inside the Solr 1.4 public version, no luck there.
> > > - Downloading the latest SVN version, compiling it, and starting from a simple schema, still no luck.
> > > - Getting other versions compiled on hudson (nightly builds), and testing them also, still no extraction.
> > > I sent a mail on the developpers mailing list but they told me i should
> > just mail here, hope some developper reads this because it's quite an
> > important feature of Solr and somehow it got broke between the 1.4 release,
> > and the last version on the svn.
> > > Marc
> >
> > --
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem using Solr/Lucene:
> > http://www.lucidimagination.com/search
> >
> >
  

Re: Skipping duplicates in DataImportHandler based on uniqueKey

2010-05-03 Thread Marc Sturlese

You can use deduplication to do that. Create the signature based on the
unique field or any field you want.
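Marc is referring to the SignatureUpdateProcessorFactory that ships with Solr 1.4. A sketch of the solrconfig.xml wiring (the signature field name and the use of id as the source field are illustrative); with overwriteDupes set to true, later documents that produce the same signature overwrite earlier ones:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">id</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain also has to be referenced from the update request handler (in Solr 1.4, via an `update.processor` default on the /update handler) before it takes effect.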
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Skipping-duplicates-in-DataImportHandler-based-on-uniqueKey-tp771559p772768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: synonym filter problem for string or phrase

2010-05-03 Thread Marco Martinez
Hi Ranveer,

I don't see any stemming analyzer in your configuration of the field
'text_sync'; also, you have the synonym filter at query time and not at
index time - maybe that is your problem.


Regards,


Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/4/30 Jonty Rhods 

> On 4/29/10 8:50 PM, Marco Martinez wrote:
>
> Hi Ranveer,
>
> If you don't specify a field type in the q parameter, the search will be
> done searching in your default search field defined in the solrconfig.xml,
> its your default field a text_sync field?
>
> Regards,
>
> Marco Martínez Bautista
> http://www.paradigmatecnologico.com
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón
> Tel.: 91 352 59 42
>
>
> 2010/4/29 Ranveer 
>
>
>
> Hi,
>
> I am trying to configure synonym filter.
> my requirement is:
> when a user searches by a phrase like "what is solr user?" it should be
> replaced with "solr user".
> something like : what is solr user? =>  solr user
>
> My schema for particular field is:
>
>  positionIncrementGap="100">
> 
> 
> 
>
> 
> 
> 
> 
> 
>  ignoreCase="true" expand="true"
> tokenizerFactory="KeywordTokenizerFactory"/>
>
> 
> 
>
> it seems working fine while trying by analysis.jsp but not by url
> http://localhost:8080/solr/core0/select?q="what is solr user?"
> or
> http://localhost:8080/solr/core0/select?q=what is solr user?
>
> Please guide me for achieve desire result.
>
>
>
>
>
>
> Hi Marco,
> thanks.
> yes my default search field is text_sync.
> I am getting result now but not as I expect.
> following is my synonym.txt
>
> what is bone cancer=>bone cancer
> what is bone cancer?=>bone cancer
> what is of bone cancer=>bone cancer
> what is symptom of bone cancer=>bone cancer
> what is symptoms of bone cancer=>bone cancer
>
> Above, I am getting results for all synonyms but not the last one, "what is
> symptoms of bone cancer=>bone cancer".
> I think due to stemming I am not getting the expected result. However, when
> I check the result in analysis.jsp,
> it gives the expected result. I am confused..
> Also I want to know best approach to configure synonym for my requirement.
>
> thanks
> with regards
>
> Hi,
>
> I am also facing the same type of problem.
> I am a newbie, please help.
>
> thanks
> Jonty
>
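[A toy sketch of the behavior discussed above, in plain Python rather than Solr code; the dictionary mirrors the synonym.txt rules quoted earlier. With tokenizerFactory="KeywordTokenizerFactory", the whole query string is treated as a single token, so a rule only fires when it matches the entire (normalized) input, and any filter that changes the string before the synonym lookup can make a rule miss.]

```python
# Toy model of query-time synonym rewriting with a keyword tokenizer:
# each rule must match the whole (lowercased) input, not individual words.
SYNONYMS = {
    "what is bone cancer": "bone cancer",
    "what is bone cancer?": "bone cancer",
    "what is of bone cancer": "bone cancer",
    "what is symptom of bone cancer": "bone cancer",
    "what is symptoms of bone cancer": "bone cancer",
}

def rewrite(query: str) -> str:
    key = query.strip().lower()      # mimic lowercasing before the lookup
    return SYNONYMS.get(key, key)    # exact whole-string match, else pass through

print(rewrite("What is symptoms of bone cancer"))  # -> bone cancer
print(rewrite("what is solr user?"))               # -> what is solr user? (no rule fires)
```

If any filter in the query analyzer rewrites the string before the synonym filter runs (a stemmer, for example), the lookup key no longer matches the rule's left-hand side, which is one way analysis.jsp (which shows each stage) can look right while the real query misses.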


Re: phrase search - problem

2010-05-03 Thread Ahmet Arslan
> I wanted to do a phrase search. What are the analyzers
> best suited for phrase search? I tried
> "textgen", but it did not yield the expected results.
> 
> I wanted to index:
> 
> my dear friend
> 
> If I search for "dear friend", I should get the result and
> if I search for "friend dear" I should not get any records.
> 

Default PhraseQuery is unordered. "dear friend" returns documents containing 
"friend dear". It is not about Analyzer but QueryParser. So you want ordered 
phrase queries. With SOLR-1604 you can accomplish what you want. It constructs 
ordered SpanNearQuery instead of PhraseQuery.

For example features:"stick memory" returns this snippet:

SmartMedia, Memory Stick, Memory Stick Pro, SD Card

http://localhost:8983/solr/select/?q=features:%22stick%20memory%22&version=2.2&start=0&rows=10&indent=on&defType=complexphrase&debugQuery=on&hl=true&hl.fl=features

https://issues.apache.org/jira/browse/SOLR-1604
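[The ordered/unordered distinction can be illustrated with a small self-contained sketch; this is plain Python, not Lucene code, with list indices standing in for Lucene's token positions.]

```python
def matches(tokens, first, second, in_order):
    """Adjacency match: True if `first` and `second` occur next to each
    other; with in_order=True, `first` must come immediately before `second`."""
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    for i in pos_first:
        for j in pos_second:
            if in_order and j == i + 1:
                return True          # ordered match, slop 0
            if not in_order and abs(j - i) == 1:
                return True          # unordered adjacency, either direction
    return False

doc = "my dear friend".split()
print(matches(doc, "dear", "friend", in_order=True))   # True
print(matches(doc, "friend", "dear", in_order=True))   # False: wrong order
print(matches(doc, "friend", "dear", in_order=False))  # True: unordered allows it
```

An ordered SpanNearQuery corresponds to in_order=True here, which is what the complexphrase parser from SOLR-1604 constructs instead of a PhraseQuery.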





RE: Problem with pdf, upgrading Cell

2010-05-03 Thread Sandhya Agarwal
Hello,

Please let me know if anybody has figured out a way around this issue.

Thanks,
Sandhya

-Original Message-
From: Praveen Agrawal [mailto:pkal...@gmail.com] 
Sent: Friday, April 30, 2010 11:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell

Grant,
You can try any of the sample PDFs that come in the /docs folder of the Solr 1.4
distribution. I tried 'Installing Solr in Tomcat.pdf', 'index.pdf', etc. Only
the metadata (i.e. stream_size, content_type), apart from my own literals, gets
indexed; the content is missing.


On Fri, Apr 30, 2010 at 8:52 PM, Grant Ingersoll wrote:

> Praveen and Marc,
>
> Can you share the PDF (feel free to email my private email) that fails in
> Solr?
>
> Thanks,
> Grant
>
>
> On Apr 30, 2010, at 7:55 AM, Marc Ghorayeb wrote:
>
> >
> > Hi,
> > Nope, I didn't get it to work... Just like you, the command-line version of
> > Tika extracts the content correctly, but once it is included in Solr, no
> > content is extracted.
> > What I have tried so far:
> > - Updating the Tika libraries inside the Solr 1.4 public release; no luck
> >   there.
> > - Downloading the latest SVN version, compiling it, and starting from a
> >   simple schema; still no luck.
> > - Getting other versions compiled on Hudson (nightly builds) and testing
> >   them as well; still no extraction.
> > I sent a mail to the developers' mailing list, but they told me I should
> > just mail here. I hope a developer reads this, because it is quite an
> > important feature of Solr and somehow it broke between the 1.4 release and
> > the latest version on the SVN.
> > Marc
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
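[A debugging sketch for readers hitting the same missing-content problem; it assumes a stock Solr 1.4 example instance on localhost:8983 with the /update/extract handler enabled, and a sample PDF from the distribution's docs folder. The extractOnly parameter tells Solr Cell to run Tika and return the extracted content instead of indexing it.]

```shell
# If the response body contains no extracted text, Tika extraction is
# failing inside Solr itself rather than at indexing time.
curl 'http://localhost:8983/solr/update/extract?extractOnly=true' \
     -F 'myfile=@docs/index.pdf'
```

Comparing this output with what the standalone Tika command-line tool produces for the same file should narrow down whether the regression is in Solr's Tika integration or in the indexing path.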