solr 1.3 database connection latin1/stored utf8 in mysql?

2008-10-22 Thread sunnyfr

Hi,

I'm using Solr 1.3, MySQL and Tomcat 5.5; can you please help me sort this out?
How can I index data in UTF-8? I tried adding the parameter encoding="UTF-8"
to the dataSource in data-config.xml.

| character_set_client     | latin1 |
| character_set_connection | latin1 |
But the data are stored as UTF-8 inside the database; not very logical, but I
can't change it.

But it still doesn't work. Help would be more than welcome,
Thanks
-- 
View this message in context: 
http://www.nabble.com/solr-1.3-database-connection-latin1-stored-utf8-in-mysql--tp20105301p20105301.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Out of Memory Errors

2008-10-22 Thread Nick Jenkin
Have you confirmed Java's -Xmx setting? (max heap size)

e.g. java -Xmx2000m -jar start.jar
-Nick

On Wed, Oct 22, 2008 at 3:24 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
> How much RAM in the box total? How many sort fields and what types? Sorts on
> each core?
>
> Willie Wong wrote:
>>
>> Hello,
>>
>> I've been having issues with out of memory errors on searches in Solr. I
>> was wondering if I'm hitting a limit with solr or if I've configured
>> something seriously wrong.
>>
>> Solr Setup
>> - 3 cores - 3163615 documents each
>> - 10 GB size
>> - approx 10 fields
>> - document sizes vary from a few kb to a few MB
>> - no faceting is used however the search query can be fairly complex with
>> 8 or more fields being searched on at once
>>
>> Environment:
>> - Windows 2003
>> - 2.8 GHz Xeon processor
>> - 1.5 GB memory assigned to Solr
>> - Jetty 6 server
>>
>> Once we get to around a few concurrent users, OOMs start occurring and Jetty
>> restarts.  Would this just be a case of needing more memory, or are there
>> certain configuration settings that need to be set?  We're using an
>> out-of-the-box Solr 1.3 beta version.
>> A few of the things we considered that might help:
>> - Removing sorts on the result sets (result sets are approx 40,000 +
>> documents)
>> - Reducing cache sizes such as the queryResultMaxDocsCached setting,
>> document cache, queryResultCache, filterCache, etc
>>
>> Am I missing anything else that should be looked at, or is it time to
>> simply increase the memory/start looking at distributing the indexes?  Any
>> help would be much appreciated.
>>
>>
>> Regards,
>>
>> WW
>>
>>
>
>


Re: error with delta import

2008-10-22 Thread Shalin Shekhar Mangar
Actually, most XML parsers don't require you to escape such characters in
attributes. You are welcome to try this out, just look at the example-DIH :)

On Tue, Oct 21, 2008 at 11:11 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote:

> Wow, I really should read more closely before I respond - I see now, Noble,
> that you were talking about DIH's ability to parse escaped '<'s in attribute
> values, rather than about whether '<' was an acceptable character in
> attribute values.
>
> I should repurpose my remarks to note to Shalin, though, that all
> (conformant) XML parsers have to be able to handle escaped '<'s in attribute
> values, since an XML document with a '<' in an attribute value is not
> well-formed.
>
> Steve
>
> On 10/21/2008 at 1:10 PM, Steven A Rowe wrote:
> > On 10/21/2008 at 12:14 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> > > On Tue, Oct 21, 2008 at 12:56 AM, Shalin Shekhar Mangar
> > <[EMAIL PROTECTED]> wrote:
> > > > Your data-config looks fine except for one thing -- you do not need
> to
> > > > escape '<' character in an XML attribute. It maybe throwing off the
> > > > parsing code in DataImportHandler.
> > >
> > > not really '<' is fine in attribute
> >
> > Noble, I think you're wrong - AFAICT from the XML spec., '<' is *not*
> > fine in an attribute value - from
> > :
> >
> >   [10] AttValue ::= '"' ([^<&"] | Reference)* '"'
> >                   | "'" ([^<&'] | Reference)* "'"
> >
> > where an attribute  is:
> >
> >   [41] Attribute ::= Name Eq AttValue
> >
> > Steve
>
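For reference, a minimal illustration of the point being discussed (table and
column names below are hypothetical, not from any thread participant's config):
a literal '<' inside an attribute value is not well-formed XML, but the escaped
entity form parses fine in a DIH data-config:

```xml
<!-- Hypothetical entity: the &lt; entity is required here; a raw '<'
     in the attribute value would make the document ill-formed XML. -->
<entity name="item"
        query="SELECT id, name FROM item WHERE id &lt; 1000">
  <field column="id" name="id"/>
  <field column="name" name="name"/>
</entity>
```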



-- 
Regards,
Shalin Shekhar Mangar.


Re: solr 1.3 database connection latin1/stored utf8 in mysql?

2008-10-22 Thread Shalin Shekhar Mangar
Hi,

The best way to manage international characters is to keep everything in
UTF-8. Otherwise it will be difficult to figure out the source of the
problem.

1. Make sure the program which writes data into MySQL is using UTF-8
2. Make sure the MySQL tables are using UTF-8.
3. Make sure MySQL client connections use UTF-8 by default
4. If the SQL written in your data-config has international characters,
start Solr with "-Dfile.encoding=UTF-8" as a command line parameter

http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html

I don't think there is any easy way to go about this. You may need to
revisit all the parts of your system.
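As a sketch of point 3 for a DIH setup (host, database and credentials below
are placeholders, not taken from the thread): the Connector/J connection can be
forced to UTF-8 from data-config.xml, independent of the server's
character_set_connection default:

```xml
<!-- Hypothetical connection details; note that '&' must be written
     as &amp; inside the XML attribute value. -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/mydb?useUnicode=true&amp;characterEncoding=UTF-8"
            user="solr"
            password="secret"/>
```

If, as in the original question, the column character set is latin1 but the
stored bytes are really UTF-8, forcing a UTF-8 connection may not be enough on
its own; the bytes may still need re-decoding on the Solr side.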



-- 
Regards,
Shalin Shekhar Mangar.


function to clear up string to utf8 before indexing, where should I put it?

2008-10-22 Thread sunnyfr

I have a function that converts strings from Latin-1 to UTF-8; I would like to
know where exactly in the Java code I should put it, so that strings are
cleaned up before indexing.

Thanks a lot for this information,
Sunny

I'm using solr1.3, mysql, tomcat55
-- 
View this message in context: 
http://www.nabble.com/function-to-clear-up-string-to-utf8-before-indexing%2C-where-should-I-put-it--tp20106224p20106224.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.3 database connection latin1/stored utf8 in mysql?

2008-10-22 Thread sunnyfr

Hi Shalin,
Thanks for your answer, but it doesn't work with just -Dfile.encoding;
I was hoping it would.

I definitely can't change the database, so I guess I must change the Java code.
I have a function to change a Latin-1 string to UTF-8, but I don't really know
where I should put it.

Thanks for your answer,



-- 
View this message in context: 
http://www.nabble.com/solr-1.3-database-connection-latin1-stored-utf8-in-mysql--tp20105342p20106791.html
Sent from the Solr - User mailing list archive at Nabble.com.



Odd q.op=AND and fq interactions in Solr 1.3.0

2008-10-22 Thread jayson.minard

I am seeing odd behavior where a query such as:

http://localhost:8983/solr/select/?q=moss&version=2.2&start=0&rows=10&indent=on&fq=docType%3AFancy+Doc

works until I add q.op=AND:

http://localhost:8983/solr/select/?q=moss&q.op=AND&version=2.2&start=0&rows=10&indent=on&fq=docType%3AFancy+Doc

which then returns 0 results.  There is only one term in the q parameter, and
I would have thought the fq parameter would be unaffected; both of its terms
are present anyway, although in a String field rather than a tokenized one (so
maybe an AND is being inserted between Fancy and Doc, which no longer matches
the untokenized string?).

Is there a way to apply q.op to q but not to fq, if that is indeed the problem?

Cheers!
-- Jayson
-- 
View this message in context: 
http://www.nabble.com/Odd-q.op%3DAND-and-fq-interactions-in-Solr-1.3.0-tp20106953p20106953.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Sorting performance

2008-10-22 Thread Beniamin Janicki
:so you can send your updates anytime you want, and as long as you only 
:commit every 5 minutes (or commit on a master as often as you want, but 
:only run snappuller/snapinstaller on your slaves every 5 minutes) your 
:results will be at most 5minutes + warming time stale.

This is what I do as well (commits are done once per 5 minutes). I've got a
master-slave configuration. The master has all caches turned off (commented
out in solrconfig.xml) and only 2 maxWarmingSearchers. The index size is 5 GB,
Xmx=1 GB, and committing takes around 10 secs (on the default configuration
with warming it took from 30 minutes up to 2 hours).

The slave caches are configured with autowarmCount="0" and
maxWarmingSearchers=1, and I have new data 1 second after a snapshot is done.
I haven't noticed any huge delays while serving search requests.
Try those values; maybe they'll help in your case too.
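For illustration, such a slave configuration might look roughly like this in
solrconfig.xml (cache classes and sizes are placeholders; only
autowarmCount="0" and maxWarmingSearchers are the point here):

```xml
<!-- autowarmCount="0" lets a new searcher register without copying
     entries from the old caches, so it becomes usable immediately. -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

<maxWarmingSearchers>1</maxWarmingSearchers>
```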

Ben Janicki


-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: 22 October 2008 04:56
To: solr-user@lucene.apache.org
Subject: Re: Sorting performance


: The problem is that I will have hundreds of users doing queries, and a
: continuous flow of documents coming in.
: So a delay in warming up a cache "could" be acceptable if I do it a few
: times per day. But not on a too regular basis (right now, the first query
: that loads the cache takes 150s).
:
: However: I'm not sure why it looks not to be a good idea to update the caches

you can refresh the caches automatically after updating; the "newSearcher" 
event is fired whenever a searcher is opened (but before it's used by 
clients), so you can configure warming queries for it -- it doesn't have to 
be done manually (or by the first user to hit that reader)

so you can send your updates anytime you want, and as long as you only 
commit every 5 minutes (or commit on a master as often as you want, but 
only run snappuller/snapinstaller on your slaves every 5 minutes) your 
results will be at most 5minutes + warming time stale.


-Hoss



Re: solr 1.3 database connection latin1/stored utf8 in mysql?

2008-10-22 Thread Jérôme Etévé
Hi,

  See
   http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html
  and
   
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(java.lang.String)

  Also note that you cannot transform a Latin-1 string into a UTF-8
string. What you can do is decode a Latin-1 byte array into a String
(Java uses its own internal representation for String, which you
shouldn't even need to know about), and encode a String into a UTF-8
byte array.

Cheers.

J.





-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

[EMAIL PROTECTED]


Re: Odd q.op=AND and fq interactions in Solr 1.3.0

2008-10-22 Thread jayson.minard

By the way, the fq parameter is being used to apply a facet value as a
refinement, which is why it is a string and not tokenized.



-- 
View this message in context: 
http://www.nabble.com/Odd-q.op%3DAND-and-fq-interactions-in-Solr-1.3.0-tp20106953p20106971.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Out of Memory Errors

2008-10-22 Thread r.prieto
Hi Willie,

Are you using highlighting?

If so, be aware that for each document retrieved, Solr highlighting loads
into memory the full contents of the field used for highlighting. If that
field is very long, you will have memory problems.

You may be able to solve the problem using this patch:

http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200806.mbox/%3C1552
[EMAIL PROTECTED]

which copies the content of the field used for highlighting to another field
and reduces its size.

Also note that Windows limits a process to 2 GB of memory.
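If your schema allows it, one way to keep the highlighted field small is a
truncating copyField (field names below are hypothetical, and maxChars assumes
a Solr version that supports it on copyField):

```xml
<!-- Index/store the full body, but highlight against a truncated copy. -->
<field name="body"    type="text" indexed="true" stored="true"/>
<field name="body_hl" type="text" indexed="true" stored="true"/>
<copyField source="body" dest="body_hl" maxChars="10000"/>
```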






Re: function to clear up string to utf8 before indexing, where should I put it?

2008-10-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
you can try out a Transformer to translate that
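For what it's worth, the core of such a cleanup is just a re-decode. The
standalone sketch below is not the actual DIH Transformer API (in a real
custom Transformer you would apply this inside transformRow()); the class and
method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class Latin1Fix {

    // Re-decode a String whose characters are really UTF-8 bytes that were
    // mistakenly decoded as Latin-1 (the situation described in this thread).
    static String fixMojibake(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1); // recover the original bytes
        return new String(raw, StandardCharsets.UTF_8);       // decode them as UTF-8
    }

    public static void main(String[] args) {
        String garbled = "caf\u00C3\u00A9"; // "café" read through a latin1 connection
        System.out.println(fixMojibake(garbled)); // café
    }
}
```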




-- 
--Noble Paul


Re: Odd q.op=AND and fq interactions in Solr 1.3.0

2008-10-22 Thread jayson.minard

Thinking about this, I could work around it by quoting the facet value so
that no AND is inserted between the tokens in the fq parameter.
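i.e. phrase-quote the value so the whole string is matched against the
untokenized field (the URL-encoded form is shown as it would appear in the
request):

```
fq=docType:"Fancy Doc"

&fq=docType%3A%22Fancy+Doc%22
```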



-- 
View this message in context: 
http://www.nabble.com/Odd-q.op%3DAND-and-fq-interactions-in-Solr-1.3.0-tp20106953p20106996.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: function to clear up string to utf8 before indexing, where should I put it?

2008-10-22 Thread sunnyfr

Can you tell me more about it?



-- 
View this message in context: 
http://www.nabble.com/function-to-clear-up-string-to-utf8-before-indexing%2C-where-should-I-put-it--tp20106224p20108569.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr for Whole Web Search

2008-10-22 Thread John Martyniak

I am very new to Solr, but I have played with Nutch and Lucene.

Has anybody used Solr for a whole web indexing application?

Which Spider did you use?

How does it compare to Nutch?

Thanks in advance for all of the info.

-John



Re: function to clear up string to utf8 before indexing, where should I put it?

2008-10-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
http://wiki.apache.org/solr/DataImportHandler#head-eb523b0943596587f05532f3ebc506ea6d9a606b




-- 
--Noble Paul


Re: Ocean realtime search + Solr

2008-10-22 Thread Jason Rutherglen
Not quite yet; there is the IndexReader.clone patch that Ocean depends on,
which still needs to be completed:
https://issues.apache.org/jira/browse/LUCENE-1314.  I had it completed,
but then things changed in IndexReader, so now it doesn't work and I
have not had time to complete it again.  Otherwise the Ocean code
works; the issue is how best to integrate with Solr, which is not
clear at this point given that it requires a rather massive change to the
SolrCore tree of code (see
https://issues.apache.org/jira/browse/SOLR-567 for what changes are
needed).  Unfortunately it's not as simple as swapping something out,
as the facet and field caching methodology needs to change.

On Tue, Oct 21, 2008 at 8:45 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm pretty intrigued by the Ocean search stuff and the Lucene patch. I'm
> wondering if it's something that a tweaked Solr with a modified Lucene can
> run now?  Has anyone tried merging that patch and running it with Solr?  I'm
> sure there is more to it than just swapping out the libs, but the real-time
> indexing I'm sure would be possible, no?
>
> Thanks.
>
> - Jon
>


Re: immediatley commit of docs doesnt work in multiCore case

2008-10-22 Thread Parisa


I should mention that I have already added this tag in the SolrConfig.xml
of all cores.


 

and it works in single core, but unfortunately it doesn't work in multi-core.
-- 
View this message in context: 
http://www.nabble.com/immediatley-commit-of-docs-doesnt-work-in-multiCore-case-tp20072378p20110855.html
Sent from the Solr - User mailing list archive at Nabble.com.



FileNotFoundException on slave after replication - script bug?

2008-10-22 Thread Jim Murphy

We're seeing strange behavior on one of our slave nodes after replication.
When the new searcher is created we see FileNotFoundExceptions in the log
and the index is strangely invalid/corrupted.

We may have identified the root cause but wanted to run it by the community.
We believe there is a bug in the snappuller shell script, line 181:

snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"` 

This line determines the directory name of the latest snapshot to download
to the slave from the master.  The problem with this line is that it can grab
the temporary work directory of a snapshot in progress.  Those temporary
directories are prefixed with "temp" and, as far as I can tell, should never
be pulled from the master, so they are easy to filter out.  This temp
directory, if it exists, will be the newest one, so if present it will be
the one replicated: FAIL.

We've tweaked the line to exclude any directories starting with "temp":

snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
${master_data_dir}|grep 'snapshot\.'|grep -v wip|grep -v temp|sort -r|head
-1"` 

This has fixed our local issue; we can submit a patch, but wanted a quick
sanity check, because I'm surprised it isn't seen much more commonly.

Jim

-- 
View this message in context: 
http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--tp20111313p20111313.html
Sent from the Solr - User mailing list archive at Nabble.com.



Boosting Question

2008-10-22 Thread Manepalli, Kalyan
Hi,
I am working on a use case where I want to boost a document if
certain groups of words appear near the keywords searched by the user.

For example, if the user searches for the keyword "pool", I want to boost the
documents which contain phrases like "excellent pool", "nice pool", "awesome
pool", etc.

The list of words can be very large.
Can anyone suggest an optimal solution for this?

Thanks
Kalyan


Re: Solr for Whole Web Search

2008-10-22 Thread Grant Ingersoll


On Oct 22, 2008, at 7:57 AM, John Martyniak wrote:

> I am very new to Solr, but I have played with Nutch and Lucene.
>
> Has anybody used Solr for a whole web indexing application?
>
> Which Spider did you use?
>
> How does it compare to Nutch?

There is a patch that combines Nutch + Solr.  Nutch is used for
crawling, Solr for searching.  Can't say I've used it for whole web
searching, but I believe some are trying it.

At the end of the day, I'm sure Solr could do it, but it will take
some work to set up the architecture (distributed, replicated) and deal
properly with fault tolerance and failover.  There are also some
examples on Hadoop about Hadoop + Lucene integration.

How big are you talking?

> Thanks in advance for all of the info.
>
> -John



--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











Re: Hierarchical Faceting

2008-10-22 Thread Marian Steinbach
On Tue, Oct 21, 2008 at 3:59 PM, Sachit P. Menon
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have gone through the archive in search of Hierarchical Faceting but was 
> not clear as what should I exactly do to achieve that.
>
> Suppose, I have 3 categories like politics, science and sports. In the 
> schema, I am defining a field type called 'Category'. I don't have a sub 
> category field type (and don't want to have one).
> Now, Cricket and Football are some categories which can be considered to be 
> under sports.
> When I search for something and if it is present in the 'sports' category, 
> then it should show me the facets of cricket and football too.
>
> My question is:
> Do I need to specify cricket, football also as categories or sub categories 
> of sports (for which I don't want to make a separate field)?
> And if I make these as categories only, then how will I achieve the drilling 
> down of the data to cricket or football.
>


Hi Sachit!

I've had the same problem with a search for wine. The origin of a wine
can consist of up to three hierarchical values: country (e.g.
"France"), region (e.g. "Bordeaux") and sub-region (e.g. "St.
Emilion").

I have three facet fields for that: country, region and sub-region. But I
only display the "region" facet under the following conditions:

- the user has selected a specific country, e.g. France, as a filter
- or only one country is left (due to other filtering or full-text search)

Don't know if this suits you; it's just the way I handle it. It's not yet
publicly available though.
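In request terms (using the field names above; query text and base URL are
placeholders matching the other examples in this digest), the drill-down looks
roughly like:

```
# initial search: facet on country only
http://localhost:8983/solr/select/?q=bordeaux&facet=true&facet.field=country

# after the user picks France: filter on it and expose the region facet
http://localhost:8983/solr/select/?q=bordeaux&facet=true&fq=country:France&facet.field=region
```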

Marian


Re: Index updates blocking readers: To Multicore or not?

2008-10-22 Thread Jim Murphy

Thanks Yonik,

I have more information...

1. We do indeed have large indexes: 40GB on disk, 30M documents - and this is
just a test server; we have 8 of these in parallel.

2. The performance problem I was seeing followed replication, and the first
query on a new searcher.  It turned out we hadn't configured index warming
queries very well, so we replaced the various "solr rocks" type queries with
one that was better for our data - and saw no improvement.  The problem was
that replication completed and a new searcher was created and registered, but
the first query would take 10-20 seconds to complete.  Thereafter it took
<200 milliseconds for similar non-cached queries.

A profiler pointed us to building the FieldSortedHitQueue as taking all the
time.  Our warming query did not include a sort, but our queries commonly do.
Once we added the sort parameter, our warming query started taking the 10-20
seconds prior to registering the searcher.  After that, the first query on
the new searcher took the expected 200ms.

LESSON LEARNED: warm your caches! And, if a sort is involved in your queries,
incorporate that sort in your warming query!  Add a warming query for each
kind of sort that you expect to do.
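In solrconfig.xml terms, that lesson looks roughly like this (query text,
field names and sort orders below are placeholders for whatever your
application actually uses):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- one warming query per sort order used in production -->
    <lst><str name="q">warmup</str><str name="sort">timestamp desc</str></lst>
    <lst><str name="q">warmup</str><str name="sort">price asc</str></lst>
  </arr>
</listener>
```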

 







Yonik Seeley wrote:
> 
> On Mon, Oct 6, 2008 at 2:10 PM, Jim Murphy <[EMAIL PROTECTED]> wrote:
>> We have a farm of several Master-Slave pairs all managing a single very
>> large
>> "logical" index sharded across the master-slaves.  We notice on the
>> slaves,
>> after an rsync update, as the index is being committed that all queries
>> are
>> blocked sometimes resulting in unacceptable service times.  I'm looking
>> at
>> ways we can manage these "update burps".
> 
> Updates should never block queries.
> What version of Solr are you using?
> Is it possible that your indexes are so big, opening a new index in
> the background causes enough of the old index to be flushed from OS
> cache, causing big slowdowns?
> 
> -Yonik
> 
> 
>> Question #1: Anything obvious I can tweak in the configuration to
>> mitigate
>> these multi-second blocking updates?  Our Indexes are 40GB, 20M documents
>> each.  RSync updates are every 5 minutes several hundred KB per update.
>>
>> Question #2: I'm considering setting up each slave with multiple Solr
>> cores.
>> The 2 indexes per instance would be nearly identical copies but "A" would
>> be
>> read from while "B" is being updated, then they would swap.  I'll have to
>> figure out how to rsync these 2 indexes properly but if I can get the
>> commits to happen to the offline index then I suspect my queries could
>> proceed unblocked.
>>
>> Is this the wrong tree to be barking up?  Any other thoughts?
>>
>> Thanks in advance,
>>
>> Jim
>>
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p19843098.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20112546.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr for Whole Web Search

2008-10-22 Thread John Martyniak

Grant thanks for the response.

A couple of other people have recommended trying the Nutch + Solr  
approach, but I am not sure what the real benefit of doing that is,  
since Nutch provides most of the same features as Solr, while Solr adds  
some nice extras (like spell checking and incremental indexing).


So I currently have a Nutch index of around 500,000+ URLs, but expect  
it to get much bigger.  And am generally pretty happy with it, but I  
just want to make sure that I am going down the correct path, for the  
best feature set.  As far as implementation to the front end is  
concerned, I have been using the Nutch search app as basically a  
webservice to feed the main app (So using RSS).  The main app takes  
that and manipulates the results for display.


As far as the Hadoop + Lucene integration, I haven't used that  
directly just the Hadoop integration with Nutch.  And of course Hadoop  
independently.


-John


On Oct 22, 2008, at 10:08 AM, Grant Ingersoll wrote:



On Oct 22, 2008, at 7:57 AM, John Martyniak wrote:


I am very new to Solr, but I have played with Nutch and Lucene.

Has anybody used Solr for a whole web indexing application?

Which Spider did you use?

How does it compare to Nutch?


There is a patch that combines Nutch + Solr.  Nutch is used for  
crawling, Solr for searching.  Can't say I've used it for whole web  
searching, but I believe some are trying it.


At the end of the day, I'm sure Solr could do it, but it will take  
some work to set up the architecture (distributed, replicated) and  
deal properly with fault tolerance and failover.  There are also  
some examples of Hadoop + Lucene integration in the Hadoop project.


How big are you talking?




Thanks in advance for all of the info.

-John



--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ













Re: Index updates blocking readers: To Multicore or not?

2008-10-22 Thread John Martyniak

Jim,

This is a off topic question.

But for your 30M documents, did you fetch those from external web  
sites (Whole Web Search)?  Or are they internal documents?  If they  
are external what method did you use to fetch them and which spider?


I am in the process of deciding between using Nutch for whole web  
indexing, Solr + Spider?, or Nutch + Solr, etc.


Thank you in advance for your insight into this issue.

-John

On Oct 22, 2008, at 10:55 AM, Jim Murphy wrote:



Thanks Yonik,

I have more information...

1. We do indeed have large indexes: 40GB on disk, 30M documents - and this
is just a test server; we have 8 of these in parallel.

2. The performance problem I was seeing followed replication, and the first
query on a new searcher.  It turns out we didn't configure index warming
queries very well, so we replaced the various "solr rocks" type queries with
one that was better for our data - and saw no improvement.  The problem was
that replication completed and a new searcher was created and registered, but
the first query would take 10-20 seconds to complete.  Thereafter it took
<200 milliseconds for similar non-cached queries.

The profiler pointed us to building the FieldSortedHitQueue, which was taking
all the time.  Our warming query did not include a sort, but our queries
commonly do.  Once we added the sort parameter, our warming query started
taking the 10-20 seconds prior to registering the searcher.  After that, the
first query on the new searcher took the expected 200ms.

LESSON LEARNED: warm your caches! And, if a sort is involved in your  
queries
incorporate that sort in your warming query!  Add a warming query  
for each

kind of sort that you expect to do.
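For reference, a newSearcher warming entry along these lines can go in solrconfig.xml; the query and field names below are made-up examples, not the poster's actual configuration:

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- one warming query per sort order actually used at query time -->
    <lst>
      <str name="q">content:common_term</str>
      <str name="sort">published_date desc</str>
    </lst>
  </arr>
</listener>
```

Each sort listed here forces the corresponding FieldSortedHitQueue/field cache to be populated before the new searcher is registered.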









Yonik Seeley wrote:


On Mon, Oct 6, 2008 at 2:10 PM, Jim Murphy <[EMAIL PROTECTED]>  
wrote:
We have a farm of several Master-Slave pairs all managing a single  
very

large
"logical" index sharded across the master-slaves.  We notice on the
slaves,
after an rsync update, as the index is being committed that all  
queries

are
blocked sometimes resulting in unacceptable service times.  I'm  
looking

at
ways we can manage these "update burps".


Updates should never block queries.
What version of Solr are you using?
Is it possible that your indexes are so big, opening a new index in
the background causes enough of the old index to be flushed from OS
cache, causing big slowdowns?

-Yonik



Question #1: Anything obvious I can tweak in the configuration to
mitigate
these multi-second blocking updates?  Our Indexes are 40GB, 20M  
documents
each.  RSync updates are every 5 minutes several hundred KB per  
update.


Question #2: I'm considering setting up each slave with multiple  
Solr

cores.
The 2 indexes per instance would be nearly identical copies but  
"A" would

be
read from while "B" is being updated, then they would swap.  I'll  
have to
figure out how to rsync these 2 indexes properly but if I can get  
the
commits to happen to the offline index then I suspect my queries  
could

proceed unblocked.

Is this the wrong tree to be barking up?  Any other thoughts?

Thanks in advance,

Jim



--
View this message in context:
http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p19843098.html
Sent from the Solr - User mailing list archive at Nabble.com.







--
View this message in context: 
http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20112546.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Index updates blocking readers: To Multicore or not?

2008-10-22 Thread Jim Murphy

We index RSS content using our own home-grown distributed spiders - not using
Nutch.  We use Ruby processes to do the feed fetching and XML shredding, and
Amazon SQS to queue up work packets to insert into our Solr cluster. 

Sorry can't be of more help.

-- 
View this message in context: 
http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20113143.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Boosting Question

2008-10-22 Thread Otis Gospodnetic
Hi,

Without changing any of the internals, a simple approach might be to take the 
query "pool" and expand it with those other keywords: form query phrases 
in addition to the plain "pool" keyword, and boost those expanded phrases to 
make them bubble up - if they exist.
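A minimal sketch of that expansion; the qualifier list and boost factor here are invented for illustration, not taken from the thread:

```python
# Expand q="pool" with boosted quality phrases (hypothetical word list
# and boost factor)
qualifiers = ["excellent", "nice", "awesome"]
phrases = ['"%s pool"^5' % w for w in qualifiers]
q = "pool " + " ".join(phrases)
print(q)  # pool "excellent pool"^5 "nice pool"^5 "awesome pool"^5
```

Documents containing one of the boosted phrases score higher, while plain "pool" matches still return.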


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: "Manepalli, Kalyan" <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, October 22, 2008 10:07:38 AM
> Subject: Boosting Question
> 
> Hi,
> I am working on a usecase where I want to boost a document if
> there are certain group of words near the keywords searched by the user.
> 
> For eg: if the user is searching for keyword "pool", I want to boost the
> documents which have words like "excellent pool", "nice pool", "awesome
> pool", etc.
> 
> The list of words can be very large. 
> Can anyone suggest an optimal solution to do this.
> 
> Thanks
> Kalyan



Question about copyField

2008-10-22 Thread Aleksey Gogolev

Hello.

I have a field "description" in my schema, and I want to make a field
"suggestion" with the same content. So I added the following line to my
schema.xml:

   <copyField source="description" dest="suggestion"/>

But I also want to modify the "description" string before copying it to
the "suggestion" field. I want to remove all commas, dots and slashes. Here
is an example of such transformation:

"TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

And so, as a result, I want to have such a doc:

<doc>
 <field name="id">8asydauf9nbcngfaad</field>
 <field name="description">TvPL/st, SAMSUNG, SML200</field>
 <field name="suggestion">TvPL st SAMSUNG SML200</field>
</doc>

I think it would be nice to use solr.PatternReplaceFilterFactory for
this purpose. So the question is: Can I use solr filters for
processing "description" string before copying it to "suggestion"
field?
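The intended transformation can be sketched as a regex, the same kind of pattern a PatternReplaceFilterFactory could apply (the exact pattern below is an assumption based on the example):

```python
import re

# Replace commas, dots and slashes with spaces, then collapse whitespace
s = "TvPL/st, SAMSUNG, SML200"
cleaned = re.sub(r"[,./]", " ", s)
cleaned = re.sub(r"\s+", " ", cleaned).strip()
print(cleaned)  # TvPL st SAMSUNG SML200
```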

Thank you for your attention.

-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey



Re: Understanding prefix query searching

2008-10-22 Thread Otis Gospodnetic
Hi,

You probably lower-case tokens during indexing (LowerCaseFilterFactory).  
Wildcard queries are not analyzed the way non-wildcard queries are (this is 
explained in the Lucene FAQ, I believe), so your capitalized Robert doesn't 
match the lower-cased robert in your index.
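A common client-side workaround is to lower-case wildcard terms before sending the query. This helper is a hedged sketch, not part of Solr; it also assumes field names are already lowercase, as they are in this thread:

```python
def lowercase_wildcard_terms(query):
    # Lower-case only wildcard/prefix terms, since Solr analyzes plain
    # terms at query time but passes wildcard terms through unanalyzed
    return " ".join(
        tok.lower() if ("*" in tok or "?" in tok) else tok
        for tok in query.split()
    )

print(lowercase_wildcard_terms("full_name_t:Robert*"))  # full_name_t:robert*
```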

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Rupert Fiasco <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, October 21, 2008 9:03:10 PM
> Subject: Understanding prefix query searching
> 
> So I tried to look on google for an answer to this before I posted
> here. Basically I am trying to understand how prefix searching works.
> 
> I have a dynamic text field (indexed and stored) "full_name_t"
> 
> I have some data in my index, specifically a record with full_name_t =
> "Robert P Page"
> 
> A search on:
> 
> full_name_t:Robert
> 
> yields that document, however a search on
> 
> full_name_t:Robert*
> 
> yields nothing.
> 
> Why?
> 
> To get around this I am doing something like
> 
> (full_name_t:Robert OR full_name_t:Robert*)
> 
> But I would like to understand why the wildcard doesnt work, shouldn't
> it match anything after the first characters of "Robert"?
> 
> Thanks
> 
> -Rupert



Re: Odd q.op=AND and fq interactions in Solr 1.3.0

2008-10-22 Thread Otis Gospodnetic
Hi Jayson,

That's exactly what I was going to suggest: fq=docType:"Fancy Doc"
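URL-encoded, the quoted phrase filter looks like this (a sketch built with Python's standard library, not an official Solr client):

```python
from urllib.parse import urlencode

# Quoting the facet value keeps q.op=AND from splitting "Fancy Doc"
params = {"q": "moss", "q.op": "AND", "fq": 'docType:"Fancy Doc"'}
print(urlencode(params))
# q=moss&q.op=AND&fq=docType%3A%22Fancy+Doc%22
```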

 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: jayson.minard <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, October 22, 2008 5:26:03 AM
> Subject: Re: Odd q.op=AND and fq interactions in Solr 1.3.0
> 
> 
> Thinking about this, I could work around it by quoting the facet value so
> that the AND isn't inserted between tokens in the fq parameter.  
> 
> 
> jayson.minard wrote:
> > 
> > By the way, the fq parameter is being used to apply a facet value as a
> > refinement which is why it is not tokenized and is a string.
> > 
> > 
> > jayson.minard wrote:
> >> 
> >> I am seeing odd behavior where a query such as:
> >> 
> >> 
> http://localhost:8983/solr/select/?q=moss&version=2.2&start=0&rows=10&indent=on&fq=docType%3AFancy+Doc
> >> 
> >> works until I add q.op=AND
> >> 
> >> 
> http://localhost:8983/solr/select/?q=moss&q.op=AND&version=2.2&start=0&rows=10&indent=on&fq=docType%3AFancy+Doc
> >> 
> >> which then causes 0 results.  There is only one term in the q parameter,
> >> and the fq parameter I would think would be unaffected, and both of its
> >> terms are there anyway although in a String field and not a tokenized way
> >> (so maybe it is inserting an AND between Fancy AND Doc which isn't
> >> matching the untokenized string anymore?)
> >> 
> >> Is there a way to apply q.op to q and not fq at the same time; if that is
> >> indeed the problem?
> >> 
> >> Cheers!
> >> -- Jayson
> >> 
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Odd-q.op%3DAND-and-fq-interactions-in-Solr-1.3.0-tp20106953p20106996.html
> Sent from the Solr - User mailing list archive at Nabble.com.



RE: Question about copyField

2008-10-22 Thread Feak, Todd
The filters and tokenizer that are applied to the copy field are
determined by its type in the schema. Simply create a new field type in
your schema with the filters you would like, and use that type for your
copy field. So, the field description would have its old type, but the
field suggestion would get a new type.
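A minimal schema sketch of that arrangement; the type name, tokenizer, and pattern below are hypothetical, chosen to match the comma/dot/slash stripping discussed in this thread:

```xml
<fieldType name="suggestText" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strip commas, dots and slashes from tokens -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[,./]" replacement=" " replace="all"/>
  </analyzer>
</fieldType>

<field name="description" type="text" indexed="true" stored="true"/>
<field name="suggestion" type="suggestText" indexed="true" stored="true"/>
<copyField source="description" dest="suggestion"/>
```

Note that analysis affects the indexed tokens of "suggestion", not what another field stores.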

-Todd Feak

-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2008 8:28 AM
To: solr-user@lucene.apache.org
Subject: Question about copyField


Hello.

I have a field "description" in my schema, and I want to make a field
"suggestion" with the same content. So I added the following line to my
schema.xml:

   <copyField source="description" dest="suggestion"/>

But I also want to modify the "description" string before copying it to
the "suggestion" field. I want to remove all commas, dots and slashes. Here
is an example of such transformation:

"TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

And so, as a result, I want to have such a doc:

<doc>
 <field name="id">8asydauf9nbcngfaad</field>
 <field name="description">TvPL/st, SAMSUNG, SML200</field>
 <field name="suggestion">TvPL st SAMSUNG SML200</field>
</doc>

I think it would be nice to use solr.PatternReplaceFilterFactory for
this purpose. So the question is: Can I use solr filters for
processing "description" string before copying it to "suggestion"
field?

Thank you for your attention.

-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey




Re: Index updates blocking readers: To Multicore or not?

2008-10-22 Thread John Martyniak
Thank you, that is good information, as that is the way that I am  
leaning.


So when you fetch the content from RSS, does that get rendered to an  
XML document that Solr indexes?


Also, what were a couple of the decision points for using Solr as opposed  
to Nutch, or even straight Lucene?


-John



On Oct 22, 2008, at 11:22 AM, Jim Murphy wrote:



We index RSS content using our own home-grown distributed spiders -  
not using
Nutch.  We use Ruby processes to do the feed fetching and XML  
shredding, and

Amazon SQS to queue up work packets to insert into our Solr cluster.

Sorry can't be of more help.

--
View this message in context: 
http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20113143.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Out of Memory Errors

2008-10-22 Thread Otis Gospodnetic
Hi,

Without knowing the details, I suspect it's just that a 1.5GB heap is not enough.  
Yes, sorting will use your heap, as will various Solr caches.  As will norms, so 
double-check your schema to make sure you are using field types like string 
where you can, not text, for example.  If you sort by timestamp-like fields, 
reduce their granularity as much as possible.
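Reducing granularity means rounding the timestamp before indexing so the sort field has far fewer distinct values. A hedged sketch of day-level rounding done on the client side (the field and format are assumptions):

```python
from datetime import datetime

# Index a day-granularity copy of the timestamp instead of the full value;
# fewer unique terms means a much smaller field cache when sorting
ts = datetime(2008, 10, 22, 14, 37, 5)
day = ts.strftime("%Y-%m-%dT00:00:00Z")
print(day)  # 2008-10-22T00:00:00Z
```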


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Willie Wong <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, October 21, 2008 9:48:14 PM
> Subject: Out of Memory Errors
> 
> Hello,
> 
> I've been having issues with out of memory errors on searches in Solr. I 
> was wondering if I'm hitting a limit with solr or if I've configured 
> something seriously wrong.
> 
> Solr Setup
> - 3 cores 
> - 3163615 documents each
> - 10 GB size
> - approx 10 fields
> - document sizes vary from a few kb to a few MB
> - no faceting is used however the search query can be fairly complex with 
> 8 or more fields being searched on at once
> 
> Environment:
> - windows 2003
> - 2.8 GHz Xeon processor
> - 1.5 GB memory assigned to solr
> - Jetty 6 server
> 
> Once we get to around a few concurrent users, OOMs start occurring and Jetty 
> restarts.  Would this just be a case of more memory or are there certain 
> configuration settings that need to be set?  We're using an out of the box 
> Solr 1.3 beta version. 
> 
> A few of the things we considered that might help:
> - Removing sorts on the result sets (result sets are approx 40,000 + 
> documents)
> - Reducing cache sizes such as the queryResultMaxDocsCached setting, 
> document cache, queryResultCache, filterCache, etc
> 
> Am I missing anything else that should be looked at, or is it time to 
> simply increase the memory/start looking at distributing the indexes?  Any 
> help would be much appreciated.
> 
> 
> Regards,
> 
> WW



RE: error with delta import

2008-10-22 Thread Steven A Rowe
Hi Shalin,

I wasn't talking about the behavior of parsers in the wild, but rather about 
the XML specification (paraphrasing):

1. An XML document is not well-formed unless it matches the production labeled 
document.
2. Violations of well-formedness constraints are fatal errors.
3. Once a fatal error is detected, an XML parser MUST NOT continue normal 
processing.

So although there are undoubtedly parsers that will parse '<' in attribute 
values, in so doing, these parsers are non-conformant with the XML 
specification.  This is important only to the extent that people who create 
documents that target non-conforming features of parsers can't reliably expect 
these documents to be parsed by conformant parsers; XML's 
write-once-parse-anywhere promise thereby inexorably evaporates.
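A conformant parser's behavior is easy to demonstrate; here with Python's standard-library Expat-based parser (a sketch, not part of the thread):

```python
import xml.etree.ElementTree as ET

# Well-formed: '<' escaped as &lt; inside the attribute value
ok = ET.fromstring('<root attr="a &lt; b"/>')
print(ok.attrib["attr"])  # a < b

# Not well-formed: a raw '<' in an attribute value is a fatal error,
# so a conformant parser must refuse to continue
try:
    ET.fromstring('<root attr="a < b"/>')
    parsed = True
except ET.ParseError:
    parsed = False
print(parsed)  # False
```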

Telling people that it's not a problem (or required!) to write non-well-formed 
XML, because a particular XML parser can't accept well-formed XML is kind of 
insidious.  I for one will not stand idly by and permit this outrage to remain 
unchallenged!!!

:)

Steve

On 10/22/2008 at 4:01 AM, Shalin Shekhar Mangar wrote:
> Actually, most XML parsers don't require you to escape such
> characters in attributes. You are welcome to try this out,
> just look at the example-DIH :)
> 
> On Tue, Oct 21, 2008 at 11:11 PM, Steven A Rowe
> <[EMAIL PROTECTED]> wrote:
> 
> > Wow, I really should read more closely before I respond - I see now,
> > Noble, that you were talking about DIH's ability to parse escaped '<'s
> > in attribute values, rather than about whether '<' was an acceptable
> > character in attribute values.
> > 
> > I should repurpose my remarks to note to Shalin, though, that all
> > (conformant) XML parsers have to be able to handle escaped '<'s in
> > attribute values, since an XML document with a '<' in an attribute
> > value is not well-formed.
> > 
> > Steve
> > 
> > On 10/21/2008 at 1:10 PM, Steven A Rowe wrote:
> > > On 10/21/2008 at 12:14 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> > > > On Tue, Oct 21, 2008 at 12:56 AM, Shalin Shekhar Mangar
> > > <[EMAIL PROTECTED]> wrote:
> > > > > Your data-config looks fine except for one thing --
> you do not need
> > to
> > > > > escape '<' character in an XML attribute. It maybe throwing off the
> > > > > parsing code in DataImportHandler.
> > > > 
> > > > not really '<' is fine in attribute
> > > 
> > > Noble, I think you're wrong - AFAICT from the XML spec., '<' is *not*
> > > fine in an attribute value - from
> > > :
> > > 
> > >   [10] AttValue ::= '"' ([^<&"] | Reference)* '"'
> > >                   | "'" ([^<&'] | Reference)* "'"
> > > 
> > > where an attribute  is:
> > > 
> > >   [41] Attribute ::= Name Eq AttValue
> > > 
> > > Steve


Re[2]: Question about copyField

2008-10-22 Thread Aleksey Gogolev

Thanks for the reply. I want to make your point more exact, because I'm
not sure that I understood you correctly :)

As far as I know (please correct me if I'm wrong), the type defines the way
in which the field is indexed and queried. But I don't want to index
or query the "suggestion" field in a different way; I want the "suggestion"
field to store a different value (like in the example I wrote in my first mail).

So you are saying that I can tell Solr (using fieldType) how it
should process the string before saving it? Yes?

FT> The filters and tokenizer that are applied to the copy field are
FT> determined by its type in the schema. Simply create a new field type in
FT> your schema with the filters you would like, and use that type for your
FT> copy field. So, the field description would have its old type, but the
FT> field suggestion would get a new type.

FT> -Todd Feak

FT> -Original Message-
FT> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT> Sent: Wednesday, October 22, 2008 8:28 AM
FT> To: solr-user@lucene.apache.org
FT> Subject: Question about copyField


FT> Hello.

FT> I have a field "description" in my schema, and I want to make a field
FT> "suggestion" with the same content. So I added the following line to my
FT> schema.xml:

FT>    <copyField source="description" dest="suggestion"/>

FT> But I also want to modify the "description" string before copying it to
FT> the "suggestion" field. I want to remove all commas, dots and slashes. Here
FT> is an example of such transformation:

FT> "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

FT> And so, as a result, I want to have such a doc:

FT> <doc>
FT>  <field name="id">8asydauf9nbcngfaad</field>
FT>  <field name="description">TvPL/st, SAMSUNG, SML200</field>
FT>  <field name="suggestion">TvPL st SAMSUNG SML200</field>
FT> </doc>

FT> I think it would be nice to use solr.PatternReplaceFilterFactory for
FT> this purpose. So the question is: Can I use solr filters for
FT> processing "description" string before copying it to "suggestion"
FT> field?

FT> Thank you for your attention.




-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]



Re: Index updates blocking readers: To Multicore or not?

2008-10-22 Thread Jim Murphy

We shred the RSS into individual items, then create Solr XML documents to
insert.  Solr is an easy choice for us over straight Lucene since it adds
the server infrastructure that we would otherwise mostly be writing ourselves -
caching, data types, master/slave replication.

We looked at nutch too - but that was before my time.

Jim



John Martyniak-3 wrote:
> 
> Thank you that is good information, as that is kind of way that I am  
> leaning.
> 
> So when you fetch the content from RSS, does that get rendered to an  
> XML document that Solr indexes?
> 
> Also, what were a couple of the decision points for using Solr as opposed  
> to Nutch, or even straight Lucene?
> 
> -John
> 
> 
> 
> On Oct 22, 2008, at 11:22 AM, Jim Murphy wrote:
> 
>>
>> We index RSS content using our own home-grown distributed spiders -  
>> not using
>> Nutch.  We use Ruby processes to do the feed fetching and XML  
>> shredding, and
>> Amazon SQS to queue up work packets to insert into our Solr cluster.
>>
>> Sorry can't be of more help.
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20113143.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20114697.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Re[2]: Question about copyField

2008-10-22 Thread Feak, Todd
Yes, using fieldType, you can have Solr run the PatternReplaceFilter for
you.

So, for example, you can declare something like this:
--
  <fieldType name="..." class="solr.TextField">
    ...
    <analyzer>
      <tokenizer class="..."/>
      <!-- Put the PatternReplaceFilter in here. At least for indexing,
           maybe for query as well -->
      <filter class="solr.PatternReplaceFilterFactory" ... />
      ...
    </analyzer>
    ...
  </fieldType>
---

I would suggest doing this in your schema, then starting up Solr and
using the analysis admin page to see if it will index and search the way
you want. That way you don't have to pay the cost of actually indexing
the data to find out.

-Todd

-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2008 9:24 AM
To: Feak, Todd
Subject: Re[2]: Question about copyField


Thanks for reply. I want to make your point more exact, cause I'm not
sure that I correctly understood you :)

As far as I know (correct me please, if I wrong) type defines the way
in which the field is indexed and queried. But I don't want to index
or query "suggestion" field in different way, I want "suggestion" field
store different value (like in example I wrote in first mail). 

So you are saying that I can tell to slor (using filedType) how solr
should process string before saving it? Yes?

FT> The filters and tokenizer that are applied to the copy field are
FT> determined by it's type in the schema. Simply create a new field
type in
FT> your schema with the filters you would like, and use that type for
your
FT> copy field. So, the field description would have it's old type, but
the
FT> field suggestion would get a new type.

FT> -Todd Feak

FT> -Original Message-
FT> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT> Sent: Wednesday, October 22, 2008 8:28 AM
FT> To: solr-user@lucene.apache.org
FT> Subject: Question about copyField


FT> Hello.

FT> I have field "description" in my schema. And I want make a filed
FT> "suggestion" with the same content. So I added following line to my
FT> schema.xml:

FT>    <copyField source="description" dest="suggestion"/>

FT> But I also want to modify "description" string before copying it to
FT> "suggestion" field. I want to remove all comas, dots and slashes.
Here
FT> is an example of such transformation:

FT> "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

FT> And so as result I want to have such doc:

FT> <doc>
FT>  <field name="id">8asydauf9nbcngfaad</field>
FT>  <field name="description">TvPL/st, SAMSUNG, SML200</field>
FT>  <field name="suggestion">TvPL st SAMSUNG SML200</field>
FT> </doc>

FT> I think it would be nice to use solr.PatternReplaceFilterFactory for
FT> this purpose. So the question is: Can I use solr filters for
FT> processing "description" string before copying it to "suggestion"
FT> field?

FT> Thank you for your attention.




-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]




Re: error with delta import

2008-10-22 Thread Walter Underwood
On 10/22/08 8:57 AM, "Steven A Rowe" <[EMAIL PROTECTED]> wrote:

> Telling people that it's not a problem (or required!) to write non-well-formed
> XML, because a particular XML parser can't accept well-formed XML is kind of
> insidious.

I'm with you all the way on this.

A parser which accepts non-well-formed XML is not an XML parser, since the
XML spec requires reporting a fatal error.

It is really easy to test these things. Modern browsers have good XML
parsers, so put your test case in a "test.xml" file and open it in a
browser. If it isn't well-formed, you'll get an error.

Here is my test XML:



Here is what Firefox 3.0.3 says about that:

XML Parsing Error: not well-formed
Location: file:///Users/wunderwood/Desktop/test.xml
Line Number 1, Column 18:


-^

wunder



Re: Out of Memory Errors

2008-10-22 Thread Jae Joo
Here is what I am doing to check the memory status:
1. Run the servlet and Solr application.
2. On a command prompt, run jstat -gc <pid> 5s (5s means sampling data every 5
seconds).
3. Watch it or pipe it to a file.
4. Analyze the data gathered.

Jae

On Tue, Oct 21, 2008 at 9:48 PM, Willie Wong <[EMAIL PROTECTED]>wrote:

> Hello,
>
> I've been having issues with out of memory errors on searches in Solr. I
> was wondering if I'm hitting a limit with solr or if I've configured
> something seriously wrong.
>
> Solr Setup
> - 3 cores
> - 3163615 documents each
> - 10 GB size
> - approx 10 fields
> - document sizes vary from a few kb to a few MB
> - no faceting is used however the search query can be fairly complex with
> 8 or more fields being searched on at once
>
> Environment:
> - windows 2003
> - 2.8 GHz Xeon processor
> - 1.5 GB memory assigned to solr
> - Jetty 6 server
>
> Once we get to around a few concurrent users, OOMs start occurring and Jetty
> restarts.  Would this just be a case of more memory or are there certain
> configuration settings that need to be set?  We're using an out of the box
> Solr 1.3 beta version.
>
> A few of the things we considered that might help:
> - Removing sorts on the result sets (result sets are approx 40,000 +
> documents)
> - Reducing cache sizes such as the queryResultMaxDocsCached setting,
> document cache, queryResultCache, filterCache, etc
>
> Am I missing anything else that should be looked at, or is it time to
> simply increase the memory/start looking at distributing the indexes?  Any
> help would be much appreciated.
>
>
> Regards,
>
> WW
>


Re[4]: Question about copyField

2008-10-22 Thread Aleksey Gogolev


FT> I would suggest doing this in your schema, then starting up Solr and
FT> using the analysis admin page to see if it will index and search the way
FT> you want. That way you don't have to pay the cost of actually indexing
FT> the data to find out.

Thanks. I did it exactly like you said.

I created a fieldType "ex" (short for experiment), defined the corresponding
<field>, and tried it on the analysis page. Here is what
I got (I uploaded the page, so you can see it): 

http://tut-i-tam.com.ua/static/analysis.jsp.htm

I want the final token "samsung spinpoint p spn hard drive gb ata" to
be the actual "ex" value. So I expect such response:

<doc>
 <str name="ex">samsung spinpoint p spn hard drive gb ata</str>
 <str name="id">SP2514N</str>
 <str name="description">Samsung SpinPoint12 P120 SP2514N - hard drive - 250 GB - ATA-133</str>
 ...
</doc>


But when I'm searching this doc, I got this:

<doc>
 <str name="ex">Samsung SpinPoint12 P120 SP2514N - hard drive - 250 GB - ATA-133</str>
 <str name="id">SP2514N</str>
 <str name="description">Samsung SpinPoint12 P120 SP2514N - hard drive - 250 GB - ATA-133</str>
 ...
</doc>


As you can see, the "description" and "ex" fields are identical.
The result of the filter chain wasn't actually stored in the "ex" field :(

Anyway, thank you :)

FT> -Todd

FT> -Original Message-
FT> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT> Sent: Wednesday, October 22, 2008 9:24 AM
FT> To: Feak, Todd
FT> Subject: Re[2]: Question about copyField


FT> Thanks for reply. I want to make your point more exact, cause I'm not
FT> sure that I correctly understood you :)

FT> As far as I know (correct me please, if I wrong) type defines the way
FT> in which the field is indexed and queried. But I don't want to index
FT> or query "suggestion" field in different way, I want "suggestion" field
FT> store different value (like in example I wrote in first mail). 

FT> So you are saying that I can tell to slor (using filedType) how solr
FT> should process string before saving it? Yes?

FT>> The filters and tokenizer that are applied to the copy field are
FT>> determined by it's type in the schema. Simply create a new field
FT> type in
FT>> your schema with the filters you would like, and use that type for
FT> your
FT>> copy field. So, the field description would have it's old type, but
FT> the
FT>> field suggestion would get a new type.

FT>> -Todd Feak

FT>> -Original Message-
FT>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT>> Sent: Wednesday, October 22, 2008 8:28 AM
FT>> To: solr-user@lucene.apache.org
FT>> Subject: Question about copyField


FT>> Hello.

FT>> I have field "description" in my schema. And I want make a filed
FT>> "suggestion" with the same content. So I added following line to my
FT>> schema.xml:

FT>>    <copyField source="description" dest="suggestion"/>

FT>> But I also want to modify "description" string before copying it to
FT>> "suggestion" field. I want to remove all comas, dots and slashes.
FT> Here
FT>> is an example of such transformation:

FT>> "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

FT>> And so as result I want to have such doc:

FT>> <doc>
FT>>  <field name="id">8asydauf9nbcngfaad</field>
FT>>  <field name="description">TvPL/st, SAMSUNG, SML200</field>
FT>>  <field name="suggestion">TvPL st SAMSUNG SML200</field>
FT>> </doc>

FT>> I think it would be nice to use solr.PatternReplaceFilterFactory for
FT>> this purpose. So the question is: Can I use solr filters for
FT>> processing "description" string before copying it to "suggestion"
FT>> field?

FT>> Thank you for your attention.







-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]



RE: Re[4]: Question about copyField

2008-10-22 Thread Feak, Todd
My bad. I misunderstood what you wanted. 

The example I gave was for the searching side of things, not the data
representation in the document.

-Todd

-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2008 11:14 AM
To: Feak, Todd
Subject: Re[4]: Question about copyField



FT> I would suggest doing this in your schema, then starting up Solr and
FT> using the analysis admin page to see if it will index and search the
way
FT> you want. That way you don't have to pay the cost of actually
indexing
FT> the data to find out.

Thanks. I did it exactly like you said.

I created a fieldType "ex" (short for experiment), defined
corresponding  and try it on the analysis page. Here is what
I got (I uploaded the page, so you can see it): 

http://tut-i-tam.com.ua/static/analysis.jsp.htm

I want the final token "samsung spinpoint p spn hard drive gb ata" to
be the actual "ex" value. So I expect such response:

<doc>
 <str name="ex">samsung spinpoint p spn hard drive gb ata</str>
 <str name="id">SP2514N</str>
 <str name="description">Samsung SpinPoint12 P120 SP2514N - hard drive - 250 GB - ATA-133</str>
 ...
</doc>


But when I'm searching this doc, I got this:

<doc>
 <str name="ex">Samsung SpinPoint12 P120 SP2514N - hard drive - 250 GB - ATA-133</str>
 <str name="id">SP2514N</str>
 <str name="description">Samsung SpinPoint12 P120 SP2514N - hard drive - 250 GB - ATA-133</str>
 ...
</doc>


As you can see, the "description" and "ex" fields are identical.
The result of the filter chain wasn't actually stored in the "ex" field :(

Anyway, thank you :)

FT> -Todd

FT> -Original Message-
FT> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT> Sent: Wednesday, October 22, 2008 9:24 AM
FT> To: Feak, Todd
FT> Subject: Re[2]: Question about copyField


FT> Thanks for reply. I want to make your point more exact, cause I'm
not
FT> sure that I correctly understood you :)

FT> As far as I know (correct me please, if I wrong) type defines the
way
FT> in which the field is indexed and queried. But I don't want to index
FT> or query "suggestion" field in different way, I want "suggestion"
field
FT> store different value (like in example I wrote in first mail). 

FT> So you are saying that I can tell Solr (using a fieldType) how Solr
FT> should process the string before saving it? Yes?

FT>> The filters and tokenizer that are applied to the copy field are
FT>> determined by it's type in the schema. Simply create a new field
FT> type in
FT>> your schema with the filters you would like, and use that type for
FT> your
FT>> copy field. So, the field description would have it's old type, but
FT> the
FT>> field suggestion would get a new type.

FT>> -Todd Feak

FT>> -Original Message-
FT>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT>> Sent: Wednesday, October 22, 2008 8:28 AM
FT>> To: solr-user@lucene.apache.org
FT>> Subject: Question about copyField


FT>> Hello.

FT>> I have a field "description" in my schema. And I want to make a field
FT>> "suggestion" with the same content. So I added the following line to my
FT>> schema.xml:

FT>>    <copyField source="description" dest="suggestion"/>

FT>> But I also want to modify "description" string before copying it to
FT>> "suggestion" field. I want to remove all comas, dots and slashes.
FT> Here
FT>> is an example of such transformation:

FT>> "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

FT>> And so as result I want to have such doc:

FT>> 
FT>>  8asydauf9nbcngfaad
FT>>  TvPL/st, SAMSUNG, SML200
FT>>  TvPL st SAMSUNG SML200
FT>> 

FT>> I think it would be nice to use solr.PatternReplaceFilterFactory
for
FT>> this purpose. So the question is: Can I use solr filters for
FT>> processing "description" string before copying it to "suggestion"
FT>> field?

FT>> Thank you for your attention.







-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]




RE: Re[4]: Question about copyField

2008-10-22 Thread Joe Nguyen
It doesn't need to be a copy field, right?  Could you create a new field
"ex", extract the value from "description", delete the digits, and set the "ex"
field before adding/indexing the document to the Solr server?

-Original Message-
From: Feak, Todd [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2008 11:25 AM
To: solr-user@lucene.apache.org
Subject: RE: Re[4]: Question about copyField

My bad. I misunderstood what you wanted. 

The example I gave was for the searching side of things. Not the data
representation in the document.

-Todd

-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2008 11:14 AM
To: Feak, Todd
Subject: Re[4]: Question about copyField



FT> I would suggest doing this in your schema, then starting up Solr and
FT> using the analysis admin page to see if it will index and search the
way
FT> you want. That way you don't have to pay the cost of actually
indexing
FT> the data to find out.

Thanks. I did it exactly like you said.

I created a fieldType "ex" (short for experiment), defined
corresponding  and try it on the analysis page. Here is what
I got (I uploaded the page, so you can see it): 

http://tut-i-tam.com.ua/static/analysis.jsp.htm

I want the final token "samsung spinpoint p spn hard drive gb ata" to
be the actual "ex" value. So I expect such response:



 samsung spinpoint p spn hard drive gb
ata
 SP2514N
 Samsung SpinPoint12 P120 SP2514N -
hard drive - 250 GB - ATA-133
 


But when I'm searching this doc, I got this:



 Samsung SpinPoint12 P120 SP2514N - hard
drive - 250 GB - ATA-133
 SP2514N
 Samsung SpinPoint12 P120 SP2514N -
hard drive - 250 GB - ATA-133
 


As you can see "description" and "ex" filed are identical.
The result of filter chain wasn't actually stored in the "ex" filed :(

Anyway, thank you :)

FT> -Todd

FT> -Original Message-
FT> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT> Sent: Wednesday, October 22, 2008 9:24 AM
FT> To: Feak, Todd
FT> Subject: Re[2]: Question about copyField


FT> Thanks for reply. I want to make your point more exact, cause I'm
not
FT> sure that I correctly understood you :)

FT> As far as I know (correct me please, if I wrong) type defines the
way
FT> in which the field is indexed and queried. But I don't want to index
FT> or query "suggestion" field in different way, I want "suggestion"
field
FT> store different value (like in example I wrote in first mail). 

FT> So you are saying that I can tell to slor (using filedType) how solr
FT> should process string before saving it? Yes?

FT>> The filters and tokenizer that are applied to the copy field are
FT>> determined by it's type in the schema. Simply create a new field
FT> type in
FT>> your schema with the filters you would like, and use that type for
FT> your
FT>> copy field. So, the field description would have it's old type, but
FT> the
FT>> field suggestion would get a new type.

FT>> -Todd Feak

FT>> -Original Message-
FT>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT>> Sent: Wednesday, October 22, 2008 8:28 AM
FT>> To: solr-user@lucene.apache.org
FT>> Subject: Question about copyField


FT>> Hello.

FT>> I have field "description" in my schema. And I want make a filed
FT>> "suggestion" with the same content. So I added following line to my
FT>> schema.xml:

FT>>

FT>> But I also want to modify "description" string before copying it to
FT>> "suggestion" field. I want to remove all comas, dots and slashes.
FT> Here
FT>> is an example of such transformation:

FT>> "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

FT>> And so as result I want to have such doc:

FT>> 
FT>>  8asydauf9nbcngfaad
FT>>  TvPL/st, SAMSUNG, SML200
FT>>  TvPL st SAMSUNG SML200
FT>> 

FT>> I think it would be nice to use solr.PatternReplaceFilterFactory
for
FT>> this purpose. So the question is: Can I use solr filters for
FT>> processing "description" string before copying it to "suggestion"
FT>> field?

FT>> Thank you for your attention.







-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]




Re[6]: Question about copyField

2008-10-22 Thread Aleksey Gogolev

JN> It doesn't need to be a copy field, right?  Could you create a new field
JN> "ex", extract value from description, delete digits, and set to "ex"
JN> field before add/index to solr server?

Yes, I can. I was just wondering whether I can use Solr for this purpose or
not.

JN> -Original Message-
JN> From: Feak, Todd [mailto:[EMAIL PROTECTED] 
JN> Sent: Wednesday, October 22, 2008 11:25 Joe
JN> To: solr-user@lucene.apache.org
JN> Subject: RE: Re[4]: Question about copyField

JN> My bad. I misunderstood what you wanted. 

JN> The example I gave was for the searching side of things. Not the data
JN> representation in the document.

JN> -Todd

JN> -Original Message-
JN> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
JN> Sent: Wednesday, October 22, 2008 11:14 AM
JN> To: Feak, Todd
JN> Subject: Re[4]: Question about copyField



FT>> I would suggest doing this in your schema, then starting up Solr and
FT>> using the analysis admin page to see if it will index and search the
JN> way
FT>> you want. That way you don't have to pay the cost of actually
JN> indexing
FT>> the data to find out.

JN> Thanks. I did it exactly like you said.

JN> I created a fieldType "ex" (short for experiment), defined
JN> corresponding  and try it on the analysis page. Here is what
JN> I got (I uploaded the page, so you can see it): 

JN> http://tut-i-tam.com.ua/static/analysis.jsp.htm

JN> I want the final token "samsung spinpoint p spn hard drive gb ata" to
JN> be the actual "ex" value. So I expect such response:

JN> 
JN> 
JN>  samsung spinpoint p spn hard drive gb
JN> ata
JN>  SP2514N
JN>  Samsung SpinPoint12 P120 SP2514N -
JN> hard drive - 250 GB - ATA-133
JN>  
JN> 

JN> But when I'm searching this doc, I got this:

JN> 
JN> 
JN>  Samsung SpinPoint12 P120 SP2514N - hard
JN> drive - 250 GB - ATA-133
JN>  SP2514N
JN>  Samsung SpinPoint12 P120 SP2514N -
JN> hard drive - 250 GB - ATA-133
JN>  
JN> 

JN> As you can see "description" and "ex" filed are identical.
JN> The result of filter chain wasn't actually stored in the "ex" filed :(

JN> Anyway, thank you :)

FT>> -Todd

FT>> -Original Message-
FT>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT>> Sent: Wednesday, October 22, 2008 9:24 AM
FT>> To: Feak, Todd
FT>> Subject: Re[2]: Question about copyField


FT>> Thanks for reply. I want to make your point more exact, cause I'm
JN> not
FT>> sure that I correctly understood you :)

FT>> As far as I know (correct me please, if I wrong) type defines the
JN> way
FT>> in which the field is indexed and queried. But I don't want to index
FT>> or query "suggestion" field in different way, I want "suggestion"
JN> field
FT>> store different value (like in example I wrote in first mail). 

FT>> So you are saying that I can tell to slor (using filedType) how solr
FT>> should process string before saving it? Yes?

FT>>> The filters and tokenizer that are applied to the copy field are
FT>>> determined by it's type in the schema. Simply create a new field
FT>> type in
FT>>> your schema with the filters you would like, and use that type for
FT>> your
FT>>> copy field. So, the field description would have it's old type, but
FT>> the
FT>>> field suggestion would get a new type.

FT>>> -Todd Feak

FT>>> -Original Message-
FT>>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT>>> Sent: Wednesday, October 22, 2008 8:28 AM
FT>>> To: solr-user@lucene.apache.org
FT>>> Subject: Question about copyField


FT>>> Hello.

FT>>> I have field "description" in my schema. And I want make a filed
FT>>> "suggestion" with the same content. So I added following line to my
FT>>> schema.xml:

FT>>>

FT>>> But I also want to modify "description" string before copying it to
FT>>> "suggestion" field. I want to remove all comas, dots and slashes.
FT>> Here
FT>>> is an example of such transformation:

FT>>> "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

FT>>> And so as result I want to have such doc:

FT>>> 
FT>>>  8asydauf9nbcngfaad
FT>>>  TvPL/st, SAMSUNG, SML200
FT>>>  TvPL st SAMSUNG SML200
FT>>> 

FT>>> I think it would be nice to use solr.PatternReplaceFilterFactory
JN> for
FT>>> this purpose. So the question is: Can I use solr filters for
FT>>> processing "description" string before copying it to "suggestion"
FT>>> field?

FT>>> Thank you for your attention.










-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]



RE: Re[6]: Question about copyField

2008-10-22 Thread Joe Nguyen
Could you post the fieldType specification for "ex"?  What does your regex look
like?



-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2008 11:39 AM
To: Joe Nguyen
Subject: Re[6]: Question about copyField


JN> It doesn't need to be a copy field, right?  Could you create a new
field
JN> "ex", extract value from description, delete digits, and set to "ex"
JN> field before add/index to solr server?

Yes, I can. I just was wondering can I use solr for this purpose or
not.

JN> -Original Message-
JN> From: Feak, Todd [mailto:[EMAIL PROTECTED] 
JN> Sent: Wednesday, October 22, 2008 11:25 Joe
JN> To: solr-user@lucene.apache.org
JN> Subject: RE: Re[4]: Question about copyField

JN> My bad. I misunderstood what you wanted. 

JN> The example I gave was for the searching side of things. Not the
data
JN> representation in the document.

JN> -Todd

JN> -Original Message-
JN> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
JN> Sent: Wednesday, October 22, 2008 11:14 AM
JN> To: Feak, Todd
JN> Subject: Re[4]: Question about copyField



FT>> I would suggest doing this in your schema, then starting up Solr
and
FT>> using the analysis admin page to see if it will index and search
the
JN> way
FT>> you want. That way you don't have to pay the cost of actually
JN> indexing
FT>> the data to find out.

JN> Thanks. I did it exactly like you said.

JN> I created a fieldType "ex" (short for experiment), defined
JN> corresponding  and try it on the analysis page. Here is
what
JN> I got (I uploaded the page, so you can see it): 

JN> http://tut-i-tam.com.ua/static/analysis.jsp.htm

JN> I want the final token "samsung spinpoint p spn hard drive gb ata"
to
JN> be the actual "ex" value. So I expect such response:

JN> 
JN> 
JN>  samsung spinpoint p spn hard drive gb
JN> ata
JN>  SP2514N
JN>  Samsung SpinPoint12 P120
SP2514N -
JN> hard drive - 250 GB - ATA-133
JN>  
JN> 

JN> But when I'm searching this doc, I got this:

JN> 
JN> 
JN>  Samsung SpinPoint12 P120 SP2514N - hard
JN> drive - 250 GB - ATA-133
JN>  SP2514N
JN>  Samsung SpinPoint12 P120
SP2514N -
JN> hard drive - 250 GB - ATA-133
JN>  
JN> 

JN> As you can see "description" and "ex" filed are identical.
JN> The result of filter chain wasn't actually stored in the "ex" filed
:(

JN> Anyway, thank you :)

FT>> -Todd

FT>> -Original Message-
FT>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT>> Sent: Wednesday, October 22, 2008 9:24 AM
FT>> To: Feak, Todd
FT>> Subject: Re[2]: Question about copyField


FT>> Thanks for reply. I want to make your point more exact, cause I'm
JN> not
FT>> sure that I correctly understood you :)

FT>> As far as I know (correct me please, if I wrong) type defines the
JN> way
FT>> in which the field is indexed and queried. But I don't want to
index
FT>> or query "suggestion" field in different way, I want "suggestion"
JN> field
FT>> store different value (like in example I wrote in first mail). 

FT>> So you are saying that I can tell to slor (using filedType) how
solr
FT>> should process string before saving it? Yes?

FT>>> The filters and tokenizer that are applied to the copy field are
FT>>> determined by it's type in the schema. Simply create a new field
FT>> type in
FT>>> your schema with the filters you would like, and use that type for
FT>> your
FT>>> copy field. So, the field description would have it's old type,
but
FT>> the
FT>>> field suggestion would get a new type.

FT>>> -Todd Feak

FT>>> -Original Message-
FT>>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT>>> Sent: Wednesday, October 22, 2008 8:28 AM
FT>>> To: solr-user@lucene.apache.org
FT>>> Subject: Question about copyField


FT>>> Hello.

FT>>> I have field "description" in my schema. And I want make a filed
FT>>> "suggestion" with the same content. So I added following line to
my
FT>>> schema.xml:

FT>>>

FT>>> But I also want to modify "description" string before copying it
to
FT>>> "suggestion" field. I want to remove all comas, dots and slashes.
FT>> Here
FT>>> is an example of such transformation:

FT>>> "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

FT>>> And so as result I want to have such doc:

FT>>> 
FT>>>  8asydauf9nbcngfaad
FT>>>  TvPL/st, SAMSUNG, SML200
FT>>>  TvPL st SAMSUNG SML200
FT>>> 

FT>>> I think it would be nice to use solr.PatternReplaceFilterFactory
JN> for
FT>>> this purpose. So the question is: Can I use solr filters for
FT>>> processing "description" string before copying it to "suggestion"
FT>>> field?

FT>>> Thank you for your attention.










-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]



RE: Issue with Query Parsing '+' works as 'OR'

2008-10-22 Thread Lance Norskog
URI encoding turns a space into a plus, then (maybe) Lucene takes that as a
space. Also you want a + in front of first_name.

A AND B -> +first_name:joe++last_name:smith

B AND maybe A -> first_name:joe++last_name:smith

Some of us need sample use cases to understand these things; documentation
with only technical definitions doesn't help much.

Lance

-Original Message-
From: Sunil Sarje [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 20, 2008 9:19 PM
To: solr-user@lucene.apache.org
Subject: Issue with Query Parsing '+' works as 'OR'

I am working with the nightly build of Oct 17, 2008, and found an issue:
something is wrong with query parsing; it takes + as OR.

e.g. q=first_name:joe+last_name:smith is behaving as OR instead of AND.
The default operator is set to AND in schema.xml (<solrQueryParser defaultOperator="AND"/>).


Is there any new configuration I need to put in place in order to get this
working ?

Thanks
-Sunil




Re[8]: Question about copyField

2008-10-22 Thread Aleksey Gogolev

Here it is, the regex is very simple:















But the problem is not about the field type. The problem is: how to retrieve the
final token and put it into the field. Theoretically I can retrieve the
token with AnalysisRequestHandler.
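Outside Solr, the cleanup this thread wants ("TvPL/st, SAMSUNG, SML200" => "TvPL st SAMSUNG SML200") can be sketched in plain Python before documents are posted — a hypothetical pre-indexing helper standing in for what solr.PatternReplaceFilterFactory does at analysis time, not part of Solr itself:

```python
import re

def clean_description(text):
    # replace commas, dots and slashes with spaces, then collapse
    # runs of whitespace -- a plain-Python stand-in for a
    # PatternReplaceFilterFactory chain
    text = re.sub(r"[,./]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

assert clean_description("TvPL/st, SAMSUNG, SML200") == "TvPL st SAMSUNG SML200"
```

Applying this in the indexing client sidesteps the problem entirely: the "suggestion" value is computed before Solr ever sees the document, so the stored value and the indexed value match.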

JN> Could you post fieldType specification for "ex"?  What your regex look
JN> like?





JN> -Original Message-
JN> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
JN> Sent: Wednesday, October 22, 2008 11:39 Joe
JN> To: Joe Nguyen
JN> Subject: Re[6]: Question about copyField


JN>> It doesn't need to be a copy field, right?  Could you create a new
JN> field
JN>> "ex", extract value from description, delete digits, and set to "ex"
JN>> field before add/index to solr server?

JN> Yes, I can. I just was wondering can I use solr for this purpose or
JN> not.

JN>> -Original Message-
JN>> From: Feak, Todd [mailto:[EMAIL PROTECTED] 
JN>> Sent: Wednesday, October 22, 2008 11:25 Joe
JN>> To: solr-user@lucene.apache.org
JN>> Subject: RE: Re[4]: Question about copyField

JN>> My bad. I misunderstood what you wanted. 

JN>> The example I gave was for the searching side of things. Not the
JN> data
JN>> representation in the document.

JN>> -Todd

JN>> -Original Message-
JN>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
JN>> Sent: Wednesday, October 22, 2008 11:14 AM
JN>> To: Feak, Todd
JN>> Subject: Re[4]: Question about copyField



FT>>> I would suggest doing this in your schema, then starting up Solr
JN> and
FT>>> using the analysis admin page to see if it will index and search
JN> the
JN>> way
FT>>> you want. That way you don't have to pay the cost of actually
JN>> indexing
FT>>> the data to find out.

JN>> Thanks. I did it exactly like you said.

JN>> I created a fieldType "ex" (short for experiment), defined
JN>> corresponding  and try it on the analysis page. Here is
JN> what
JN>> I got (I uploaded the page, so you can see it): 

JN>> http://tut-i-tam.com.ua/static/analysis.jsp.htm

JN>> I want the final token "samsung spinpoint p spn hard drive gb ata"
JN> to
JN>> be the actual "ex" value. So I expect such response:

JN>> 
JN>> 
JN>>  samsung spinpoint p spn hard drive gb
JN>> ata
JN>>  SP2514N
JN>>  Samsung SpinPoint12 P120
JN> SP2514N -
JN>> hard drive - 250 GB - ATA-133
JN>>  
JN>> 

JN>> But when I'm searching this doc, I got this:

JN>> 
JN>> 
JN>>  Samsung SpinPoint12 P120 SP2514N - hard
JN>> drive - 250 GB - ATA-133
JN>>  SP2514N
JN>>  Samsung SpinPoint12 P120
JN> SP2514N -
JN>> hard drive - 250 GB - ATA-133
JN>>  
JN>> 

JN>> As you can see "description" and "ex" filed are identical.
JN>> The result of filter chain wasn't actually stored in the "ex" filed
JN> :(

JN>> Anyway, thank you :)

FT>>> -Todd

FT>>> -Original Message-
FT>>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT>>> Sent: Wednesday, October 22, 2008 9:24 AM
FT>>> To: Feak, Todd
FT>>> Subject: Re[2]: Question about copyField


FT>>> Thanks for reply. I want to make your point more exact, cause I'm
JN>> not
FT>>> sure that I correctly understood you :)

FT>>> As far as I know (correct me please, if I wrong) type defines the
JN>> way
FT>>> in which the field is indexed and queried. But I don't want to
JN> index
FT>>> or query "suggestion" field in different way, I want "suggestion"
JN>> field
FT>>> store different value (like in example I wrote in first mail). 

FT>>> So you are saying that I can tell to slor (using filedType) how
JN> solr
FT>>> should process string before saving it? Yes?

FT The filters and tokenizer that are applied to the copy field are
FT determined by it's type in the schema. Simply create a new field
FT>>> type in
FT your schema with the filters you would like, and use that type for
FT>>> your
FT copy field. So, the field description would have it's old type,
JN> but
FT>>> the
FT field suggestion would get a new type.

FT -Todd Feak

FT -Original Message-
FT From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] 
FT Sent: Wednesday, October 22, 2008 8:28 AM
FT To: solr-user@lucene.apache.org
FT Subject: Question about copyField


FT Hello.

FT I have field "description" in my schema. And I want make a filed
FT "suggestion" with the same content. So I added following line to
JN> my
FT schema.xml:

FT

FT But I also want to modify "description" string before copying it
JN> to
FT "suggestion" field. I want to remove all comas, dots and slashes.
FT>>> Here
FT is an example of such transformation:

FT "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"

FT And so as result I want to have such doc:

FT 
FT  8asydauf9nbcngfaad
FT  TvPL/st, SAMSUNG, SML200
FT  TvPL st SAMSUNG SML200
FT 

FT I think it would 

Re: Issue with Query Parsing '+' works as 'OR'

2008-10-22 Thread Walter Underwood
To pass a plus sign in a URL parameter, use %2B.

This query:

  foo +bar

Looks like this in a URL:

  q=foo+%2Bbar

wunder
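A quick way to verify these encodings is Python's urllib.parse (shown purely as an illustration; any URL library behaves the same way):

```python
from urllib.parse import quote, quote_plus, unquote_plus

# "foo +bar" as a form-encoded parameter: space -> +, literal + -> %2B
assert quote_plus("foo +bar") == "foo+%2Bbar"

# with plain percent-encoding, the space becomes %20 instead
assert quote("foo +bar") == "foo%20%2Bbar"

# the server decodes + back into a space, which is why an unescaped +
# between query clauses is simply lost before the query parser sees it
assert unquote_plus("first_name:joe+last_name:smith") == "first_name:joe last_name:smith"
```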

On 10/22/08 11:52 AM, "Lance Norskog" <[EMAIL PROTECTED]> wrote:

> URI encoding turns a space into a plus, then (maybe) Lucene takes that as a
> space. Also you want a + in front of first_name.
> 
> A AND B -> +first_name:joe++last_name:smith
> 
> B AND maybe A -> first_name:joe++last_name:smith
> 
> Some of us need sample use cases to understand these things; documentation
> with only technical definitions don't help much.
> 
> Lance
> 
> -Original Message-
> From: Sunil Sarje [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 20, 2008 9:19 PM
> To: solr-user@lucene.apache.org
> Subject: Issue with Query Parsing '+' works as 'OR'
> 
> I am working with nightly build of Oct 17, 2008  and found the issue that
> something wrong with Query Parsing; It takes + as OR
> 
> e.g. q=first_name:joe+last_name:smith is behaving as OR instead of AND.
> Default operator is set to AND in schema.xml (<solrQueryParser defaultOperator="AND"/>)
> 
> 
> Is there any new configuration I need to put in place in order to get this
> working ?
> 
> Thanks
> -Sunil
> 
> 



Re: Re[6]: Question about copyField

2008-10-22 Thread Shalin Shekhar Mangar
If you want your indexed value changed, you can use an analyzer (either
PatternReplaceFilter or a custom one). If you want the stored value changed,
you can use a custom UpdateRequestProcessor. However, taking care of this in
your application may be easier than bothering with the two, particularly if
you have to create a custom plugin.

On Thu, Oct 23, 2008 at 12:08 AM, Aleksey Gogolev <[EMAIL PROTECTED]> wrote:

>
> JN> It doesn't need to be a copy field, right?  Could you create a new
> field
> JN> "ex", extract value from description, delete digits, and set to "ex"
> JN> field before add/index to solr server?
>
> Yes, I can. I just was wondering can I use solr for this purpose or
> not.
>
> JN> -Original Message-
> JN> From: Feak, Todd [mailto:[EMAIL PROTECTED]
> JN> Sent: Wednesday, October 22, 2008 11:25 Joe
> JN> To: solr-user@lucene.apache.org
> JN> Subject: RE: Re[4]: Question about copyField
>
> JN> My bad. I misunderstood what you wanted.
>
> JN> The example I gave was for the searching side of things. Not the data
> JN> representation in the document.
>
> JN> -Todd
>
> JN> -Original Message-
> JN> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED]
> JN> Sent: Wednesday, October 22, 2008 11:14 AM
> JN> To: Feak, Todd
> JN> Subject: Re[4]: Question about copyField
>
>
>
> FT>> I would suggest doing this in your schema, then starting up Solr and
> FT>> using the analysis admin page to see if it will index and search the
> JN> way
> FT>> you want. That way you don't have to pay the cost of actually
> JN> indexing
> FT>> the data to find out.
>
> JN> Thanks. I did it exactly like you said.
>
> JN> I created a fieldType "ex" (short for experiment), defined
> JN> corresponding  and try it on the analysis page. Here is what
> JN> I got (I uploaded the page, so you can see it):
>
> JN> http://tut-i-tam.com.ua/static/analysis.jsp.htm
>
> JN> I want the final token "samsung spinpoint p spn hard drive gb ata" to
> JN> be the actual "ex" value. So I expect such response:
>
> JN> 
> JN> 
> JN>  samsung spinpoint p spn hard drive gb
> JN> ata
> JN>  SP2514N
> JN>  Samsung SpinPoint12 P120 SP2514N -
> JN> hard drive - 250 GB - ATA-133
> JN>  
> JN> 
>
> JN> But when I'm searching this doc, I got this:
>
> JN> 
> JN> 
> JN>  Samsung SpinPoint12 P120 SP2514N - hard
> JN> drive - 250 GB - ATA-133
> JN>  SP2514N
> JN>  Samsung SpinPoint12 P120 SP2514N -
> JN> hard drive - 250 GB - ATA-133
> JN>  
> JN> 
>
> JN> As you can see "description" and "ex" filed are identical.
> JN> The result of filter chain wasn't actually stored in the "ex" filed :(
>
> JN> Anyway, thank you :)
>
> FT>> -Todd
>
> FT>> -Original Message-
> FT>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED]
> FT>> Sent: Wednesday, October 22, 2008 9:24 AM
> FT>> To: Feak, Todd
> FT>> Subject: Re[2]: Question about copyField
>
>
> FT>> Thanks for reply. I want to make your point more exact, cause I'm
> JN> not
> FT>> sure that I correctly understood you :)
>
> FT>> As far as I know (correct me please, if I wrong) type defines the
> JN> way
> FT>> in which the field is indexed and queried. But I don't want to index
> FT>> or query "suggestion" field in different way, I want "suggestion"
> JN> field
> FT>> store different value (like in example I wrote in first mail).
>
> FT>> So you are saying that I can tell to slor (using filedType) how solr
> FT>> should process string before saving it? Yes?
>
> FT>>> The filters and tokenizer that are applied to the copy field are
> FT>>> determined by it's type in the schema. Simply create a new field
> FT>> type in
> FT>>> your schema with the filters you would like, and use that type for
> FT>> your
> FT>>> copy field. So, the field description would have it's old type, but
> FT>> the
> FT>>> field suggestion would get a new type.
>
> FT>>> -Todd Feak
>
> FT>>> -Original Message-
> FT>>> From: Aleksey Gogolev [mailto:[EMAIL PROTECTED]
> FT>>> Sent: Wednesday, October 22, 2008 8:28 AM
> FT>>> To: solr-user@lucene.apache.org
> FT>>> Subject: Question about copyField
>
>
> FT>>> Hello.
>
> FT>>> I have field "description" in my schema. And I want make a filed
> FT>>> "suggestion" with the same content. So I added following line to my
> FT>>> schema.xml:
>
> FT>>>
>
> FT>>> But I also want to modify "description" string before copying it to
> FT>>> "suggestion" field. I want to remove all comas, dots and slashes.
> FT>> Here
> FT>>> is an example of such transformation:
>
> FT>>> "TvPL/st, SAMSUNG, SML200"  => "TvPL st SAMSUNG SML200"
>
> FT>>> And so as result I want to have such doc:
>
> FT>>> 
> FT>>>  8asydauf9nbcngfaad
> FT>>>  TvPL/st, SAMSUNG, SML200
> FT>>>  TvPL st SAMSUNG SML200
> FT>>> 
>
> FT>>> I think it would be nice to use solr.PatternReplaceFilterFactory
> JN> for
> FT>>> this purpose. So the question is: Can I use solr filters for
> FT>>> proces

Re: Issues with facet

2008-10-22 Thread Jeremy Hinegardner
On Tue, Oct 21, 2008 at 06:57:03AM -0700, prerna07 wrote:
> 
> Hi,
> 
> On using Facet in solr query I am facing various issues.
> 
> Scenario 1:
> I have 11 Index with tag : productIndex 
> 
> my search query is appended by facet  parameters :
> facet=true&facet.field=Index_Type_s&qt=dismaxrequest
> 
> The facet node i am getting in solr result is :
>  
> - 
> - 
>   11 
>   11 
>   11 
>   

What does your schema look like?  I am guessing you are using dynamic fields and
have an analyzer on the type for fields that are '*_s' that uses
WordDelimiterFilterFactory with generateWordParts="1".

  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Since facets use the indexed values, not the stored values, the indexed value
for "productIndex" would be 3 terms: 'product', 'index' and 'productindex'.

> 
> According to my understanding I should get only one result, which should be
> like the below mentioned node
> 
> - 
>  11 
>   
> 
> Scenario 2: 
> 
> My index has following fields :
>  In Search of the Shape of the Universe,
> mathamatics 
> 
> My search Query is : 
> facet=true&facet.field=productDescription_s&qt=dismaxrequest
> 
> The result Solr is giving displaying :
> 
> 
> - 
>   1 
>   1 
>   2 
>   1 
>   1 
> 
> 
> I am not able to figure out the facet results. It does not contain any
> result for Universe, and it also removes characters from mathamatics and shape.
> 
> Please help me understanding the issue and let me know if any change in
> schema / solrConfig can solve the issue.

I believe that both of these are a result of the Analyzer you are using on your
'*_s' fields.

> 
> Thanks,
> Prerna

enjoy,

-jeremy

-- 

 Jeremy Hinegardner  [EMAIL PROTECTED] 



SolrSharp gone?

2008-10-22 Thread Otis Gospodnetic
Hello,

It looks like we might have lost SolrSharp: 
http://wiki.apache.org/solr/SolrSharp
It looks like its home is http://www.codeplex.com/solrsharp , but the site is 
no longer available.
Does anyone know its status?

There is also http://code.google.com/p/deveel-solr/ , but this seems brand new. 
 Does anyone know what its status is?


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



Advice needed on master-slave configuration

2008-10-22 Thread William Pierce

Folks:

I have two instances of solr running one on the master (U) and the other on 
the slave (Q).  Q is used for queries only, while U is where updates/deletes 
are done.   I am running on Windows so unfortunately I cannot use the 
distribution scripts.


Every N hours, when changes are committed and the index on U is updated, I 
want to copy the files from the master to the slave.  Do I need to halt 
the solr server on Q while the index is being updated?  If not, how do I 
copy the files into the data folder while the server is running?  Any 
pointers would be greatly appreciated!


Thanks!

- Bill 



Re: SolrSharp gone?

2008-10-22 Thread Ryan McKinley


On Oct 22, 2008, at 4:17 PM, Otis Gospodnetic wrote:


Hello,

It looks like we might have lost SolrSharp: 
http://wiki.apache.org/solr/SolrSharp
It looks like its home is http://www.codeplex.com/solrsharp , but  
the site is no longer available.

Does anyone know its status?



looks like it is there to me...
http://www.codeplex.com/solrsharp/SourceControl/ListDownloadableCommits.aspx

last update was dec '07

ryan


There is also http://code.google.com/p/deveel-solr/ , but this seems  
brand new.  Does anyone know what its status is?



Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: Advice needed on master-slave configuration

2008-10-22 Thread Otis Gospodnetic
Normally you don't have to restart Q; you only have to "reload" the Solr searcher when the 
index has been copied.
However, you are on Windows, and its FS has the tendency not to let you 
delete/overwrite files that another app (Solr/java) has opened.  Are you able 
to copy the index from U to Q?  How are you doing it?  Are you deleting index 
files from the index dir on Q that are no longer in the index dir on U?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: William Pierce <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, October 22, 2008 5:24:28 PM
> Subject: Advice needed on master-slave configuration
> 
> Folks:
> 
> I have two instances of solr running one on the master (U) and the other on 
> the slave (Q).  Q is used for queries only, while U is where updates/deletes 
> are done.   I am running on Windows so unfortunately I cannot use the 
> distribution scripts.
> 
> Every N hours when changes are committed and the index on U is updated,  I 
> want to copy the files from the master to the slave.Do I need to halt 
> the solr server on Q while the index is being updated?  If not,  how do I 
> copy the files into the data folder while the server is running? Any 
> pointers would be greatly appreciated!
> 
> Thanks!
> 
> - Bill 



How to search a DataImportHandler solr index

2008-10-22 Thread Nick80

Hi,

I'm using a couple of Solr 1.1 powered indexes and have relied on my "old"
Solr installation for more than two years now. I'm working on a new project
that is a bit more complex than my previous ones, so I thought I'd have a look
at all the new goodies in Solr. One item that caught my attention is the
DataImportHandler.

According to the documentation I read, it allows you among other things to
very easily index one-to-many and many-to-many relationships. Right? What I
can't find is: how do you search the index? Is it still possible to do
faceting on all the fields? Or isn't that possible? Any information on
searching a fairly complex index built by DataImportHandler is very welcome.
Thanks.

Kind regards,

Nick
-- 
View this message in context: 
http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20120698.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to search a DataImportHandler solr index

2008-10-22 Thread Matthew Runo
DataImportHandler is only a way to get data into your index, from a  
relational database of some sort. It won't affect your Solr reads in  
any way - so everything that Solr normally does will still work the  
same.


(I have not had a chance to look at it in depth, but searching the  
index would be the same as it 'normally' is).
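For what it's worth, here is a minimal data-config.xml sketch of the one-to-many case (driver, table, and field names are made up for illustration, not from this thread): the nested entity runs once per parent row, and its column becomes a multi-valued field on the parent document, which you can then facet on like any other field.

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db" user="user" password="pass"/>
  <document>
    <!-- one Solr document per product row -->
    <entity name="product" query="SELECT id, name FROM product">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <!-- one-to-many: runs once per product; each tag becomes one value
           of a multi-valued "tag" field on the parent document -->
      <entity name="tag"
              query="SELECT tag FROM product_tag WHERE product_id='${product.id}'">
        <field column="tag" name="tag"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Once indexed, querying is ordinary Solr: q=name:foo&facet=true&facet.field=tag facets across the joined values.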


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Oct 22, 2008, at 3:07 PM, Nick80 wrote:



Hi,

I'm using a couple of Solr 1.1 powered indexes and have relied on my  
"old"
Solr installation for more than two years now. I'm working on a new  
project
that is a bit more complex than my previous ones, so I thought I'd have  
a look at

all the new goodies in Solr. One item that caught my attention is the
DataImportHandler.

According to the documentation I read, it allows you among other  
things to
very easily index one-to-many and many-to-many relationships. Right?  
What I
can't find is: how do you search the index? Is it still possible to  
do

faceting on all the fields? Or isn't that possible? Any information on
searching a fairly complex index built by DataImportHandler is very  
welcome.

Thanks.

Kind regards,

Nick
--
View this message in context: 
http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20120698.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Advice needed on master-slave configuration

2008-10-22 Thread William Pierce

Otis,

Yes,  I had forgotten that Windows will not permit me to overwrite files 
currently in use.   So my copy scripts are failing.  Windows will not even 
allow a rename of a folder containing a file in use, so I am not sure how to 
do this...


I am going to dig around and see what I can come up with short of 
stopping/restarting tomcat...


Thanks,
- Bill


--
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
Sent: Wednesday, October 22, 2008 2:30 PM
To: 
Subject: Re: Advice needed on master-slave configuration

Normally you don't have to restart Q, but only "reload" the Solr searcher when 
the index has been copied.
However, you are on Windows, and its FS has the tendency not to let you 
delete/overwrite files that another app (Solr/java) has opened.  Are you 
able to copy the index from U to Q?  How are you doing it?  Are you 
deleting index files from the index dir on Q that are no longer in the 
index dir on U?



Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: William Pierce <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, October 22, 2008 5:24:28 PM
Subject: Advice needed on master-slave configuration

Folks:

I have two instances of solr running one on the master (U) and the other 
on
the slave (Q).  Q is used for queries only, while U is where 
updates/deletes

are done.   I am running on Windows so unfortunately I cannot use the
distribution scripts.

Every N hours when changes are committed and the index on U is updated, 
I

want to copy the files from the master to the slave.  Do I need to halt
the solr server on Q while the index is being updated?  If not,  how do I
copy the files into the data folder while the server is running? Any
pointers would be greatly appreciated!

Thanks!

- Bill





Re: Solr for Whole Web Search

2008-10-22 Thread Jon Baer
If that is the case you should look @ the DataImportHandler examples,  
as they can already index RSS; I'm doing it now for ~ a dozen feeds on  
an hourly basis.  (This also works for any XML-based feed: XHTML, XML,  
etc.)  I find Nutch more useful for plain vanilla HTML (something that  
was built non-dynamically), since otherwise you can bring in the DB  
content that you would have used to build the page to begin with.  Nutch  
is also useful for other types of documents (PDF, I think) and anything  
that Tika (http://incubator.apache.org/tika/) can extract.
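As a sketch of what Jon describes (the feed URL and column names are placeholders; this follows the shape of the Solr 1.3 DIH RSS example), the XPathEntityProcessor iterates over each /rss/channel/item and maps XPaths to index fields:

```xml
<dataConfig>
  <dataSource type="HttpDataSource"/>
  <document>
    <!-- one Solr document per RSS <item> -->
    <entity name="feed"
            processor="XPathEntityProcessor"
            url="http://example.com/news.rss"
            forEach="/rss/channel/item">
      <field column="title" xpath="/rss/channel/item/title"/>
      <field column="link"  xpath="/rss/channel/item/link"/>
      <field column="date"  xpath="/rss/channel/item/pubDate"/>
    </entity>
  </document>
</dataConfig>
```

Scheduling it "hourly" is then just a matter of hitting the /dataimport handler with command=full-import (or delta-import) from cron or similar.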


- Jon

On Oct 22, 2008, at 11:08 AM, John Martyniak wrote:


Grant thanks for the response.

A couple of other people have recommended trying the Nutch + Solr  
approach, but I am not sure what the real benefit of doing that is,  
since Nutch provides most of the same features as Solr, and Solr has  
some nice additional features (like spell checking and incremental  
indexing).


So I currently have a Nutch index of 500,000+ URLs, but expect it to  
get much bigger.  I am generally pretty happy with it, but I just  
want to make sure that I am going down the correct path for the best  
feature set.  As far as the front-end implementation is concerned, I  
have been using the Nutch search app basically as a web service to  
feed the main app (so using RSS).  The main app takes that and  
manipulates the results for display.


As far as the Hadoop + Lucene integration goes, I haven't used that  
directly, just the Hadoop integration with Nutch, and of course  
Hadoop independently.


-John


On Oct 22, 2008, at 10:08 AM, Grant Ingersoll wrote:



On Oct 22, 2008, at 7:57 AM, John Martyniak wrote:


I am very new to Solr, but I have played with Nutch and Lucene.

Has anybody used Solr for a whole web indexing application?

Which Spider did you use?

How does it compare to Nutch?


There is a patch that combines Nutch + Solr.  Nutch is used for  
crawling, Solr for searching.  Can't say I've used it for whole web  
searching, but I believe some are trying it.


At the end of the day, I'm sure Solr could do it, but it will take  
some work to set up the architecture (distributed, replicated) and  
deal properly with fault tolerance and failover.  There are also  
some examples on Hadoop about Hadoop + Lucene integration.


How big are you talking?




Thanks in advance for all of the info.

-John



--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ















Re: Issues with facet

2008-10-22 Thread prerna07


Thanks, it helped.

We were using *_s fields, which had an analyzer section.

We used <copyField> to copy all fields into a non-analyzed field type and used
this new type for faceting. It is working fine now.
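For anyone else hitting this: faceting on an analyzed field returns the indexed tokens (lowercased, stemmed, split on whitespace), which is why the facet values came back mangled. A schema.xml sketch of the copyField fix (the field and type names here are illustrative, not from the actual schema):

```xml
<!-- analyzed field, used for full-text search -->
<field name="productDescription_s" type="text" indexed="true" stored="true"/>
<!-- untokenized string copy, used only for faceting -->
<field name="productDescription_facet" type="string" indexed="true" stored="false"/>
<copyField source="productDescription_s" dest="productDescription_facet"/>
```

Queries then facet on the untokenized copy, e.g. facet.field=productDescription_facet, so each whole field value appears as one facet entry.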

Thanks,
Prerna


prerna07 wrote:
> 
> Hi,
> 
> On using Facet in solr query I am facing various issues.
> 
> Scenario 1:
> I have 11 Index with tag : productIndex 
> 
> my search query is appended by facet  parameters :
> facet=true&facet.field=Index_Type_s&qt=dismaxrequest
> 
> The facet node i am getting in solr result is :
>  
> - 
> - 
>   11 
>   11 
>   11 
>   
>   
> 
> According to my understanding I should get only one result, which should
> be like the below mentioned node
> 
> - 
>  11 
>   
> 
> Scenario 2: 
> 
> My index has following fields :
>  In Search of the Shape of the Universe,
> mathamatics 
> 
> My search Query is : 
> facet=true&facet.field=productDescription_s&qt=dismaxrequest
> 
> The result Solr is giving displaying :
> 
> 
> - 
>   1 
>   1 
>   2 
>   1 
>   1 
> 
> 
> I am not able to figure out the facet results. They do not contain any 
> result for Universe, and characters are removed from mathamatics and shape.
> 
> Please help me understanding the issue and let me know if any change in
> schema / solrConfig can solve the issue.
> 
> Thanks,
> Prerna
> 
> 
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Issues-with-facet-tp20090842p20123830.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: error with delta import

2008-10-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
The case in point is DIH. DIH uses the standard DOM parser that comes
with the JDK. If it reads the XML properly, do we need to complain? I guess
that data-config.xml may not be used for any other purposes.


On Wed, Oct 22, 2008 at 10:10 PM, Walter Underwood
<[EMAIL PROTECTED]> wrote:
> On 10/22/08 8:57 AM, "Steven A Rowe" <[EMAIL PROTECTED]> wrote:
>
>> Telling people that it's not a problem (or required!) to write 
>> non-well-formed
>> XML, because a particular XML parser can't accept well-formed XML is kind of
>> insidious.
>
> I'm with you all the way on this.
>
> A parser which accepts non-well-formed XML is not an XML parser, since the
> XML spec requires reporting a fatal error.
>
> It is really easy to test these things. Modern browsers have good XML
> parsers, so put your test case in a "test.xml" file and open it in a
> browser. If it isn't well-formed, you'll get an error.
>
> Here is my test XML:
>
> 
>
> Here is what Firefox 3.0.3 says about that:
>
> XML Parsing Error: not well-formed
> Location: file:///Users/wunderwood/Desktop/test.xml
> Line Number 1, Column 18:
>
> 
> -^
>
> wunder
>
>
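Walter's browser test can also be scripted. The original test XML did not survive the archive, so the sketch below uses a bare ampersand (a common well-formedness error in hand-edited config files) as a stand-in:

```python
from xml.etree import ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# An escaped ampersand is well-formed; a bare one is a fatal error
# that any conforming XML parser must report.
print(is_well_formed('<test attr="a &amp; b"/>'))  # True
print(is_well_formed('<test attr="a & b"/>'))      # False
```

The same check works for a data-config.xml pulled from disk, which makes it easy to validate the file before handing it to Solr.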



-- 
--Noble Paul


Re: Advice needed on master-slave configuration

2008-10-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
If you are using a nightly build, you can try the new SolrReplication feature:
http://wiki.apache.org/solr/SolrReplication
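As a sketch (host names and intervals below are placeholders), that feature is configured entirely in solrconfig.xml and replicates over HTTP, which sidesteps the Windows file-locking problem since the slave pulls and swaps index files itself:

```xml
<!-- on the master (U): publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- on the slave (Q): poll the master every 60 seconds -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The slave also reopens its searcher after each pull, so no manual "reload" step is needed.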


On Thu, Oct 23, 2008 at 4:32 AM, William Pierce <[EMAIL PROTECTED]> wrote:
> Otis,
>
> Yes,  I had forgotten that Windows will not permit me to overwrite files
> currently in use.   So my copy scripts are failing.  Windows will not even
> allow a rename of a folder containing a file in use so I am not sure how to
> do this
>
> I am going to dig around and see what I can come up with short of
> stopping/restarting tomcat...
>
> Thanks,
> - Bill
>
>
> --
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> Sent: Wednesday, October 22, 2008 2:30 PM
> To: 
> Subject: Re: Advice needed on master-slave configuration
>
>> Normally you don't have to restart Q, but only "reload" the Solr searcher when
>> the index has been copied.
>> However, you are on Windows, and its FS has the tendency not to let you
>> delete/overwrite files that another app (Solr/java) has opened.  Are you
>> able to copy the index from U to Q?  How are you doing it?  Are you deleting
>> index files from the index dir on Q that are no longer in the index dir on
>> U?
>>
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message 
>>>
>>> From: William Pierce <[EMAIL PROTECTED]>
>>> To: solr-user@lucene.apache.org
>>> Sent: Wednesday, October 22, 2008 5:24:28 PM
>>> Subject: Advice needed on master-slave configuration
>>>
>>> Folks:
>>>
>>> I have two instances of solr running one on the master (U) and the other
>>> on
>>> the slave (Q).  Q is used for queries only, while U is where
>>> updates/deletes
>>> are done.   I am running on Windows so unfortunately I cannot use the
>>> distribution scripts.
>>>
>>> Every N hours when changes are committed and the index on U is updated, I
>>> want to copy the files from the master to the slave.  Do I need to halt
>>> the solr server on Q while the index is being updated?  If not,  how do I
>>> copy the files into the data folder while the server is running? Any
>>> pointers would be greatly appreciated!
>>>
>>> Thanks!
>>>
>>> - Bill
>>
>>
>



-- 
--Noble Paul