Multiple default search fields or catchall field?

2009-12-08 Thread Thomas Koch
Hi,

I'm indexing feeds and websites referenced by the feeds. So I have as text 
fields:
title - from the feed entries title
description - from the feed entries description
text - the websites text

When the user doesn't specify a search field, all three fields should be 
searched. I also need highlighting. However, it should still be possible 
to search only in title or description.

- Do I need a catchall text field with content copied from all text fields?
- Do I need to store the content in the catchall field as well as in the 
individual fields to get highlighting in every case?
- Isn't it a big waste of hard disk space to store the content twice?

Thanks for any help,

Thomas Koch, http://www.koch.ro


Re: Apache solr for multiple searches

2009-12-08 Thread regany


Bhuvi HN wrote:
 
 Can we have one single instance of the Apache Solr running for both the
 search like Job search and resume search.
 

Yes, you want to run a multicore (multiple index) setup - see:
http://wiki.apache.org/solr/CoreAdmin
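As a rough illustration of the CoreAdmin API that page describes (the host, port, and core names below are assumptions, not from the thread), the per-index cores can be created with plain HTTP requests:

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL -- adjust for your own setup.
SOLR = "http://localhost:8983/solr"

def core_admin_url(action, **params):
    # Builds a CoreAdmin request URL, e.g. action=CREATE, STATUS, UNLOAD.
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/cores?{query}"

# One core per index: e.g. a 'jobs' core and a 'resumes' core.
create_jobs = core_admin_url("CREATE", name="jobs", instanceDir="jobs")
create_resumes = core_admin_url("CREATE", name="resumes", instanceDir="resumes")
print(create_jobs)
print(create_resumes)
```

Fetching each URL (e.g. with curl or urllib.request) would then create the core, assuming the instance directories exist.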


-- 
View this message in context: 
http://old.nabble.com/Apache-solr-for-multiple-searches-tp26551563p26690643.html
Sent from the Solr - User mailing list archive at Nabble.com.



How to setup dynamic multicore replication

2009-12-08 Thread Thijs

Hi

I need some help setting up dynamic multicore replication.

We are changing our setup from a replicated single core index with 
multiple document types, as described on the wiki[1], to a dynamic 
multicore setup. We need this so that we can display facets with a zero 
count that are unique to the document 'type'.


So when indexing new documents we want to create new cores on the fly 
using the CoreAdminHandler through SolrJ.


What I can't figure out is how to set up solr.xml and solrconfig.xml so 
that a core is automatically replicated from the master to its slaves 
once it's created.


I have a solr.xml that starts like this:

<?xml version='1.0' encoding='UTF-8'?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
  </cores>
</solr>

and the replication part of solrconfig.xml
master:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>

slave:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://localhost:8081/solr/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>

I think I should change the masterUrl in the slave configuration to 
something like:
<str name="masterUrl">http://localhost:8081/solr/${solr.core.name}/replication</str>
So that the replication automatically finds the correct core replication 
handler.


But how do I tell the slaves that a new core has been created, and that 
they should start replicating it too?


Thanks in advance.

Thijs

[1] 
http://wiki.apache.org/solr/MultipleIndexes#Flattening_Data_Into_a_Single_Index




RE: why no results?

2009-12-08 Thread Jaco Olivier
Hi Regan,

I am using STRING fields only for values that in most cases will be used
to FACET on. I suggest using TEXT fields as per the default examples.

ALSO, remember that if you do not specify solr.LowerCaseFilterFactory,
your search has just become case-sensitive. I struggled with that one
before, so make sure what you are indexing is what you are searching for.
* Stick to the default examples that are provided with the Solr distro
and you should be fine.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         enablePositionIncrements=true ensures that a 'gap' is left to
         allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Jaco Olivier

-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 08 December 2009 06:15
To: solr-user@lucene.apache.org
Subject: Re: why no results?



Tom Hill-7 wrote:
 
 Try solr.TextField instead.
 


Thanks Tom,

I've replaced the types section above with...

<types>
  <fieldtype name="string" class="solr.TextField"
             sortMissingLast="true" omitNorms="true" />
</types>


deleted my index, restarted Solr and re-indexed my documents - but the
search still returns nothing.

Do I need to change the type in the fields sections as well?

regan
-- 
View this message in context:
http://old.nabble.com/why-no-results--tp26688249p26688469.html

Please consider the environment before printing this email. This 
transmission is for the intended addressee only and is confidential 
information. If you have received this transmission in error, please 
delete it and notify the sender. The content of this e-mail is the 
opinion of the writer only and is not endorsed by Sabinet Online Limited 
unless expressly stated otherwise.


RE: why no results?

2009-12-08 Thread Jaco Olivier
Hi,

Try changing your TEXT field to type text:
<field name="text" type="text" indexed="true" stored="false" multiValued="true" />

That is your problem... also use the text type as per the default examples
in the Solr distro :)

Jaco Olivier


-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 08 December 2009 05:44
To: solr-user@lucene.apache.org
Subject: why no results?


hi all - newbie solr question - I've indexed some documents and can
search / receive results using the following schema - BUT ONLY when
searching on the id field. If I try searching on the title, subtitle,
body or text field I receive NO results. Very confused. :confused: Can
anyone see anything obvious I'm doing wrong? Regan.



<?xml version="1.0" ?>

<schema name="core0" version="1.1">

<types>
  <fieldtype name="string" class="solr.StrField"
             sortMissingLast="true" omitNorms="true" />
</types>

 <fields>
  <!-- general -->
  <field name="id" type="string" indexed="true" stored="true"
         multiValued="false" required="true" />
  <field name="title" type="string" indexed="true" stored="true"
         multiValued="false" />
  <field name="subtitle" type="string" indexed="true" stored="true"
         multiValued="false" />
  <field name="body" type="string" indexed="true" stored="true"
         multiValued="false" />
  <field name="text" type="string" indexed="true" stored="false"
         multiValued="true" />
 </fields>

 <!-- field to use to determine and enforce document uniqueness. -->
 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>text</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="OR"/>

 <!-- copyFields group fields into one single searchable indexed field
      for speed. -->
 <copyField source="title" dest="text" />
 <copyField source="subtitle" dest="text" />
 <copyField source="body" dest="text" />

</schema>

-- 
View this message in context:
http://old.nabble.com/why-no-results--tp26688249p26688249.html


Re: How to setup dynamic multicore replication

2009-12-08 Thread Shalin Shekhar Mangar
On Tue, Dec 8, 2009 at 2:43 PM, Thijs vonk.th...@gmail.com wrote:

 Hi

 I need some help setting up dynamic multicore replication.

 We are changing our setup from a replicated single core index with multiple
 document types, as described on the wiki[1], to a dynamic multicore setup.
 We need this so that we can display facets with a zero count that are unique
 to the document 'type'.


If you go by that wiki link, then there is no need to have multiple cores.
It basically says that, in some cases, it is possible to flatten multiple
indexes into one index. Am I missing something?

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to setup dynamic multicore replication

2009-12-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Dec 8, 2009 at 2:43 PM, Thijs vonk.th...@gmail.com wrote:
 Hi

 I need some help setting up dynamic multicore replication.

 We are changing our setup from a replicated single core index with multiple
 document types, as described on the wiki[1], to a dynamic multicore setup.
 We need this so that we can display facets with a zero count that are unique
 to the document 'type'.

 So when indexing new documents we want to create new cores on the fly using
 the CoreAdminHandler through SolrJ.

 What I can't figure out is how I setup solr.xml and solrconfig.xml so that a
 core automatically is also replicated from the master to it's slaves once
 it's created.

 I have a solr.xml that starts like this:

 <?xml version='1.0' encoding='UTF-8'?>
 <solr persistent="true">
  <cores adminPath="/admin/cores">
  </cores>
 </solr>

 and the replication part of solrconfig.xml
 master:
 <requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml</str>
  </lst>
 </requestHandler>

 slave:
 <requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://localhost:8081/solr/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
 </requestHandler>

 I think I should change the masterUrl in the slave configuration to
 something like:
 <str name="masterUrl">http://localhost:8081/solr/${solr.core.name}/replication</str>
 So that the replication automatically finds the correct core replication
 handler.
if you have dynamically created cores this is the solution.

 But how do I tell the slaves a new core is created, and that is should start
 replicating those to?

 Thanks in advance.

 Thijs

 [1]
 http://wiki.apache.org/solr/MultipleIndexes#Flattening_Data_Into_a_Single_Index





-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: Apache solr for multiple searches

2009-12-08 Thread Shalin Shekhar Mangar
On Tue, Dec 8, 2009 at 2:28 PM, regany re...@newzealand.co.nz wrote:



 Bhuvi HN wrote:
 
  Can we have one single instance of the Apache Solr running for both the
  search like Job search and resume search.
 

 Yes, you want to run a multicore (multiple index) setup - see:
 http://wiki.apache.org/solr/CoreAdmin



Or you could combine them into the same index. That is usually the easier
solution.

See
http://wiki.apache.org/solr/MultipleIndexes#Flattening_Data_Into_a_Single_Index

-- 
Regards,
Shalin Shekhar Mangar.


Re: How to setup dynamic multicore replication

2009-12-08 Thread Thijs

If I for example do:

/select?q=type:book&facet=true&facet.mincount=0&facet.field=title

the titles returned for the facet query also contain titles of type dvd,
while I only want the unique titles for type book.
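For reference, the request above can be built programmatically with explicit parameter encoding; this is only a sketch, with the host name assumed:

```python
from urllib.parse import urlencode

# The facet request from the message above, with explicit '&' separators.
params = [
    ("q", "type:book"),
    ("facet", "true"),
    ("facet.mincount", "0"),
    ("facet.field", "title"),
]
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```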



On 8-12-2009 12:09, Shalin Shekhar Mangar wrote:

On Tue, Dec 8, 2009 at 2:43 PM, Thijs vonk.th...@gmail.com  wrote:


Hi

I need some help setting up dynamic multicore replication.

We are changing our setup from a replicated single core index with multiple
document types, as described on the wiki[1], to a dynamic multicore setup.
We need this so that we can display facets with a zero count that are unique
to the document 'type'.



If you go by that wiki link, then there is no need to have multiple cores.
It basically says that, in some cases, it is possible to flatten multiple
indexes into one index. Am I missing something?





Re: How to setup dynamic multicore replication

2009-12-08 Thread Thijs

But the slave never gets the message that a core is created...
at least not in my setup...
So it never starts replicating...


On 8-12-2009 12:13, Noble Paul നോബിള്‍  नोब्ळ् wrote:

On Tue, Dec 8, 2009 at 2:43 PM, Thijs vonk.th...@gmail.com  wrote:

Hi

I need some help setting up dynamic multicore replication.

We are changing our setup from a replicated single core index with multiple
document types, as described on the wiki[1], to a dynamic multicore setup.
We need this so that we can display facets with a zero count that are unique
to the document 'type'.

So when indexing new documents we want to create new cores on the fly using
the CoreAdminHandler through SolrJ.

What I can't figure out is how I setup solr.xml and solrconfig.xml so that a
core automatically is also replicated from the master to it's slaves once
it's created.

I have a solr.xml that starts like this:

<?xml version='1.0' encoding='UTF-8'?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
  </cores>
</solr>

and the replication part of solrconfig.xml
master:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>

slave:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://localhost:8081/solr/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>

I think I should change the masterUrl in the slave configuration to
something like:
<str name="masterUrl">http://localhost:8081/solr/${solr.core.name}/replication</str>
So that the replication automatically finds the correct core replication
handler.

if you have dynamically created cores this is the solution.


But how do I tell the slaves a new core is created, and that is should start
replicating those to?

Thanks in advance.

Thijs

[1]
http://wiki.apache.org/solr/MultipleIndexes#Flattening_Data_Into_a_Single_Index










test

2009-12-08 Thread Thomas Senior

test


Tika and DIH integration (https://issues.apache.org/jira/browse/SOLR-1358)

2009-12-08 Thread Jorg Heymans
Hi,

I am looking into using Solr for indexing a large database that has
documents (mostly pdf and msoffice) stored as CLOBs in several tables.
It is my understanding that the DIH as provided in Solr 1.4 cannot
index these CLOBs yet, and that SOLR-1358 should provide exactly this.
So I was wondering what the most 'recommended' way is of solving this...
Should it be done with a custom text extractor of some sort, set on
the column/field?

Thanks,
Jorg


Re: edismax using bigrams instead of phrases?

2009-12-08 Thread Bill Dueber
On Mon, Dec 7, 2009 at 5:45 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


 it would be a mistake to have a pf1 field that was an alias for pf ...
 as it stands the pf parm in dismax is analogous to a pf* or
 pf-Infinity


Of course -- I was... well, let's just pretend I was drunk.

How about pfInf or pfAll?





-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


RE: Facet query with special characters

2009-12-08 Thread Peter 4U

Hello Hoss,

 

Many thanks for your answer.

That's very interesting.

So, are you saying this is an issue on the index side, rather than the query 
side?

Note that I am (supposed to be) indexing/searching without analysis 
tokenization (if that's the correct term) - i.e. field values like 
'pds-comp.domain' shouldn't be (and I believe aren't) broken up into 'pds', 
'comp', 'domain' etc. (e.g. using the 'text_ws' fieldtype).

 

What would be your opinion on the best way to index/analyze/not-analyze such 
fields?

 

Thanks!

Peter


 
 Date: Mon, 7 Dec 2009 15:30:47 -0800
 From: hossman_luc...@fucit.org
 To: solr-user@lucene.apache.org
 Subject: Re: Facet query with special characters
 
 
 
 : When performing a facet query where part of the value portion has a 
 : special character (a minus sign in this case), the query returns zero 
 : results unless I put a wildcard (*) at the end.
 
 check your analysis configuration for this fieldtype, in particular look 
 at what debugQuery produces for your parsed query, and look at what 
 analysis.jsp says it will do at query time with the input string 
 pds-comp.domain ... because it sounds like you have a disconnect between 
 how the text is indexed and how it is searched. adding a * to your 
 input query forces it to make a WildcardQuery which doesn't use analysis, 
 so you get a match on the literal token.
 
 in short: i suspect your problem has nothing to do with query string 
 escaping, and everything to do with field tokenization.
 
 
 -Hoss
 
  
_
View your other email accounts from your Hotmail inbox. Add them now.
http://clk.atdmt.com/UKM/go/186394592/direct/01/
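The debugQuery check that Hoss suggests can be scripted; a minimal sketch follows, where the field name 'host' and the localhost URL are assumptions for illustration:

```python
from urllib.parse import urlencode

# Build a query with debugQuery=on to see Solr's parsed query for a value
# containing '-' and '.'; the field name 'host' is hypothetical.
field, value = "host", "pds-comp.domain"
params = urlencode({"q": f'{field}:"{value}"', "debugQuery": "on"})
url = "http://localhost:8983/solr/select?" + params
print(url)
```

Comparing the parsedquery in the response against the output of analysis.jsp for the same input should show whether indexing and querying agree.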

RE: How to setup dynamic multicore replication

2009-12-08 Thread Joe Kessel

Hi,

In my environment I create cores on the fly, then replicate the core to all of 
the slaves.  I first create the core on the master and persist the solr.xml via 
the CoreAdmin API.  I then do the same on each of my slaves.  After loading / 
committing / optimizing the data on the master I send the replication request 
to each of the slaves.  So each slave's replication handler 
(http://slave_host_port/solr/core_name/replication) gets a request to fetch the 
index which includes the master url 
(http://master_host_port/solr/core_name/replication).

The slave's solrconfig.xml has no mention of the master as it is all done 
programmatically.  You need to specify the core name in the url, and if you 
haven't created the core on the master it will result in an error.

I don't create a new core every time I update, but I do have the slaves fetch 
the index after every update.  My first attempt to set up polling did not seem 
to work, and I have not had a chance to revisit it.  I have not found a way to 
persist the solrconfig.xml with the updates to the slave list, so the control / 
management is within my application.

Hope this high-level overview helps.

 

Joe
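The workflow Joe describes can be sketched as a sequence of HTTP requests; the host names and core name below are placeholders, and this only builds the URLs rather than sending them:

```python
from urllib.parse import urlencode

# Sketch of the workflow above: create the core on the master and on each
# slave via CoreAdmin, then tell each slave to fetch the index from the
# master's per-core replication handler.
MASTER = "http://master:8983/solr"
SLAVES = ["http://slave1:8983/solr", "http://slave2:8983/solr"]
CORE = "core_books"

def create_core(base):
    q = urlencode({"action": "CREATE", "name": CORE, "instanceDir": CORE})
    return f"{base}/admin/cores?{q}"

def fetch_index(slave):
    # The slave's /replication handler accepts command=fetchindex with an
    # explicit masterUrl, so nothing is hard-wired in the slave's config.
    q = urlencode({"command": "fetchindex",
                   "masterUrl": f"{MASTER}/{CORE}/replication"})
    return f"{slave}/{CORE}/replication?{q}"

urls = [create_core(MASTER)] + [create_core(s) for s in SLAVES] \
       + [fetch_index(s) for s in SLAVES]
for u in urls:
    print(u)
```

Issuing these URLs in order (create everywhere, index on the master, then fetchindex on each slave) mirrors the sequence described in the message.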


 
 Date: Tue, 8 Dec 2009 12:42:12 +0100
 From: vonk.th...@gmail.com
 To: solr-user@lucene.apache.org
 Subject: Re: How to setup dynamic multicore replication
 
 But the slave never gets the message that a core is created...
 at least not in my setup...
 So it never starts replicating...
 
 
 On 8-12-2009 12:13, Noble Paul നോബിള്‍ नोब्ळ् wrote:
 On Tue, Dec 8, 2009 at 2:43 PM, Thijs vonk.th...@gmail.com wrote:
  Hi
 
  I need some help setting up dynamic multicore replication.
 
  We are changing our setup from a replicated single core index with multiple
  document types, as described on the wiki[1], to a dynamic multicore setup.
  We need this so that we can display facets with a zero count that are 
  unique
  to the document 'type'.
 
  So when indexing new documents we want to create new cores on the fly using
  the CoreAdminHandler through SolrJ.
 
  What I can't figure out is how I setup solr.xml and solrconfig.xml so that 
  a
  core automatically is also replicated from the master to it's slaves once
  it's created.
 
  I have a solr.xml that starts like this:
 
  <?xml version='1.0' encoding='UTF-8'?>
  <solr persistent="true">
  <cores adminPath="/admin/cores">
  </cores>
  </solr>

  and the replication part of solrconfig.xml
  master:
  <requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
  <str name="replicateAfter">startup</str>
  <str name="replicateAfter">optimize</str>
  <str name="confFiles">schema.xml</str>
  </lst>
  </requestHandler>

  slave:
  <requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
  <str name="masterUrl">http://localhost:8081/solr/replication</str>
  <str name="pollInterval">00:00:20</str>
  </lst>
  </requestHandler>

  I think I should change the masterUrl in the slave configuration to
  something like:
  <str name="masterUrl">http://localhost:8081/solr/${solr.core.name}/replication</str>
  So that the replication automatically finds the correct core replication
  handler.
  if you have dynamically created cores this is the solution.
 
  But how do I tell the slaves a new core is created, and that is should 
  start
  replicating those to?
 
  Thanks in advance.
 
  Thijs
 
  [1]
  http://wiki.apache.org/solr/MultipleIndexes#Flattening_Data_Into_a_Single_Index
 
 
 
 
 
 
  
_
Windows Live Hotmail is faster and more secure than ever.
http://www.microsoft.com/windows/windowslive/hotmail_bl1/hotmail_bl1.aspx?ocid=PID23879::T:WLMTAGL:ON:WL:en-ww:WM_IMHM_1:092009

Re: Multiple default search fields or catchall field?

2009-12-08 Thread Erick Erickson
See below.

On Tue, Dec 8, 2009 at 3:48 AM, Thomas Koch tho...@koch.ro wrote:

 Hi,

 I'm indexing feeds and websites referenced by the feeds. So I have as text
 fields:
 title - from the feed entries title
 description - from the feed entries description
 text - the websites text

 When the user doesn't define a default search field, then all three fields
 should be used for search. And I need to have highlighting. However it
 should
 still be possible to search only in title or description.

 - Do I need a catchall text field with content copied from all text fields?


This is a common way to do it. You could also write custom code to munge
the query, but there's no need to go there as a first option; I'd only
consider it if you have problems with the catchall approach.



 - Do I need to store the content in the catchall field as well as in the
 individual fields to get highlighting in every case?


No. You don't display the catchall field, so you don't need to store it.


 - Isn't it a big waste of hard disc space to store the content two times?

Disk space is cheap. Whether you care really depends on how much data you're
storing. 100M - who cares? 100G - lots of people care. But you don't have to
store it twice, so it's a moot point.

HTH
Erick
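The catchall approach discussed above can be sketched at the client level; the field names here are hypothetical, and in practice the copying would be done by copyField directives in the schema rather than application code:

```python
# Sketch: copy title/description/text into one indexed-only 'catchall'
# field, keeping the originals stored for display and highlighting.
def build_doc(title, description, text):
    doc = {"title": title, "description": description, "text": text}
    # catchall is indexed but need not be stored, so no space is wasted
    doc["catchall"] = " ".join([title, description, text])
    return doc

d = build_doc("Feed title", "Feed description", "Site text")
print(d["catchall"])
```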


 Thanks for any help,

 Thomas Koch, http://www.koch.ro



Re: How to setup dynamic multicore replication

2009-12-08 Thread Thijs

Hi, thanks.

That was my second option, but I was hoping that the master and slaves
could figure this out for themselves. Now my 'updater software' also has
to know about all the slaves (and maybe even their state), which it
previously had no idea about.

This way I can't just plug in an 'empty' slave that knows where its
master is and have it pull in all the required cores and indexes.


Thijs


On 8-12-2009 14:25, Joe Kessel wrote:


Hi,

In my environment I create cores on the fly, then replicate the core to all of the slaves.  I first 
create the core on the master and persist the solr.xml via the CoreAdmin API.  I then do the same on 
each of my slaves.  After loading / committing / optimizing the data on the master I send the 
replication request to each of the slaves.  So each slave's replication handler 
(http://slave_host_port/solr/core_name/replication) gets a request to fetch the index which 
includes the master url (http://master_host_port/solr/core_name/replication).



The slave's solrconf.xml has no mention of the master as it is all done 
progromatically.  You need to specify the core name in the url, and if you 
haven't created the core on the master it will result in error.



I don't create a new core everytime I update, but I do have the slaves fetch 
the index after every update.  My first attempt to set the polling did not seem 
to work, and have not had a chance to revisit.  I have not found a way to 
persist the solrconfig.xml with the updates to the slave list, so the control / 
management is within my application.



Hope this highlevel overview helps.



Joe




Date: Tue, 8 Dec 2009 12:42:12 +0100
From: vonk.th...@gmail.com
To: solr-user@lucene.apache.org
Subject: Re: How to setup dynamic multicore replication

But the slave never gets the message that a core is created...
at least not in my setup...
So it never starts replicating...


On 8-12-2009 12:13, Noble Paul നോബിള്‍ नोब्ळ् wrote:

On Tue, Dec 8, 2009 at 2:43 PM, Thijs vonk.th...@gmail.com  wrote:

Hi

I need some help setting up dynamic multicore replication.

We are changing our setup from a replicated single core index with multiple
document types, as described on the wiki[1], to a dynamic multicore setup.
We need this so that we can display facets with a zero count that are unique
to the document 'type'.

So when indexing new documents we want to create new cores on the fly using
the CoreAdminHandler through SolrJ.

What I can't figure out is how I setup solr.xml and solrconfig.xml so that a
core automatically is also replicated from the master to it's slaves once
it's created.

I have a solr.xml that starts like this:

<?xml version='1.0' encoding='UTF-8'?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
  </cores>
</solr>

and the replication part of solrconfig.xml
master:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>

slave:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://localhost:8081/solr/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>

I think I should change the masterUrl in the slave configuration to
something like:
<str name="masterUrl">http://localhost:8081/solr/${solr.core.name}/replication</str>
So that the replication automatically finds the correct core replication
handler.

if you have dynamically created cores this is the solution.


But how do I tell the slaves a new core is created, and that is should start
replicating those to?

Thanks in advance.

Thijs

[1]
http://wiki.apache.org/solr/MultipleIndexes#Flattening_Data_Into_a_Single_Index










_
Windows Live Hotmail is faster and more secure than ever.
http://www.microsoft.com/windows/windowslive/hotmail_bl1/hotmail_bl1.aspx?ocid=PID23879::T:WLMTAGL:ON:WL:en-ww:WM_IMHM_1:092009




Re: how to do auto-suggest case-insensitive match and return original case field values

2009-12-08 Thread hermida

Hi again,

Just pinging again for any Solr experts out there... sorry that my previous
message was a bit long (I wanted to fully explain what I've already done and
where the exact difficulty arises)... but to summarize:

Does anyone know how to use Solr querying with faceting to do an
auto-suggest that searches case-insensitively yet returns the original
mixed-case values?

thanks for any help,
Leandro
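One minimal client-side workaround (an assumption on my part, not necessarily the list's recommended answer) is to facet on the original-case field without a prefix and filter the returned values case-insensitively in the application:

```python
# Given facet values in their original case, keep only those matching the
# user's prefix case-insensitively; the sample values are made up.
def filter_suggestions(facet_values, prefix):
    p = prefix.lower()
    return [v for v in facet_values if v.lower().startswith(p)]

print(filter_suggestions(["iPod", "IPAD", "Internet", "router"], "ip"))
```

This keeps the displayed values in their original case, at the cost of pulling back more facet values than needed; a schema-side solution (e.g. a lowercased copy of the field for matching) would scale better.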

-- 
View this message in context: 
http://old.nabble.com/how-to-do-auto-suggest-case-insensitive-match-and-return-original-case-field-values-tp26636365p26694224.html



Re: # in query

2009-12-08 Thread Joel Nylund

Thanks Erick,

I looked more into this, but still stuck:

I have this field indexed using text_rev

I looked at the luke analysis for this field, but I'm unsure how to
read it.


When I query the field by the id I get:

<result name="response" numFound="1" start="0">
<doc>
<str name="id">5405255</str>
<str name="textTitle">###'s test blog</str>
</doc>
</result>

If I try to query even multiple ### I get nothing.

Here is what the luke handler says (btw, when I used id instead of docid
on luke I got a NullPointerException: /admin/luke?docid=5405255 vs
/admin/luke?id=5405255):


<lst name="textTitle">
<str name="type">text_rev</str>
<str name="schema">ITS---</str>
<str name="index">ITS--</str>
<int name="docs">290329</int>
<int name="distinct">401016</int>
<lst name="topTerms">
<int name="#1;golb">49362</int>
<int name="blog">49362</int>
<int name="#1;ecapsym">29426</int>
<int name="myspace">29426</int>
<int name="#1;s">8773</int>
<int name="s">8773</int>
<int name="#1;ed">8033</int>
<int name="de">8033</int>
<int name="com">6884</int>
<int name="#1;moc">6884</int>
</lst>
<lst name="histogram">
<int name="1">308908</int>
<int name="2">34340</int>
<int name="4">21916</int>
<int name="8">14474</int>
<int name="16">9122</int>
<int name="32">5578</int>
<int name="64">3162</int>
<int name="128">1844</int>
<int name="256">910</int>
<int name="512">464</int>
<int name="1024">182</int>
<int name="2048">72</int>
<int name="4096">26</int>
<int name="8192">12</int>
<int name="16384">2</int>
<int name="32768">2</int>
<int name="65536">2</int>
</lst>
</lst>


solr/select?q=textTitle:%23%23%23  - gets no results.

I have the same field indexed as an alphaOnlySort, and it gives me lots
of results, but not the ones I want.


Any other ideas?

thanks
Joel


On Dec 7, 2009, at 3:42 PM, Erick Erickson wrote:


Well, the very first thing I would do is examine the field definition in
your schema file. I suspect that the tokenizers and/or filters you're
using for indexing and/or querying are doing something to the # symbol -
most likely stripping it. If you're just searching for the single-letter
term #, I *think* the query parser silently drops that part of the
clause, but check on that.

The second thing would be to get a copy of Luke and examine your
index to see if what you *think* is in your index actually is there.

HTH
Erick

On Mon, Dec 7, 2009 at 3:28 PM, Joel Nylund jnyl...@yahoo.com wrote:

ok thanks, sorry my brain wasn't working, but even when I URL-encode it,
I don't get any results. Is there something special I have to do for
Solr?


thanks
Joel


On Dec 7, 2009, at 3:20 PM, Paul Libbrecht wrote:

Sure you have to escape it! %23


otherwise the browser considers it a separator between the URL for the
server (on the left) and the fragment identifier (on the right), which
is not sent to the server.

You might want to read about URL-encoding; escaping with backslash is a
shell thing, not a thing for URLs!

paul


Le 07-déc.-09 à 21:16, Joel Nylund a écrit :

Hi,


How can I put a # sign in a query, do I need to escape it?

For example I want to query books with title that contain #

No work so far:
http://localhost:8983/solr/select?q=textTitle:#;
http://localhost:8983/solr/select?q=textTitle:#
http://localhost:8983/solr/select?q=textTitle:\#;

Getting
org.apache.lucene.queryParser.ParseException: Cannot parse  
'textTitle:\':

Lexical error at line 1, column 12.  Encountered: EOF after : 

and sometimes just no response.


thanks
Joel
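Paul's point about percent-encoding can be checked mechanically; this sketch builds the escaped query URL from the thread (the host and field name are taken from the example, not verified):

```python
from urllib.parse import quote

# A raw '#' starts the URL fragment and never reaches Solr, so it must be
# percent-encoded as %23 before the request is sent.
q = "textTitle:" + quote("###", safe="")
url = "http://localhost:8983/solr/select?q=" + q
print(url)  # .../select?q=textTitle:%23%23%23
```

Whether the escaped query then matches anything is a separate question of how the field's analyzer tokenizes '#', as discussed above.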










Re: Tika and DIH integration (https://issues.apache.org/jira/browse/SOLR-1358)

2009-12-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
We are very close to resolving SOLR-1358, so you may be able to use it.

On Tue, Dec 8, 2009 at 5:32 PM, Jorg Heymans jorg.heym...@gmail.com wrote:
 Hi,

 I am looking into using Solr for indexing a large database that has
 documents (mostly pdf and msoffice) stored as CLOBs in several tables.
 It is my understanding that the DIH as provided in Solr 1.4 cannot
 index these CLOBs yet, and that SOLR-1358 should provide exactly this.
 So i was wondering what the most 'recommended' way is of solving this
 .. Should it be done with a custom textextractor of some sort, set on
 the column/field ?

 Thanks,
 Jorg




-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: # in query

2009-12-08 Thread Erick Erickson
In Luke, there's a tab that will let you go to a document ID. From there
you can see all the fields in a particular document, and examine what
the actual tokens stored are. Until and unless you know what tokens
are being indexed, you simply can't know what your queries should look
like...

*Assuming* that the ### are getting indexed and *assuming* your tokenizer
tokenized on whitespace, and *assuming* that by text_rev you
are talking about ReversedWildcardFilterFactory, I
wouldn't expect a search to match if it wasn't exactly:
s'###. But as you see, there's a long chain of assumptions there any
one of which may be violated by your schema. So please post the
relevant portions of your schema to make it easier to help.

Best
Erick


On Tue, Dec 8, 2009 at 9:54 AM, Joel Nylund jnyl...@yahoo.com wrote:

 Thanks Eric,

 I looked more into this, but still stuck:

 I have this field indexed using text_rev

 I looked at the luke analysis for this field, but im unsure how to read it.

 When I query the field by the id I get:

 <result name="response" numFound="1" start="0">
  <doc>
   <str name="id">5405255</str>
   <str name="textTitle">###'s test blog</str>
  </doc>
 </result>

 If I try to query even multiple ### I get nothing.

 Here is what luke handler says:  (btw when I used id instead of docid on
 luke I got a nullpointer exception  /admin/luke?docid=5405255  vs
 /admin/luke?id=5405255)

 <lst name="textTitle">
  <str name="type">text_rev</str>
  <str name="schema">ITS---</str>
  <str name="index">ITS--</str>
  <int name="docs">290329</int>
  <int name="distinct">401016</int>
  <lst name="topTerms">
   <int name="&#1;golb">49362</int>
   <int name="blog">49362</int>
   <int name="&#1;ecapsym">29426</int>
   <int name="myspace">29426</int>
   <int name="&#1;s">8773</int>
   <int name="s">8773</int>
   <int name="&#1;ed">8033</int>
   <int name="de">8033</int>
   <int name="com">6884</int>
   <int name="&#1;moc">6884</int>
  </lst>
  <lst name="histogram">
   <int name="1">308908</int>
   <int name="2">34340</int>
   <int name="4">21916</int>
   <int name="8">14474</int>
   <int name="16">9122</int>
   <int name="32">5578</int>
   <int name="64">3162</int>
   <int name="128">1844</int>
   <int name="256">910</int>
   <int name="512">464</int>
   <int name="1024">182</int>
   <int name="2048">72</int>
   <int name="4096">26</int>
   <int name="8192">12</int>
   <int name="16384">2</int>
   <int name="32768">2</int>
   <int name="65536">2</int>
  </lst>
 </lst>


 solr/select?q=textTitle:%23%23%23  - gets no results.

 I have the same field indexed as a alphaOnlySort, and it gives me lots of
 results, but not the ones I want.

 Any other ideas?

 thanks
 Joel



 On Dec 7, 2009, at 3:42 PM, Erick Erickson wrote:

  Well, the very first thing I would is examine the field definition in
 your schema file. I suspect that the tokenizers and/or
 filters you're using for indexing and/or querying is doing something
 to the # symbol. Most likely stripping it. If you're just searching
 for the single-letter term #, I *think* the query parser silently just
 drops that part of the clause out, but check on that.

 The second thing would be to get a copy of Luke and examine your
 index to see if what you *think* is in your index actually is there.

 HTH
 Erick

 On Mon, Dec 7, 2009 at 3:28 PM, Joel Nylund jnyl...@yahoo.com wrote:

  ok thanks,  sorry my brain wasn't working, but even when I url encode it,
 I
 dont get any results, is there something special I have to do for solr?

 thanks
 Joel


 On Dec 7, 2009, at 3:20 PM, Paul Libbrecht wrote:

 Sure you have to escape it! %23


 otherwise the browser considers it as a separator between the URL for the
 server (on the left) and the fragment identifier (on the right) which is
 not sent to the server.

 You might want to read about URL-encoding; escaping with backslash is a
 shell-thing, not a thing for URLs!

 paul


 Le 07-déc.-09 à 21:16, Joel Nylund a écrit :

 Hi,


 How can I put a # sign in a query, do I need to escape it?

 For example I want to query books with title that contain #

 No work so far:
 http://localhost:8983/solr/select?q=textTitle:#;
 http://localhost:8983/solr/select?q=textTitle:#
 http://localhost:8983/solr/select?q=textTitle:\#;

 Getting
 org.apache.lucene.queryParser.ParseException: Cannot parse
 'textTitle:\':
 Lexical error at line 1, column 12.  Encountered: <EOF> after :

 and sometimes just no response.


 thanks
 Joel








Re: Exception encountered during replication on slave....Any clues?

2009-12-08 Thread William Pierce

Hi, Noble:

When I hit the masterUrl from the slave box at

http://localhost:8080/postingsmaster/replication

I get the following xml response:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <str name="status">OK</str>
  <str name="message">No command</str>
</response>

And then when I look in the logs, I see the exception that I mentioned.
What exactly does this error mean, that replication is not available? By
the way, when I go to the admin url for the slave and click on replication,
I see a screen with the master url listed (as above) and the word
"unreachable" after it. And, of course, the same exception shows up in
the tomcat logs.


Thanks,

- Bill

--
From: Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com
Sent: Monday, December 07, 2009 9:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Exception encountered during replication on slaveAny clues?


are you able to hit
http://localhost:8080/postingsmaster/replication using a browser from
the slave box? If you are able to hit it, what do you see?


On Tue, Dec 8, 2009 at 3:42 AM, William Pierce evalsi...@hotmail.com
wrote:

Just to make doubly sure,  per tck's suggestion,  I went in and
explicitly
added in the port in the masterurl so that it now reads:

http://localhost:8080/postingsmaster/replication

Still getting the same exception...

I am running solr 1.4, on Ubuntu karmic, using tomcat 6 and Java 1.6.

Thanks,

- Bill

--
From: William Pierce evalsi...@hotmail.com
Sent: Monday, December 07, 2009 2:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Exception encountered during replication on slaveAny
clues?


tck,

thanks for your quick response.  I am running on the default port
(8080).
If I copy that exact string given in the masterUrl and execute it in the
browser I get a response from solr:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <str name="status">OK</str>
  <str name="message">No command</str>
</response>

So the masterUrl is reachable/accessible so far as I am able to tell

Thanks,

- Bill

--
From: TCK moonwatcher32...@gmail.com
Sent: Monday, December 07, 2009 1:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Exception encountered during replication on slaveAny
clues?


are you missing the port number in the master's url ?

-tck



On Mon, Dec 7, 2009 at 4:44 PM, William Pierce
evalsi...@hotmail.comwrote:


Folks:

I am seeing this exception in my logs that is causing my replication to
fail. I start with a clean slate (empty data directory). I index the
data on the postingsmaster using the dataimport handler and it succeeds.
 When the replication slave attempts to replicate it encounters this
error.

Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller
fetchLatestIndex
SEVERE: Master at: http://localhost/postingsmaster/replication is not
available. Index fetch failed. Exception: Invalid version or the data
in
not
in 'javabin' format

Any clues as to what I should look for to debug this further?

Replication is enabled as follows:

The postingsmaster solrconfig.xml looks as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Replicate on 'optimize'; it can also be 'commit' -->
    <str name="replicateAfter">commit</str>
    <!-- If configuration files need to be replicated give the names here,
         comma separated -->
    <str name="confFiles"></str>
  </lst>
</requestHandler>

The postings slave solrconfig.xml looks as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- fully qualified url for the replication handler of master -->
    <str name="masterUrl">http://localhost/postingsmaster/replication</str>
    <!-- Interval in which the slave should poll master. Format is
         HH:mm:ss. If this is absent slave does not poll automatically.
         But a snappull can be triggered from the admin or the http API -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>


Thanks,

- Bill













--
-
Noble Paul | Systems Architect| AOL | http://aol.com



RE: Facet query with special characters

2009-12-08 Thread Chris Hostetter

: Note that I am (supposed to be) indexing/searching without analysis 
: tokenization (if that's the correct term) - i.e. field values like 
: 'pds-comp.domain' shouldn't be (and I believe aren't) broken up as in 
: 'pds', 'comp' 'domain' etc. (e.g. using the 'text_ws' fieldtype).
...
: What would be your opinion on the best way to index/analyze/not-analyze such 
fields?

a whitespace tokenizer is probably the best bet, but in order to be 
certain what's going on, you would need to look at a few things (and if 
you wanted help from other people, you would need to post those things) 
that i mentioned before

:  check your analysis configuration for this fieldtype, in particular look 
:  at what debugQuery produces for your parsed query, and look at what 
:  analysis.jsp says it will do at query time with the input string 
:  pds-comp.domain ... because it sounds like you have a disconnect between 
:  how the text is indexed and how it is searched. adding a * to your 

...so what does your schema look like, what is the output from debugQuery, 
what is the output from analysis.jsp, etc...

-Hoss



Re: KStem download

2009-12-08 Thread darniz

Hi Guys,
I still have this problem.
I got the fresh release of Apache Solr 1.4,
added the declaration of KStemmer in my schema.xml, and put the two jar
files under the \example\lib folder.


Looking at the error, I somehow think it's not able to find the solr home.

If I make a nightly distribution build and upload the war file to tomcat,
and in the tomcat webapp I specify the solr.home property to point to the
example\solr folder, and in the example\solr folder I place a lib folder
into which I copied the two jar files, then tomcat works fine.


SEVERE: Could not start SOLR. Check solr/home property
java.lang.NoClassDefFoundError:
org/apache/solr/util/plugin/ResourceLoaderAware
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at
org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375)
at java.lang.ClassLoader.loadClass(ClassLoader.java:300)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:592)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388)
at
org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:84)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
at
org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:835)
at
org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)
at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:424)
at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:414)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:456)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:95)
at org.apache.solr.core.SolrCore.init(SolrCore.java:520)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at
org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
at
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
at
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
at
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
at org.mortbay.jetty.Server.doStart(Server.java:210)
at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:183)
at org.mortbay.start.Main.start(Main.java:497)
at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.ClassNotFoundException:

Re: how to do auto-suggest case-insensitive match and return original case field values

2009-12-08 Thread Chris Hostetter

: In my web application I want to set up auto-suggest as you type
: functionality which will search case-insensitively yet return the original
: case terms.  It doesn't seem like TermsComponent can do this as it can only
: return the lowercase indexed terms your are searching against, not the
...
: which provides useful sorting by and returning of term frequency counts in
: your index.  How does one get this same information with regular Solr Query? 
: I set up the following prefix query, searching by the indexed lowercased
: field and returning the other:

The type of approach you are describing (doing a prefix based query for 
autosuggest) probably won't work very well unless your index is 100% 
designed just for the autosuggest ... if it's an index about products, and 
you're just using one of the fields for autosuggest, you aren't going to 
get good autosuggest results because the same word is going to appear in 
multiple products.  what you need is an index of *words* that you want to 
autosuggest, with fields indicating how important those words are that you 
can use in a function query (this replaces the term freq that 
TermComponent would use)

the fact that your test field is multivalued and stores wildly different 
things in each doc is an example of what i mean.

Have you considered the possibility of just indexing the lowercase value 
concatenated with the regular case value using a special delimiter, and 
then returning to your TermComponent based solution?  index "PowerPoint" 
as "powerpoint|PowerPoint" and just split on the "|" character when you 
get the data back from your prefix based term lookup.


-Hoss
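[Editorial aside: a minimal sketch (not from the thread) of the concatenation trick described above, using "|" as the delimiter:]

```python
def to_indexed_term(original):
    # index "powerpoint|PowerPoint": the lowercase prefix makes the
    # term lookup case-insensitive, the suffix preserves display case
    return original.lower() + "|" + original

def to_display_term(indexed):
    # split on the delimiter when the term comes back from the lookup
    return indexed.split("|", 1)[1]

term = to_indexed_term("PowerPoint")
print(term)                   # powerpoint|PowerPoint
print(to_display_term(term))  # PowerPoint
```

The delimiter just has to be a character that can never occur in the terms themselves.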



Re: java.lang.NoSuchMethodError: org.apache.commons.httpclient.HttpConnectionManager.getParams()Lorg/apache/commons/httpclient/params/HttpConnectionManagerParams;

2009-12-08 Thread Chris Hostetter

: Strangely i dont get this error when i execute this code from command line.
: This error only occurs when i access it from a web application. Secondly,
: this same method works fine with another web application. Both web
...
: java.lang.NoSuchMethodError:
: 
org.apache.commons.httpclient.HttpConnectionManager.getParams()Lorg/apache/commons/httpclient/params/HttpConnectionManagerParams;
:   at
: 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.setDefaultMaxConnectionsPerHost(CommonsHttpSolrServer.java:455)

...if you only get this problem in some environments, then i suspect there 
is something wrong with those environments (and clearly not with the 
code).  I would start by checking every jar in all of your environments 
and making sure you don't have multiple copies of the same jar (in 
different versions) mistakenly installed.



-Hoss



indexing XML with solr example webapp - out of java heap space

2009-12-08 Thread Feroze Daud
Hi!



I downloaded SOLR and am trying to index an XML file. This XML file is
huge (500M).

 

When I try to index it using the post.jar tool in example\exampledocs,
I get an "out of java heap space" error in the SimplePostTool
application.

 

Any ideas how to fix this? Passing in -Xms1024M does not fix it.

 

Feroze.
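[Editorial note: the thread does not answer this, but -Xms only sets the initial heap (-Xmx sets the maximum), and SimplePostTool buffers the whole file, so a common workaround is to split the 500M file into smaller batches before posting. A rough sketch with the standard library; the batch size and Solr's <add>/<doc> element names are the only assumptions:]

```python
import xml.etree.ElementTree as ET

def split_adds(source, batch_size=1000):
    """Stream <doc> elements out of a huge <add> file in small batches."""
    batch = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "doc":
            batch.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # release the parsed element to keep memory flat
            if len(batch) == batch_size:
                yield "<add>" + "".join(batch) + "</add>"
                batch = []
    if batch:
        yield "<add>" + "".join(batch) + "</add>"
```

Each yielded chunk can be written to its own file and posted with post.jar one at a time.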

 

 



Re: Solr Admin XPath

2009-12-08 Thread Chris Hostetter

Wild shots in the dark:

 * remove the white space around the = characters
 * replace the single-quote characters with double quote characters

: XPathExpression reqPerSec = 
xpath.compile(/solr/solr-info/QUERYHANDLER/entry[name = 
'dismax']/stats/st...@name = 'avgRequestsPerSecond']);
...
: This doesn't throw any errors, and the XPath works just fine in /any/ XPath 
tester I try... except Java. 


-Hoss
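[Editorial aside: the same kind of selection can be tried with Python's ElementTree, whose XPath predicates are written without whitespace around '=' and with consistent quoting, per the hints above. The stats document below is a minimal stand-in with an assumed structure, since the XPath in the archived message is partly garbled:]

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the Solr stats page (structure assumed here).
doc = ET.fromstring(
    "<solr><solr-info><QUERYHANDLER>"
    "<entry><name>dismax</name><stats>"
    "<stat name='avgRequestsPerSecond'>0.5</stat>"
    "</stats></entry>"
    "</QUERYHANDLER></solr-info></solr>"
)

# No whitespace around '=' in the predicates.
stat = doc.find(
    "solr-info/QUERYHANDLER/entry[name='dismax']"
    "/stats/stat[@name='avgRequestsPerSecond']"
)
print(stat.text)  # 0.5
```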



Re: # in query

2009-12-08 Thread Joel Nylund
ok, I just realized I was using the luke handler, didn't know there was  
a fat client, I assume that's what you are talking about.


I downloaded the lukeall.jar, ran it, pointed to my index, found the  
document in question, didn't see how it was tokenized, but I clicked  
the "reconstruct & edit" button,


this gives me a tab that has tokenized per field, for this field it  
shows:



s|s, ecapsym|myspace, golb|blog

title is: ###'s myspace blog

schema is:

<!-- A general unstemmed text field that indexes tokens normally and also
     reversed (via ReversedWildcardFilterFactory), to enable more efficient
     leading wildcard queries. -->
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="textTitle" type="text_rev" indexed="true" stored="true"
       required="false" multiValued="false"/>




thanks
Joel




On Dec 8, 2009, at 11:14 AM, Erick Erickson wrote:

In Luke, there's a tab that will let you go to a document ID. From  
there

you can see all the fields in a particular document, and examine what
the actual tokens stored are. Until and unless you know what tokens
are being indexed, you simply can't know what your queries should look
like...

*Assuming* that the ### are getting indexed and *assuming* your  
tokenizer

tokenized on, whitespace, and *assuming* that by text_rev you
are talking about ReversedWildcardFilterFactory, I
wouldn't expect a search to match if it wasn't exactly:
s'###. But as you see, there's a long chain of assumptions there  
any

one of which may be violated by your schema. So please post the
relevant portions of your schema to make it easier to help.

Best
Erick


On Tue, Dec 8, 2009 at 9:54 AM, Joel Nylund jnyl...@yahoo.com wrote:


Thanks Eric,

I looked more into this, but still stuck:

I have this field indexed using text_rev

I looked at the luke analysis for this field, but im unsure how to  
read it.


When I query the field by the id I get:

<result name="response" numFound="1" start="0">
<doc>
<str name="id">5405255</str>
<str name="textTitle">###'s test blog</str>
</doc>
</result>

If I try to query even multiple ### I get nothing.

Here is what luke handler says:  (btw when I used id instead of  
docid on

luke I got a nullpointer exception  /admin/luke?docid=5405255  vs
/admin/luke?id=5405255)

<lst name="textTitle">
<str name="type">text_rev</str>
<str name="schema">ITS---</str>
<str name="index">ITS--</str>
<int name="docs">290329</int>
<int name="distinct">401016</int>
<lst name="topTerms">
<int name="&#1;golb">49362</int>
<int name="blog">49362</int>
<int name="&#1;ecapsym">29426</int>
<int name="myspace">29426</int>
<int name="&#1;s">8773</int>
<int name="s">8773</int>
<int name="&#1;ed">8033</int>
<int name="de">8033</int>
<int name="com">6884</int>
<int name="&#1;moc">6884</int>
</lst>
<lst name="histogram">
<int name="1">308908</int>
<int name="2">34340</int>
<int name="4">21916</int>
<int name="8">14474</int>
<int name="16">9122</int>
<int name="32">5578</int>
<int name="64">3162</int>
<int name="128">1844</int>
<int name="256">910</int>
<int name="512">464</int>
<int name="1024">182</int>
<int name="2048">72</int>
<int name="4096">26</int>
<int name="8192">12</int>
<int name="16384">2</int>
<int name="32768">2</int>
<int name="65536">2</int>
</lst>
</lst>


solr/select?q=textTitle:%23%23%23  - gets no results.

I have the same field indexed as a alphaOnlySort, and it gives me  
lots of

results, but not the ones I want.

Any other ideas?

thanks
Joel



On Dec 7, 2009, at 3:42 PM, Erick Erickson wrote:

Well, the very first thing I would is examine the field definition in

your schema file. I suspect that the tokenizers and/or
filters you're using for indexing and/or querying is doing something
to the # symbol. Most likely stripping it. If you're just searching
for the single-letter term #, I *think* the query parser  
silently just

drops that part of the clause out, but check on that.

The second thing would be to get a copy of Luke and 

Re: WELCOME to solr-user@lucene.apache.org

2009-12-08 Thread Chris Hostetter

(FYI: in the future please start a new thread with an approriate subject 
line when you ask questions -- you probably would have gotten a lot more 
responses from people interested in Tika and SolrCell if they could tell 
that this email was about SolrCell)

: I found that Tika read the html and extract metadata like <meta name="id"
: content="12"> from my htmls but my documents has already an id set by
: literal.id=10.
: 
: I tried to map the id from Tika by fmap.id=ignored_ but it ignore also my
: literal.id

Hmmm, yeah: that seems like an odd order of operations, but it's 
documented on the wiki so evidently it's intentional...

http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations

my best sugguestions:

 * use the capture param to restrict what gets extracted (it's probably
possible to write an XPath query that selects everything *except* 
metadata[id])
 * change the name of your uniqueKey field to be something other than "id" 
so it's less likely to collide with a value from the document.

I also opened two Jira issues that you may want to post comments in...

https://issues.apache.org/jira/browse/SOLR-1633
https://issues.apache.org/jira/browse/SOLR-1634


-Hoss



Case Insensitive search not working

2009-12-08 Thread insaneyogi3008

Hello,

I tried to force case-insensitive search by having the following setting in
my schema.xml file, which I guess is standard for case-insensitive searches:

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


However when I perform searches on "San Jose" and "san jose", I get 16 and 0
responses back respectively; is there anything else I'm missing here?


-- 
View this message in context: 
http://old.nabble.com/Case-Insensitive-search-not-working-tp26699734p26699734.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to do auto-suggest case-insensitive match and return original case field values

2009-12-08 Thread Uri Boness

Just updated SOLR-1625 to support regexp hints.

https://issues.apache.org/jira/browse/SOLR-1625

Cheers,
Uri

Chris Hostetter wrote:

: In my web application I want to set up auto-suggest as you type
: functionality which will search case-insensitively yet return the original
: case terms.  It doesn't seem like TermsComponent can do this as it can only
: return the lowercase indexed terms your are searching against, not the
...
: which provides useful sorting by and returning of term frequency counts in
: your index.  How does one get this same information with regular Solr Query? 
: I set up the following prefix query, searching by the indexed lowercased

: field and returning the other:

The type of approach you are describing (doing a prefix based query for 
autosuggest) probably won't work very well unless your index is 100% 
designed just for the autosuggest ... if it's an index about products, and 
you're just using one of the fields for autosuggest, you aren't going to 
get good autosuggest results because the same word is going to appear in 
multiple products.  what you need is an index of *words* that you want to 
autosuggest, with fields indicating how important those words are that you 
can use in a function query (this replaces the term freq that 
TermComponent would use)


the fact that your test field is multivalued and stores wildly different 
things in each doc is an example of what i mean.


Have you considered the possibility of just indexing the lowercase value 
concatenated with the regular case value using a special delimiter, and 
then returning to your TermComponent based solution?  index "PowerPoint" 
as "powerpoint|PowerPoint" and just split on the "|" character when you 
get the data back from your prefix based term lookup.



-Hoss


  


Re: Case Insensitive search not working

2009-12-08 Thread Tom Hill
Did you rebuild the index? Changing the analyzer for the index doesn't
affect already indexed documents.

Tom
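[Editorial aside: Tom's point in miniature — with both analyzer chains ending in a lowercase filter, index-time and query-time tokens agree regardless of case, but only for documents analyzed (i.e. indexed) after the schema change. A toy version of the chain above:]

```python
def analyze(text):
    # WhitespaceTokenizerFactory followed by LowerCaseFilterFactory, roughly
    return [token.lower() for token in text.split()]

print(analyze("San Jose"))                         # ['san', 'jose']
print(analyze("San Jose") == analyze("san jose"))  # True
```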


On Tue, Dec 8, 2009 at 11:57 AM, insaneyogi3008 insaney...@gmail.comwrote:


 Hello,

 I tried to force case-insensitive search by having the following setting in
 my schema.xml file, which I guess is standard for case-insensitive searches:

 <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


 However when I perform searches on "San Jose" and "san jose", I get 16 and 0
 responses back respectively; is there anything else I'm missing here?


 --
 View this message in context:
 http://old.nabble.com/Case-Insensitive-search-not-working-tp26699734p26699734.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: # in query

2009-12-08 Thread Erick Erickson
Sorry, I usually think of things in Lucene land and reflexively think of the
fat client.

Anyway, here's your problem I think...

WordDelimiterFilterFactory. See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

It's losing the # altogether, as indicated by the
tokens you saw:
s|s,  ecapsym|myspace,  golb|blog
not a # in sight.

It's kind of subtle, but in the above page entry, this phrase implies that
all non-alphanumerics are dropped: "(by default, all non alpha-numeric characters)"

title is: ###'s myspace blog

I'm assuming that the Title (if you're looking at it in Luke)
is giving back your stored value. The tokens are what count
during searching, storing and indexing are orthogonal

HTH
Erick
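[Editorial aside: a rough approximation — a sketch, not the real filter — of why the '#' characters vanish. With WordDelimiterFilterFactory's defaults, runs of non-alphanumeric characters act only as split points, so a token made entirely of '#' contributes nothing:]

```python
import re

def word_delimiter_sketch(text):
    # split on runs of non-alphanumerics and drop the empty pieces,
    # roughly what the filter's defaults do
    return [part for part in re.split(r"[^A-Za-z0-9]+", text) if part]

print(word_delimiter_sketch("###'s myspace blog"))
# ['s', 'myspace', 'blog'] -- no '#' in sight, matching the tokens above
```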

On Tue, Dec 8, 2009 at 2:25 PM, Joel Nylund jnyl...@yahoo.com wrote:

 ok, I just realized I was using the luke handler, didn't know there was a
 fat client, I assume that's what you are talking about.

 I downloaded the lukeall.jar, ran it, pointed to my index, found the
 document in question, didn't see how it was tokenized, but I clicked the
 "reconstruct & edit" button,

 this gives me a tab that has tokenized per field, for this field it shows:


  s|s,  ecapsym|myspace,  golb|blog

 title is: ###'s myspace blog

 schema is:

  <!-- A general unstemmed text field that indexes tokens normally and also
       reversed (via ReversedWildcardFilterFactory), to enable more
       efficient leading wildcard queries. -->
  <fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="1"
              catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
              maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory"
              ignoreCase="true"
              words="stopwords.txt"
              enablePositionIncrements="true"
              />
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="0"
              catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="textTitle" type="text_rev" indexed="true" stored="true"
         required="false" multiValued="false"/>



 thanks
 Joel





 On Dec 8, 2009, at 11:14 AM, Erick Erickson wrote:

  In Luke, there's a tab that will let you go to a document ID. From there
 you can see all the fields in a particular document, and examine what
 the actual tokens stored are. Until and unless you know what tokens
 are being indexed, you simply can't know what your queries should look
 like...

 *Assuming* that the ### are getting indexed and *assuming* your tokenizer
 tokenized on, whitespace, and *assuming* that by text_rev you
 are talking about ReversedWildcardFilterFactory, I
 wouldn't expect a search to match if it wasn't exactly:
 s'###. But as you see, there's a long chain of assumptions there any
 one of which may be violated by your schema. So please post the
 relevant portions of your schema to make it easier to help.

 Best
 Erick


 On Tue, Dec 8, 2009 at 9:54 AM, Joel Nylund jnyl...@yahoo.com wrote:

  Thanks Eric,

 I looked more into this, but still stuck:

 I have this field indexed using text_rev

 I looked at the luke analysis for this field, but im unsure how to read
 it.

 When I query the field by the id I get:

 <result name="response" numFound="1" start="0">
  <doc>
   <str name="id">5405255</str>
   <str name="textTitle">###'s test blog</str>
  </doc>
 </result>

 If I try to query even multiple ### I get nothing.

 Here is what luke handler says:  (btw when I used id instead of docid on
 luke I got a nullpointer exception  /admin/luke?docid=5405255  vs
 /admin/luke?id=5405255)

<lst name="textTitle">
  <str name="type">text_rev</str>
  <str name="schema">ITS---</str>
  <str name="index">ITS--</str>
  <int name="docs">290329</int>
  <int name="distinct">401016</int>
  <lst name="topTerms">
    <int name="&#1;golb">49362</int>
    <int name="blog">49362</int>
    <int name="&#1;ecapsym">29426</int>
    <int name="myspace">29426</int>
    <int name="&#1;s">8773</int>
    <int name="s">8773</int>
    <int name="&#1;ed">8033</int>
    <int name="de">8033</int>
    <int name="com">6884</int>
    <int name="&#1;moc">6884</int>
  </lst>
  <lst name="histogram">
    <int name="1">308908</int>
    <int name="2">34340</int>
    <int name="4">21916</int>
    <int name="8">14474</int>
    <int name="16">9122</int>
    <int name="32">5578</int>
    <int name="64">3162</int>
    <int name="128">1844</int>
    <int name="256">910</int>
    ...

About fsv (sort field falues)

2009-12-08 Thread Marc Sturlese

I am tracing QueryComponent.java and would like to know the purpose of the
doFSV function. I don't understand what fsv are for.
I have tried some queries with fsv=true and some extra info appears in the
response:

<lst name="sort_values"/>

But I don't know what it is for and can't find much info out there. I read:
// The query cache doesn't currently store sort field values, and
SolrIndexSearcher doesn't
// currently have an option to return sort field values.  Because of
this, we
// take the documents given and re-derive the sort values.
Is it for cache purposes?
Thanks in advance!

-- 
View this message in context: 
http://old.nabble.com/About-fsv-%28sort-field-falues%29-tp26700729p26700729.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: About fsv (sort field falues)

2009-12-08 Thread Yonik Seeley
On Tue, Dec 8, 2009 at 4:04 PM, Marc Sturlese marc.sturl...@gmail.com wrote:
 I am tracing QueryComponent.java and would like to know the purpose of the doFSV
 function. I don't understand what fsv are for.
 Have tried some queries with fsv=true and some extra info appears in the
 response:

 <lst name="sort_values"/>

It's currently an internal feature (i.e. back compat is not
guaranteed) used for merging search results in a distributed search.
It contains the sort values (i.e. what was used to sort the documents)
for everything but score.

-Yonik
http://www.lucidimagination.com
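As a toy illustration of why the coordinator needs these sort values: each shard returns its top hits already ordered, together with the value each document sorted on, and the merger interleaves the shard lists by comparing those values alone, without re-fetching the documents. The shard data below is invented:

```python
import heapq

# each shard returns (sort_value, doc_id) pairs, already sorted by sort_value
shard1 = [(1, "docA"), (5, "docC")]
shard2 = [(2, "docB"), (9, "docD")]

# the coordinator merges on the sort values alone
merged = [doc for _, doc in heapq.merge(shard1, shard2)]
print(merged)  # -> ['docA', 'docB', 'docC', 'docD']
```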


Re: how to do auto-suggest case-insensitive match and return original case field values

2009-12-08 Thread hermida

Hello,

Thanks for the reply (see below)


hossman wrote:
 
 The type of approach you are describing (doing a prefix based query for 
 autosuggest) probably won't work very well unless your index is 100% 
 designed just for the autosuggest ... if it's an index about products, and 
 you're just using one of the fields for autosuggest, you aren't going to 
 get good autosuggest results because the same word is going to appear in 
 multiple products.  What you need is an index of *words* that you want to 
 autosuggest, with fields indicating how important those words are, that you 
 can use in a function query (this replaces the term freq that 
 TermsComponent would use).
 
 The fact that your test field is multivalued and stores wildly different 
 things in each doc is an example of what I mean.
 

I am using Solr to index biological annotations about proteins (which are my
documents). There is no tokenization or special analysis of the annotation
text strings, as they are not free text; each annotation is a single token.
Also, for the purpose of my auto-suggest and searching there are actually no
different types of annotations; that's why they all go into the same
multivalued field for each protein document.  I want to use the auto-suggest
and search to help biologists (who know the annotation terminology) find all
the protein documents with the annotation they are thinking of, and to
suggest what is available as they type.  The thing is that in my field
letter case can be important in defining the meaning of an annotation, but the
biologist might not remember the exact case.  Therefore I want them to be
able to type in whatever case, and the auto-suggest will pull up, as they
type, annotations with the correct case to assist them.

Let's just take the fundamental question, independent of any example: is it
possible to do a case-insensitive prefix search using faceting (to get the
term suggestions) that also returns the original mixed-case forms of *all*
the terms listed in lowercase in the facet list?  In the only other post I
saw in this forum on this topic, a user seemed to think this was easily
doable, but I don't think they actually tried it, because the faceted
search doesn't seem possible; you run into all these problems.  It just
isn't something Solr/Lucene can do the way it is organized.


hossman wrote:
 
 Have you considered the possibility of just indexing the lowercase value 
 concatenated with the regular case value using a special delimiter, and 
 then returning to your TermsComponent based solution?  index PowerPoint 
 as powerpoint|PowerPoint and just split on the | character when you 
 get the data back from your prefix based term lookup.
 

I think this is a good workaround, will definitely try it!

leandro
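Hoss's delimiter workaround, indexing the lowercased term glued to its original-case form and splitting when terms come back from the prefix lookup, can be sketched like this (the choice of | assumes it never occurs inside a term):

```python
DELIM = "|"

def encode(term):
    # index the lowercased form joined to the original-case form
    return term.lower() + DELIM + term

def decode(indexed_term):
    # split on the delimiter when reading terms back from the prefix lookup
    return indexed_term.split(DELIM, 1)[1]

print(encode("PowerPoint"))             # -> powerpoint|PowerPoint
print(decode("powerpoint|PowerPoint"))  # -> PowerPoint
```

The prefix match then runs against the lowercased part, while the user always sees the original case.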

-- 
View this message in context: 
http://old.nabble.com/how-to-do-auto-suggest-case-insensitive-match-and-return-original-case-field-values-tp26636365p2670.html
Sent from the Solr - User mailing list archive at Nabble.com.



do copyField's need to exist as Fields?

2009-12-08 Thread regany

Hello!

(solr newbie alert)

I want to pass 4 fields into Solr

1. id (unique)
2. title
3. subtitle
4. body

but only want to index and store 2:

1. id (unique)
2. text (copyField of id, title, subtitle, body).

The search then searches on text, and returns only matching id's.

When I set up the 2 fields, and the copyFields, it doesn't seem to work. I'm
guessing for a copyField to work you need to have fields with the same name
already set.

Is there a different way I should be setting it up to achieve the above??

regan
-- 
View this message in context: 
http://old.nabble.com/do-copyField%27s-need-to-exist-as-Fields--tp26701706p26701706.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: java.lang.NumberFormatException: For input string:

2009-12-08 Thread Chris Hostetter

: it's strange, I had a dismax handler and it had an empty value for the ps field
: I added a default value like 100 and the error disappeared.

I really wish the java compiler had an option so we could say, when 
compiling our code, treat this list of unchecked exceptions like checked 
exceptions, so we could prevent code that doesn't catch 
NumberFormatException from ever getting committed.

I've got a patch that will improve the error message on this in the 
future...
https://issues.apache.org/jira/browse/SOLR-1635

:  SEVERE: java.lang.NumberFormatException: For input string: 
:  at
:  
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
:  at java.lang.Integer.parseInt(Integer.java:468)
:  at java.lang.Integer.valueOf(Integer.java:553)


-Hoss
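The workaround described at the top of the thread (supplying a default for the ps parameter) amounts to parsing request parameters defensively. A generic sketch of that pattern, not Solr's actual code:

```python
def parse_int_param(value, default):
    # fall back to a default when the parameter is missing, empty, or garbled
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

print(parse_int_param("100", 0))  # -> 100
print(parse_int_param("", 0))     # -> 0
print(parse_int_param(None, 0))   # -> 0
```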



Re: do copyField's need to exist as Fields?

2009-12-08 Thread regany


regany wrote:
 
 Is there a different way I should be setting it up to achieve the above??
 


Think I figured it out.

I set up the fields so they are present, but they get ignored except for the
text field which gets indexed...

<field name="id" type="text" indexed="true" stored="true"
       multiValued="false" required="true" />
<field name="title" stored="false" indexed="false" multiValued="true"
       type="text" />
<field name="subtitle" stored="false" indexed="false" multiValued="true"
       type="text" />
<field name="body" stored="false" indexed="false" multiValued="true"
       type="text" />
<field name="text" type="text" indexed="true" stored="false"
       multiValued="true" />

and then copyField the first 4 fields to the text field:

<copyField source="id" dest="text" />
<copyField source="title" dest="text" />
<copyField source="subtitle" dest="text" />
<copyField source="body" dest="text" />


Seems to be working!? :drunk:
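What the copyField declarations do here can be modelled as a simple index-time concatenation: every value of each source field is appended to the destination field before analysis. A rough model of the behaviour, not Solr internals:

```python
def apply_copy_fields(doc, copy_rules):
    # copy_rules maps a destination field to its list of source fields;
    # each source value is appended to the destination, like Solr's copyField
    for dest, sources in copy_rules.items():
        doc[dest] = [v for src in sources for v in doc.get(src, [])]
    return doc

doc = {"id": ["42"], "title": ["Solr"], "subtitle": ["intro"], "body": ["hello"]}
rules = {"text": ["id", "title", "subtitle", "body"]}
print(apply_copy_fields(doc, rules)["text"])  # -> ['42', 'Solr', 'intro', 'hello']
```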
-- 
View this message in context: 
http://old.nabble.com/do-copyField%27s-need-to-exist-as-Fields--tp26701706p26702224.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr usage with Auctions/Classifieds?

2009-12-08 Thread regany

hello!

just wondering if anyone is using Solr as their search for an auction /
classified site, and if so how have you managed your setup in general? ie.
searching against listings that may have expired etc.

regan
-- 
View this message in context: 
http://old.nabble.com/Solr-usage-with-Auctions-Classifieds--tp26702828p26702828.html
Sent from the Solr - User mailing list archive at Nabble.com.



Multiple Facet prefixes on the same facet field in one request?

2009-12-08 Thread Robert Purdy

Hey all, 

Is there any way in Solr 1.4/1.5 to perform multiple facet prefixes on the
same facet field in one request?

Ex. On field 'Foo' I want to perform facet prefixes of A* and B* so I can
get a facet response of all terms prefixed with A and all terms prefixed
with B, either grouped together in the same facet result list or in separate
facet lists labeled by the prefix.

Currently, I perform one request per facet prefix and I am hoping that there
is some cryptic way using local params that I am missing that will allow me
to do this.

Robert.   
-- 
View this message in context: 
http://old.nabble.com/Multiple-Facet-prefixes-on-the-same-facet-field-in-one-request--tp26702997p26702997.html
Sent from the Solr - User mailing list archive at Nabble.com.



Packaging installing SOLR on linux

2009-12-08 Thread insaneyogi3008

Hello,

At the risk of asking a highly general question, can anybody give me
pointers or best practices on how best one can package Solr & its associated
files as a Linux RPM, so that this core/instance can be ported to
multiple instances? If anybody has experience working on such a system,
the knowledge will be very useful.


With Regards
Sri
-- 
View this message in context: 
http://old.nabble.com/Packaging---installing-SOLR-on-linux-tp26703295p26703295.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Replicating multiple cores

2009-12-08 Thread Jason Rutherglen
 Yes. I'd highly recommend using the Java replication though.

Is there a reason?  I understand it's new etc, however I think one
issue with it is it's somewhat non-native access to the filesystem.
Can you illustrate a real world advantage other than the enhanced
admin screens?

On Mon, Dec 7, 2009 at 11:13 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Tue, Dec 8, 2009 at 11:48 AM, Jason Rutherglen 
 jason.rutherg...@gmail.com wrote:

 If I've got multiple cores on a server, I guess I need multiple
 rsyncd's running (if using the shell scripts)?


 Yes. I'd highly recommend using the Java replication though.

 --
 Regards,
 Shalin Shekhar Mangar.



Solr Cell and Spellchecking.

2009-12-08 Thread Michael Boyle
Following Erik Hatcher's post about using Solr Cell and acts_as_solr 
{ http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/ }, I 
have been able to index a rich document stream and retrieve its id. No 
worries.


However, I have the SpellCheckComponent set up to build on commit 
(buildOnCommit=true). Alas, the rich document text is not being added to 
the spellchecker dictionary.


Is there something special I need to do within the SolrConfig.xml or 
within the acts_as_solr ruby classes?


- thanks in advance for any ideas -

Mike Boyle


bool default - if missing when updating uses current or default value?

2009-12-08 Thread regany

hello,

if I have a boolean fieldType (solr.BoolField) with a default value of
true, and I insert a new document, I understand that the boolean value will
be set to TRUE.

But if I update an existing document, and I don't pass in a value for the
boolean field, will Solr keep the existing boolean value unchanged, or will
it update the boolean value again using the default? - ie. true.

?

regan
-- 
View this message in context: 
http://old.nabble.com/bool-default---if-missing-when-%22updating%22-uses-current-or-default-value--tp26703630p26703630.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiindexing

2009-12-08 Thread Lance Norskog
A core is one index. I think you mean:

3-5 indexes in different cores. Since you want to search across them,
they should have the same schema. There is a feature called
Distributed Search that searches across multiple indexes. There is no
administration support for indexing parts of one data set into
multiple indexes. You have to set up all solr instances and index
parts of the data into each one with your own scripting.

Does this help?
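The Distributed Search feature Lance refers to is driven by the shards request parameter, a comma-separated list of host/core addresses the query is fanned out to. A client-side sketch (host and core names are made up):

```python
from urllib.parse import urlencode

# hypothetical cores holding parts of the same logical index
shards = ["host1:8983/solr/core1", "host2:8983/solr/core2"]

# the "shards" parameter tells Solr to fan the query out and merge results
params = {"q": "title:solr", "shards": ",".join(shards)}
url = "http://host1:8983/solr/select?" + urlencode(params)
print(url)
```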

On 12/7/09, Jörg Agatz joerg.ag...@googlemail.com wrote:
 Hi users,

 I need help with multi-indexing in Solr.

 I want one core, and 3 to 5 different indexes, so I can search simultaneously
 in all or in some of them.
 I found the help in the wiki, but it doesn't help:
 http://wiki.apache.org/solr/MultipleIndexes?highlight=%28multi%29
 There is nothing there about multi-indexing in Solr,
 nor in the Solr 1.4 book.

 Is there no way to use more than one index in one core/instance?

 King



-- 
Lance Norskog
goks...@gmail.com


Re: question about schemas

2009-12-08 Thread Lance Norskog
I don't know. The common way to do this in Solr is the full
denormalization technique, but that blows up in this case. This is not
an easy problem space to implement in Solr. Data warehousing & star
schema techniques may be more appropriate.

On 12/7/09, solr-user solr-u...@hotmail.com wrote:


 Lance Norskog-2 wrote:

 You can make a separate facet field which contains a range of buckets:
 10, 20, 50, or 100 means that the field has a value 0-10, 11-20, 21-50, or
 51-100. You could use a separate filter query with values for these
 buckets. Filter queries are very fast in Solr 1.4 and this would limit
 your range query execution to documents which match the buckets.


 Lance, I am afraid that I do not see how to use this suggestion.

 Which of the three (four?) suggested schemas would I be using?  How would
 these range facets prevent the potential issues I found such as getting
 product facets instead of customer facets, or having very large numbers of
 ANDs and ORs, and so forth.
 --
 View this message in context:
 http://old.nabble.com/question-about-schemas-tp26600956p26679922.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Lance Norskog
goks...@gmail.com


Re: Solr Search in stemmed and non stemmed mode

2009-12-08 Thread Lance Norskog
Short answer: the standard query handler is right for carefully
designing queries. The dismax query handler is right for putting a
'search box' in a web page for regular users.

On 12/7/09, khalid y kern...@gmail.com wrote:
 Thanks,

 I'll read the mail archive.

 Your suggestion is like mine but without the DisMax handler. I'm going to
 read about what this handler is.
 I have one field text and another text_unstemmed where I copy all other
 fields. I'm writing my custom query handler which checks if quotes exist and
 switches to the right field.

 Going to read...

 Thanks


 2009/12/7 Erick Erickson erickerick...@gmail.com

 Try searching the mail archive for
 stemmer exact match
 or similar; this has been discussed multiple times and you'll get more
 complete discussions way faster.

 One suggestion is to use two fields, one for the stemmed version
 and one for the original, then use whichever field you need via the
 DisMax handler (more detail in the mail archive).

 Best
 Erick

 On Mon, Dec 7, 2009 at 10:02 AM, khalid y kern...@gmail.com wrote:

  Hi !!
 
  I'm looking for a way to have two indexes in Solr, one stemmed and another
  non-stemmed. Why? It's simple :-)

  My users can do queries like:
  - banking marketing => returns all documents matching bank*** and
  market***
  - "banking" marketing => returns all documents matching banking (exact) and
  market***

  The second request needs me to be able to switch between stemmed and
  non-stemmed when the user writes the keyword with quotes.

  The optimal solution is: Solr can gracefully mix results from the stemmed
  and non-stemmed indexes, with a good score calculation etc...

  The near-optimal solution is: if Solr sees quotes, it switches into
  non-stemmed mode for all keywords in the query.

  I have an idea but I prefer to listen to the community's voice before
  proposing it. I'll expose it in my next post.

  If someone has a graceful idea to do this :-)

  Thanks
 




-- 
Lance Norskog
goks...@gmail.com
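The switch khalid describes, routing quoted keywords to the unstemmed field, can be sketched as a small query rewrite. The field names text and text_unstemmed are taken from his description; the rewrite itself is only an illustration:

```python
import re

def route_terms(user_query):
    # quoted phrases search the unstemmed field; bare words the stemmed one
    clauses = []
    for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', user_query):
        if quoted:
            clauses.append('text_unstemmed:"%s"' % quoted)
        else:
            clauses.append('text:%s' % bare)
    return " AND ".join(clauses)

print(route_terms('"banking" marketing'))
# -> text_unstemmed:"banking" AND text:marketing
```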


Re: search on tomcat server

2009-12-08 Thread Lance Norskog
Solr comes with an example solr installation in the example/
directory. Run this, look at the README.txt file, index the xml files
in example/exampledocs, and do queries like 'disk' and 'memory'. And
read example/conf/schema.xml and example/conf/solrconfig.xml.

Most of the details of what solr does and how to set it up will be clear.

On 12/7/09, Sascha Szott sz...@zib.de wrote:
 Hi Jill,

 just to make sure your index contains at least one document, what is the
 output of

 http://localhost:8080/solr/select?q=*:*&debugQuery=true&echoParams=all

 Best,
 Sascha

 Jill Han wrote:
 In fact, I just followed the instructions titled as Tomcat On Windows.
 Here are the updates on my computer:
 1. -Dsolr.solr.home=C:\solr\example
 2. changed dataDir to <dataDir>C:\solr\example\data</dataDir> in
 solrconfig.xml at C:\solr\example\conf
 3. created solr.xml at C:\Tomcat 5.5\conf\Catalina\localhost:

 <?xml version="1.0" encoding="utf-8"?>
 <Context docBase="c:/solr/example/apache-solr-1.3.0.war" debug="0"
          crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="c:/solr/example" override="true"/>
 </Context>

 I restarted Tomcat, went to http://localhost:8080/solr/admin/,
 entered video in the Query String field, and got:

 <?xml version="1.0" encoding="UTF-8" ?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
     <lst name="params">
       <str name="rows">10</str>
       <str name="start">0</str>
       <str name="indent">on</str>
       <str name="q">video</str>
       <str name="version">2.2</str>
     </lst>
   </lst>
   <result name="response" numFound="0" start="0" />
 </response>
 My questions are:
 1. Is the setting correct?
 2. Where does Solr start to search for words entered in the Query String field?
 3. How can I make the result page like a general search result page, e.g.
 showing not found, or if found, a URL, instead of returning XML?


 Thanks a lot for your helps,

 Jill

 -Original Message-
 From: William Pierce [mailto:evalsi...@hotmail.com]
 Sent: Friday, December 04, 2009 12:56 PM
 To: solr-user@lucene.apache.org
 Subject: Re: search on tomcat server

 Have you gone through the solr tomcat wiki?

 http://wiki.apache.org/solr/SolrTomcat

 I found this very helpful when I did our solr installation on tomcat.

 - Bill

 --
 From: Jill Han jill@alverno.edu
 Sent: Friday, December 04, 2009 8:54 AM
 To: solr-user@lucene.apache.org
 Subject: RE: search on tomcat server
 X-HOSTLOC: hermes.apache.org/140.211.11.3

 I went through all the links on
 http://wiki.apache.org/solr/#Search_and_Indexing
 and still have no clue as to how to proceed.
 1. Do I have to do some implementation in order to get Solr to search docs
 on the Tomcat server?
 2. If I have files, such as .doc, .docx, .pdf, .jsp, .html, etc. under
 Windows XP in c:/tomcat/webapps/test1, /webapps/test2,
   what should I do to make Solr search those directories?
 3. Since I am using Tomcat instead of Jetty, is there any demo that shows
 the Solr searching features, and real search results?

 Thanks,
 Jill


 -Original Message-
 From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
 Sent: Monday, November 30, 2009 10:40 AM
 To: solr-user@lucene.apache.org
 Subject: Re: search on tomcat server

 On Mon, Nov 30, 2009 at 9:55 PM, Jill Han jill@alverno.edu wrote:

 I got solr running on the tomcat server,
 http://localhost:8080/solr/admin/

 After I enter a search word, such as solr, then hit the Search button, it
 will go to

 http://localhost:8080/solr/select/?q=solr&version=2.2&start=0&rows=10&indent=on

 and display:

 <?xml version="1.0" encoding="UTF-8" ?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
     <lst name="params">
       <str name="rows">10</str>
       <str name="start">0</str>
       <str name="indent">on</str>
       <str name="q">solr</str>
       <str name="version">2.2</str>
     </lst>
   </lst>
   <result name="response" numFound="0" start="0" />
 </response>

 My question is: what is the next step to search files on the Tomcat
 server?



 Looks like you have not added any documents to Solr. See the Indexing
 Documents section at http://wiki.apache.org/solr/#Search_and_Indexing

 --
 Regards,
 Shalin Shekhar Mangar.






-- 
Lance Norskog
goks...@gmail.com


Re: Replicating multiple cores

2009-12-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Dec 9, 2009 at 6:14 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 Yes. I'd highly recommend using the Java replication though.

 Is there a reason?  I understand it's new etc, however I think one
 issue with it is it's somewhat non-native access to the filesystem.
 Can you illustrate a real world advantage other than the enhanced
 admin screens?
Complexity is the main problem with rsync-based replication. You have to
manage so many processes and monitor them separately. The other
problem is managing snapshots. These snapshots need to be cleaned up
every now and then. You do not have enough info on what is
happening/happened.

 On Mon, Dec 7, 2009 at 11:13 PM, Shalin Shekhar Mangar
 shalinman...@gmail.com wrote:
 On Tue, Dec 8, 2009 at 11:48 AM, Jason Rutherglen 
 jason.rutherg...@gmail.com wrote:

 If I've got multiple cores on a server, I guess I need multiple
 rsyncd's running (if using the shell scripts)?


 Yes. I'd highly recommend using the Java replication though.

 --
 Regards,
 Shalin Shekhar Mangar.





-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: Replicating multiple cores

2009-12-08 Thread Jason Rutherglen
 Complexity is the main problem

I agree, replicating multiple cores otherwise means multiple rsyncd
processes, and true enough that management of shell scripts multiplies
in complexity.

2009/12/8 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 On Wed, Dec 9, 2009 at 6:14 AM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
 Yes. I'd highly recommend using the Java replication though.

 Is there a reason?  I understand it's new etc, however I think one
 issue with it is it's somewhat non-native access to the filesystem.
 Can you illustrate a real world advantage other than the enhanced
 admin screens?
 Complexity is the main problem with rsync-based replication. You have to
 manage so many processes and monitor them separately. The other
 problem is managing snapshots. These snapshots need to be cleaned up
 every now and then. You do not have enough info on what is
 happening/happened.

 On Mon, Dec 7, 2009 at 11:13 PM, Shalin Shekhar Mangar
 shalinman...@gmail.com wrote:
 On Tue, Dec 8, 2009 at 11:48 AM, Jason Rutherglen 
 jason.rutherg...@gmail.com wrote:

 If I've got multiple cores on a server, I guess I need multiple
 rsyncd's running (if using the shell scripts)?


 Yes. I'd highly recommend using the Java replication though.

 --
 Regards,
 Shalin Shekhar Mangar.





 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com



Enumerating wildcard terms

2009-12-08 Thread Mark N
Is it possible to enumerate all terms that match a specified wildcard
term, similar to Lucene's WildcardTermEnum API?

For example, if I search abc* then I should be able to access all the
terms abc1, abc2, abc3... that exist in the index.

What would be a better approach to meet this functionality?




-- 
Nipen Mark


Re: Enumerating wildcard terms

2009-12-08 Thread Erik Hatcher

Mark,

The TermsComponent should do the trick for you.

http://wiki.apache.org/solr/TermsComponent

   Erik
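For the abc* example, a TermsComponent request built with its standard parameters (terms.fl for the field, terms.prefix for the leading characters, terms.limit for the count) would look roughly like this; the host, core URL, and field name are placeholders:

```python
from urllib.parse import urlencode

# enumerate up to 10 indexed terms starting with "abc" in the "name" field
params = {"terms.fl": "name", "terms.prefix": "abc", "terms.limit": 10}
url = "http://localhost:8983/solr/terms?" + urlencode(params)
print(url)
```

This assumes a /terms request handler wired to TermsComponent, as shown on the wiki page above.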


On Dec 9, 2009, at 7:46 AM, Mark N wrote:

Is it possible to enumerate all terms that match the specified wildcard
filter term. Similar to Lucene WildcardTermEnum API

for example if I search abc* then I just should be able to access all the
terms abc1, abc2, abc3... that exist in the Index

What should be a better approach to meet this functionality?




--
Nipen Mark





RE: do copyField's need to exist as Fields?

2009-12-08 Thread Jaco Olivier
Hi Regan,

Something I noticed on your setup...
The id field in your setup I assume to be your unique ID for the book or
journal (the ISSN or something).
Try making this a string, as text is not the ideal field type to use for
unique IDs:

<field name="id" type="string" indexed="true" stored="true"
       multiValued="false" required="true" />

Congrats on figuring out Solr fields - I suggest getting the Solr 1.4
book. It really saved me a thousand questions on this mailing list :)

Jaco Olivier

-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 09 December 2009 00:48
To: solr-user@lucene.apache.org
Subject: Re: do copyField's need to exist as Fields?



regany wrote:
 
 Is there a different way I should be setting it up to achieve the
above??
 


Think I figured it out.

I set up the fields so they are present, but they get ignored except for
the text field which gets indexed...

<field name="id" type="text" indexed="true" stored="true"
       multiValued="false" required="true" />
<field name="title" stored="false" indexed="false" multiValued="true"
       type="text" />
<field name="subtitle" stored="false" indexed="false" multiValued="true"
       type="text" />
<field name="body" stored="false" indexed="false" multiValued="true"
       type="text" />
<field name="text" type="text" indexed="true" stored="false"
       multiValued="true" />

and then copyField the first 4 fields to the text field:

<copyField source="id" dest="text" />
<copyField source="title" dest="text" />
<copyField source="subtitle" dest="text" />
<copyField source="body" dest="text" />


Seems to be working!? :drunk:
-- 
View this message in context:
http://old.nabble.com/do-copyField%27s-need-to-exist-as-Fields--tp26701706p26702224.html
Sent from the Solr - User mailing list archive at Nabble.com.

Please consider the environment before printing this email. This 
transmission is for the intended addressee only and is confidential 
information. If you have received this transmission in error, please 
delete it and notify the sender. The content of this e-mail is the 
opinion of the writer only and is not endorsed by Sabinet Online Limited 
unless expressly stated otherwise.


Re: how to do auto-suggest case-insensitive match and return original case field values

2009-12-08 Thread hermida


Uri Boness wrote:
 
 Just updated SOLR-1625 to support regexp hints.
 
 https://issues.apache.org/jira/browse/SOLR-1625
 
 Cheers,
 Uri
 

This is perfect, exactly what is needed to make this functionality possible. 
Is the patch already in trunk? 

thanks,
leandro
-- 
View this message in context: 
http://old.nabble.com/how-to-do-auto-suggest-w--case-insensitive-search-and-suggesting-original-mixed-case-field-values-tp26636365p26706241.html
Sent from the Solr - User mailing list archive at Nabble.com.