SOLR-769 clustering

2009-04-21 Thread Antonio Eggberg

Hello:

I have got the clustering working, i.e. SOLR-769. I am wondering:

- Why is there a field called body? Does it have a special purpose?

   <field name="body" type="text" indexed="true" stored="true"
   multiValued="true"/>

- Can my clustering field be a copyField? Basically I would like to remove the URLs
and HTML.

- Is there any way to have a minimum number of labels per cluster?

Thanks.
Antonio




Re: Big Problem with special characters

2009-04-21 Thread Kraus, Ralf | pixelhouse GmbH

Otis Gospodnetic schrieb:

Try debugQuery=true and see if the resulting query string makes sense.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Thanks for the hint...

My problem was the WhitespaceTokenizer :-( After I changed back to
StandardTokenizer, everything was fine!


Greets -Ralf-



autowarmcount how to check if cache has been warmed up

2009-04-21 Thread sunnyfr

Hi, 

Is it possible to have autowarmCount=500 with warmupTime=2751 and size=5?
Where can I check whether the cache is full or not? It still looks empty,
even though the commit is done.
I am using Solr 1.4.

thanks for your help, 
sunny

name:queryResultCache  
class:  org.apache.solr.search.FastLRUCache  
version:1.0  
description:Concurrent LRU Cache(maxSize=14774644, initialSize=14774644,
minSize=13297179, acceptableSize=14035911, cleanupThread=false,
autowarmCount=500,
regenerator=org.apache.solr.search.solrindexsearche...@6e4eeaaf)  
stats:  lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 5
warmupTime : 2751
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0 
-- 
View this message in context: 
http://www.nabble.com/autowarmcount-how-to-check-if-cache-has-been-warmed-up-tp23156612p23156612.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using Solr to index a database

2009-04-21 Thread ahammad

Thanks for the link...

I'm still a bit unclear as to how it goes. For example, let's say I have a
table called PRODUCTS, and within that table, I have the following columns:
NUMBER (product number)
NAME (product name)
PRICE

How would I index all this information? Here is an example (from the links
you provided) of xml that confuses me:

<entity name="item" pk="ID" query="select * from item"
        deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
    <field column="NAME" name="name" />
    <field column="NAME" name="nameSort" />
    <field column="NAME" name="alphaNameSort" />

What is that deltaQuery (or even if it was a regular query expression)
line for? It seems to me like a sort of filter. What if I don't want to
filter anything and just want to index all the rows?

Cheers




Noble Paul നോബിള്‍  नोब्ळ् wrote:
 
 On Mon, Apr 20, 2009 at 7:15 PM, ahammad ahmed.ham...@gmail.com wrote:

 Hello,

 I've never used Solr before, but I believe that it will suit my current
 needs with indexing information from a database.

 I downloaded and extracted Solr 1.3 to play around with it. I've been
 looking at the following tutorials:
 http://www.ibm.com/developerworks/java/library/j-solr-update/index.html
 http://wiki.apache.org/solr/DataImportHandler

 There are a few things I don't understand. For example, the IBM article
 sometimes refers to directories that aren't there, or a little different
 from what I have in my extracted copy of Solr (ie
 solr-dw/rss/conf/solrconfig.xml). I tried to follow the steps as best I
 can,
 but as soon as I put the following in solrconfig.xml, the whole thing
 breaks:

 <requestHandler name="/dataimport"
   class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
   <str name="config">rss-data-config.xml</str>
  </lst>
 </requestHandler>

 Obviously I replace with my own info...One thing I don't quite get is the
 data-config.xml file. What exactly is it? I've seen examples of what it
 contains but since I don't know enough, I couldn't really adjust it. In
 any
 case, this is the error I get, which may be because of a misconfigured
 data-config.xml...
 the data-config.xml describes how to fetch data from various data
 sources and index them into Solr.
 
 The stacktrace says that your xml is invalid.
 
 The best bet is to take one of the sample dataconfig xml files and make
 changes.
 
 http://svn.apache.org/viewvc/lucene/solr/trunk/example/example-DIH/solr/db/conf/db-data-config.xml?revision=691151&view=markup
 
 http://svn.apache.org/viewvc/lucene/solr/trunk/example/example-DIH/solr/rss/conf/rss-data-config.xml?revision=691151&view=markup
 
 

 org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
  at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:165)
  at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:99)
  at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:96)
  at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:388)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:571)
  at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122)
  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
  at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
  at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
  at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78)
  at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
  at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
  at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
  at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
  at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
  at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:831)
  at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:720)
  at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:490)
  at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1150)
  at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
  at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
  at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
  at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
  at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014)
  at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
 

RE: Sort by distance from location?

2009-04-21 Thread Ensdorf Ken
I've never used them personally, but I think a function query would suit you
here.  Function queries allow you to define a custom function as a component of
the score of a result document.  Define a distance function based on the user's
current location and that of the search result, such that the shorter the
distance, the higher the function output.  This will boost results in inverse
proportion to their distance from the user.

-Ken
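
To make the idea above concrete, here is a minimal Python sketch of such a distance function. It is not Solr's function-query syntax; it only illustrates the arithmetic. The function names, the 1/(1+d) shape of the boost, and the Earth-radius constant are assumptions chosen for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_boost(user, doc):
    """Hypothetical boost: 1.0 at zero distance, shrinking as distance grows."""
    return 1.0 / (1.0 + haversine_km(user[0], user[1], doc[0], doc[1]))


def rank_by_distance(user, docs):
    """Order (name, lat, lon) tuples so that the nearest document comes first."""
    return sorted(docs,
                  key=lambda d: distance_boost(user, (d[1], d[2])),
                  reverse=True)
```

Because the boost is a smooth function of distance, a single term replaces the stack of ever-widening bounding boxes in the original query.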

 -Original Message-
 From: Development Team [mailto:dev.and...@gmail.com]
 Sent: Tuesday, April 14, 2009 5:32 PM
 To: solr-user@lucene.apache.org
 Subject: Sort by distance from location?

 Hi everybody,
  My index has latitude/longitude values for locations. I am
 required to
 do a search based on a set of criteria, and order the results based on
 how
 far the lat/long location is to the current user's location. Currently
 we
 are emulating such a search by adding criteria of ever-widening
 bounding
 boxes, and the more of those boxes match the document, the higher the
 score
 and thus the closer ones appear at the start of the results. The query
 looks
 something like this (newlines between each search term):

 +criteraOne:1
 +criteriaTwo:true
 +latitude:[-90.0 TO 90.0] +longitude:[-180.0 TO 180.0]
 (latitude:[40.52 TO 40.81] longitude:[-74.17 TO -73.79])
 (latitude:[40.30 TO 41.02] longitude:[-74.45 TO -73.51])
 (latitude:[39.94 TO 41.38] longitude:[-74.93 TO -73.03])
 [[...etc...about 10 times...]]

  Naturally this is quite slow (query is approximately 6x slower
 than
 normal), and... I can't help but feel that there's a more elegant way
 of
 sorting by distance.
  Does anybody know how to do this or have any suggestions?

 Sincerely,

  Daryl.


Master Slave Solr Replication Automation

2009-04-21 Thread payalsharma

We need to replicate data from a Solr instance on one Linux box to a
second Solr instance on another Linux box. To achieve this we will use
the SolrCollectionDistributionScripts (snapshooter, snappuller, etc.) and the
rsync utility.

Configurations:
1.  Apache Solr 1.3.0
2.  Machines : Linux 
3.  Master Slave : 1 Master and 1 slave

Settings done at our end:

Solr on both Linux boxes contains multiple cores. We have split the data
to be indexed across the cores; a sample Solr data folder path looks
like this:

Path :: {SOLR_HOME}/solr/multicore/multi_corename/data

Sample :
machine_path/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/data

* SOLR_HOME :: machine_path/apache-solr-1.3.0/example/
** multi_corename :: CORE_WWW.ABCD.COM

Thus we are going to have multiple cores on the master as well as on the
slave servers.

As mentioned on http://wiki.apache.org/solr/CollectionDistribution:
for the Solr distribution scripts, the name of the index directory can be
defined by the environment variable data_dir in the configuration file
conf/scripts.conf

Example conf/scripts.conf file on  slave solr server :
user=
solr_hostname=localhost
solr_port=8080
rsyncd_port=18983
data_dir=${SOLR_HOME}/solr/muticore/CORE_WWW.ABCD.COM /data
webapp_name=solr
master_host=10.x.xx.xxx
master_data_dir=${SOLR_HOME}/solr/muticore/CORE_WWW.ABCD.COM/data
master_status_dir=${SOLR_HOME}/solr/muticore/CORE_WWW.ABCD.COM /status

The index directory name mentioned above should match the value used by the
Solr server which is defined in solr/conf/solrconfig.xml. 

We have a few questions:

1. Please confirm whether the <dataDir> entry in solrconfig.xml should
match, on both the slave and the master Solr server, the data_dir setting
in scripts.conf.

2. Please let us know whether any specific handling is required when
using multiple cores during replication.

3. Are there any pitfalls in using the Solr distribution scripts and the
rsync utility?

Please shed some light on these questions.

-- 
View this message in context: 
http://www.nabble.com/Master-Slave-Solr-Replication-Automation-tp23158672p23158672.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Best way to index without diacritics

2009-04-21 Thread lupiss

Hello!

I have the same problem. My documents are already indexed; I changed the
charset to ISO-8859-1 and can now see ñ and accented characters. I have now
connected the search engine to my application, and searches are run from JSP
pages. The problem is that users must type the accents in the query text box,
and that is exactly what I do not want. I read a little and I think I need to
include the ISOLatin class or something like it, but I am not sure how. I
understand it has to go in the config.xml or the schema.xml, but do I also
need to download a jar for the ISOLatin class? I do not know what to do...
any help, please? I'm going crazy! :(
-- 
View this message in context: 
http://www.nabble.com/Best-way-to-index-without-diacritics-tp18935599p23159812.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Delete from Solr index...

2009-04-21 Thread lupiss

Hello!
How are you?

I have a similar problem: I need to delete some indexes from my Solr, because
I added them while testing, and now that I am about to deliver the project
they must no longer appear. This is difficult for me because all the Solr
documentation is in English, which I do not understand well. I hope you can
help me, since I have only a few days left to deliver the project. Many
thanks in advance! :)
-- 
View this message in context: 
http://www.nabble.com/Delete-from-Solr-index...-tp10264940p23159879.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Best way to index without diacritics

2009-04-21 Thread Otis Gospodnetic

Does this help: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=(isolatin)#head-4ebf7aea23b3d6d34a1f8314f9de17334a3e2fac


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
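
The filter described on that wiki page folds accented characters to their unaccented equivalents during analysis, so the indexed text and the query text are compared without accents. The Python sketch below only illustrates that folding idea with Unicode decomposition; it is not the Solr filter itself, and the function names are made up for this example.

```python
import unicodedata


def fold_accents(text):
    """Drop combining marks after NFD decomposition, e.g. 'ñ' becomes 'n'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(indexed_term, query_term):
    """Compare two terms after folding accents and case on both sides."""
    return fold_accents(indexed_term).lower() == fold_accents(query_term).lower()
```

The key point is that the same folding has to be applied both when indexing and when querying; otherwise an unaccented query cannot match an accented indexed term.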



- Original Message 
 From: lupiss lupitaga...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, April 21, 2009 12:22:08 PM
 Subject: Re: Best way to index without diacritics
 
 
 Hello!
 
 I have the same problem. My documents are already indexed; I changed the
 charset to ISO-8859-1 and can now see ñ and accented characters. I have now
 connected the search engine to my application, and searches are run from JSP
 pages. The problem is that users must type the accents in the query text box,
 and that is exactly what I do not want. I read a little and I think I need to
 include the ISOLatin class or something like it, but I am not sure how. I
 understand it has to go in the config.xml or the schema.xml, but do I also
 need to download a jar for the ISOLatin class? I do not know what to do...
 any help, please? I'm going crazy! :(
 -- 
 View this message in context: 
 http://www.nabble.com/Best-way-to-index-without-diacritics-tp18935599p23159812.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Solr - clarification on date sortable fields

2009-04-21 Thread Wesley Small
I am sending this question out on behalf of a colleague, who needs a
clarification on Solr indexing of date and sortable fields.

We have declared a date field in schema.xml like below:

<field name="premierDate_dt" type="date" indexed="true" stored="true"
multiValued="false" default="NOW"/>

While indexing, if I don't pass any value to this field, like
<premierDate_dt/> or <premierDate_dt></premierDate_dt>, I get the
error below:

SEVERE:  org.apache.solr.common.SolrException: Invalid Date String:''
at  org.apache.solr.schema.DateField.parseMath(DateField.java:167)
at  org.apache.solr.schema.DateField.toInternal(DateField.java:138)
at  org.apache.solr.schema.FieldType.createField(FieldType.java:179)
at  org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
at  org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
at  org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
 

If I instead remove the tag from the request, there are no issues.
The same behaviour exists for sortable fields such as sint and slong. Is
there any workaround we can make in the schema file?

Or does the request need to be changed accordingly?

A quick workaround is to declare the fields as string, but the
limitation is that we could not perform any range queries on these
fields.

Interestingly, if we put all zeros in the date (i.e.
<premierDate_dt>0000-00-00T00:00:00Z</premierDate_dt>),
it gets indexed and the value in the index is created as 0002-11-30T00:00:00.


Thanks.



Re: query on part number not matching

2009-04-21 Thread Kevin Osborn
Or in this case, I was using DisMax. My ps was 5, but I didn't have a qs field. 
Setting qs to a small value did the trick.





From: Yonik Seeley yo...@lucidimagination.com
To: solr-user@lucene.apache.org
Sent: Monday, April 20, 2009 6:09:51 PM
Subject: Re: query on part number not matching

On Mon, Apr 20, 2009 at 8:50 PM, Kevin Osborn osbo...@yahoo.com wrote:
 Looks like the format didn't come through in the email. ch, vxrch, and 
 cisco7204xvrch are all in position 4.

Ah... the traditional way to handle that case is to use a little
slop with the phrase query.

-Yonik



  

Re: Solr - clarification on date sortable fields

2009-04-21 Thread Otis Gospodnetic

This all makes sense.  You are sending a blank string for a field that expects 
a date (or null - no element at all - if you want it to default to NOW).  So, 
yes, you need to either pass a valid date or don't pass that element in at all.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
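
On the client side, the advice above amounts to leaving blank fields out of the update message altogether. A minimal Python sketch follows, assuming documents are held as dictionaries and posted in Solr's XML `<add>` format; the function name is hypothetical and the client in the original question may build its requests differently.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build a Solr <add> message, skipping fields that are None or blank."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        if value is None or str(value).strip() == "":
            # Omit the element so the schema default (e.g. NOW) can apply.
            continue
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")
```

With this approach the schema keeps its date type, so range queries on the field remain possible.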



- Original Message 
 From: Wesley Small wesley.sm...@mtvstaff.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Tuesday, April 21, 2009 12:57:22 PM
 Subject: Solr - clarification on date  sortable fields
 
 I am sending this question out on behalf of a colleague, who needs a
 clarification on Solr indexing of date and sortable fields.
 
 We have declared a date field in schema.xml like below:
 
 <field name="premierDate_dt" type="date" indexed="true" stored="true"
 multiValued="false" default="NOW"/>
 
 While indexing, if I don't pass any value to this field, like
 <premierDate_dt/> or <premierDate_dt></premierDate_dt>, I get the
 error below:
 
 SEVERE:  org.apache.solr.common.SolrException: Invalid Date String:''
 at  org.apache.solr.schema.DateField.parseMath(DateField.java:167)
 at  org.apache.solr.schema.DateField.toInternal(DateField.java:138)
 at  org.apache.solr.schema.FieldType.createField(FieldType.java:179)
 at  org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
 at  org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
 at  org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
  
 
 If I instead remove the tag from the request, there are no issues.
 The same behaviour exists for sortable fields such as sint and slong. Is
 there any workaround we can make in the schema file?
 
 Or does the request need to be changed accordingly?
 
 A quick workaround is to declare the fields as string, but the
 limitation is that we could not perform any range queries on these
 fields.
 
 Interestingly, if we put all zeros in the date (i.e.
 <premierDate_dt>0000-00-00T00:00:00Z</premierDate_dt>),
 it gets indexed and the value in the index is created as 0002-11-30T00:00:00.
 
 
 Thanks.



Re: SOLR-769 clustering

2009-04-21 Thread Stanislaw Osinski
Hi Antonio,

- is there anyway to have minimum number of labels per cluster?


The current search results clustering algorithms (from Carrot2) by design
generate one label per cluster, so there is no way to force them to create
more. What is the reason you'd like to have more labels per cluster?

I'd leave the other two Solr-related questions to answer by a more competent
person (Grant?).

Cheers,

Staszek


Re: Hierarchal Faceting Field Type

2009-04-21 Thread Nasseam Elkarra
Thank you. We tried your suggestion but we are still getting the  
following problem:


<fieldType name="category" class="solr.TextField">
<analyzer type="store">
<tokenizer class="solr.PatternTokenizerFactory" pattern=";"/>
</analyzer>
</fieldType>

<field name="my_facet" type="category" indexed="true" stored="false"
multiValued="true"/>


Sample data:
level one;level two;level three;
level one;level two;level threeB;

When we query for:
level one;level two;level three;*

We are getting back :
level one;level two;level threeB;

Even though the B is before the semicolon. Any idea why?

Thank you,
Nasseam

Check out our solr-powered Ajax search+nav solution:
http://factbook.bodukai.com/

Powered by Boutique:
http://bodukai.com/boutique/

On Apr 17, 2009, at 3:10 PM, Chris Hostetter wrote:



: level one#
: level one#level two#
: level one#level two#level three#
:
: Trying to find the right combination of field type and query to get the
: desired results. Saw some previous posts about hierarchal facets which helped
: in the generating the right query but having an issue using the built in text
: field which ignores our delimiter and the string field which prevents us from
: doing a start with search. Does anyone have any insight into the field
: declaration?

Use TextField, with a PatternTokenizer

BTW: if this isn't a thread you've already seen, it's handy to know
about...


http://www.nabble.com/Hierarchical-Faceting-to20090898.html#a20176326


-Hoss





Re: Solr Getting values for an id

2009-04-21 Thread Raju444us

The id and testScore fields are name/value pairs: for each id I have a
testScore. When I search by id, how do I know the position? Is there any
method or API in Solr that gives me the position?

I did not understand the second part of your reply. Can you please tell me
how I can do this by making those fields non-multivalued? If possible, an
example or some code would be helpful.

Thanks,
Raju


Otis Gospodnetic wrote:
 
 
 You'll have to manually pull/parse those out and match them based on their
 positions, I think.  Or make those fields non-multivalued and add
 additional fields instead, if their number is fixed.
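
 The position-based matching suggested above could be done on the client once the document comes back. The Python sketch below assumes the two multivalued fields are returned as parallel lists in the order the values were added; the function name and the sample values are only illustrative.

```python
def pair_by_position(ids, scores):
    """Map each id to the score stored at the same index of the parallel list."""
    if len(ids) != len(scores):
        raise ValueError("multivalued fields are not aligned")
    return dict(zip(ids, scores))


def score_for(ids, scores, wanted_id):
    """Return the testScore paired with wanted_id, or None if the id is absent."""
    return pair_by_position(ids, scores).get(wanted_id)
```

 For the sample document quoted further down, looking up id 1 in this way would give 90.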
 
  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: Raju444us ngudipa...@cormineid.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, April 21, 2009 5:40:47 PM
 Subject: Solr Getting values for an id
 
 
 i have a problem.
 I have a requirement.I indexed document something like this.The id and
 testScore fields are multivalued.
 My problem is if i search for id=1 this should return the search results
 with id = 1 and testScore = 90.
 Is there any way I can do this.
 
 
 
   Test Name
   1
   90
   2
   92
   3
   97
 
 
 
 Thanks,
 Naveen
 
 
 -- 
 View this message in context: 
 http://www.nabble.com/Solr-Getting-values-for-an-id-tp23165464p23165464.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Solr-Getting-values-for-an-id-tp23165464p23165694.html
Sent from the Solr - User mailing list archive at Nabble.com.



filtering a query by a set of pk values

2009-04-21 Thread Craig de Stigter
Hi

We need to filter and rank a queryset outside of Solr (it's a
specialised spatial search) and then restrict the Solr search based on
that filter.

Previously we were doing our filter, then passing a set of primary
keys to solr like so:
q = '(aerial photos) AND (pk:123^1.8 OR pk:163^1.2 OR pk:920^0.73)'
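
For reference, a query string of that shape can be assembled from the externally ranked results. This is a minimal Python sketch, assuming the spatial search yields (pk, boost) pairs; the function name is hypothetical, and it does not address the dismax limitation described below.

```python
def build_pk_query(user_query, pk_boosts):
    """Combine a user query with boosted primary-key clauses joined by OR."""
    if not pk_boosts:
        raise ValueError("at least one primary key is required")
    clauses = " OR ".join(f"pk:{pk}^{boost}" for pk, boost in pk_boosts)
    return f"({user_query}) AND ({clauses})"
```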

I know it's quite ugly but we are lacking a better alternative. Anyway
that worked when we were using the standard query handler, but we've
now switched to the dismax handler (so we can boost individual fields)
and it no longer works.

The fq parameter looked promising for doing something similar until I
realised that it doesn't seem to do OR queries and it doesn't allow
you to influence the ranking of results.

Can anyone suggest a cleaner way of doing this, or a just-as-ugly way
that works with the dismax handler?

Thanks
Craig de Stigter


Re: Best way to index without diacritics

2009-04-21 Thread lupiss

Hi, thanks for replying. Yes, I think that is the class I need, but I do not
know how to set it up. Could you tell me whether you have already used it
and, if so, which lines you added to the schema.xml and the config.xml, which
.jar you attached, and so on, with all the details? If you have an example
you could post to the list, even better. This is hard for me because, besides
being new to Solr, all the documentation is in English, which I do not
understand very well :(
Thanks again
-- 
View this message in context: 
http://www.nabble.com/Best-way-to-index-without-diacritics-tp18935599p23166430.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Best way to index without diacritics

2009-04-21 Thread wiserweb
Amigo!
Viva Solr :)


-Original Message-
From: lupiss lupitaga...@hotmail.com

Date: Tue, 21 Apr 2009 16:17:04 
To: solr-user@lucene.apache.org
Subject: Re: Best way to index without diacritics



Hi, thanks for replying. Yes, I think that is the class I need, but I do not
know how to set it up. Could you tell me whether you have already used it
and, if so, which lines you added to the schema.xml and the config.xml, which
.jar you attached, and so on, with all the details? If you have an example
you could post to the list, even better. This is hard for me because, besides
being new to Solr, all the documentation is in English, which I do not
understand very well :(
Thanks again
-- 
View this message in context: 
http://www.nabble.com/Best-way-to-index-without-diacritics-tp18935599p23166430.html
Sent from the Solr - User mailing list archive at Nabble.com.



Issue with Solr Snapshots (missing .nrm file)

2009-04-21 Thread Santhosh Kumar
We had an issue with our Solr production deployment a couple of weeks back.
Here is more information about it.

Server Setup
===
Platform: Sun Solaris Ultrasparc
JDK: 1.5
Solr: 1.2
Index Size: ~15GB
Topology: One master and two slaves

Problem Statement
===
Every day we index new content into the Solr master and run optimize at the
end. Snapshooter is triggered at the end of the optimize and creates a
snapshot of the index.  After a couple of hours, the slaves pull the latest
snapshot and install it to serve searches.

A couple of weeks back, the slaves didn't pull the snapshot, and when we
investigated we found that one file (.nrm) was missing from the snapshot
created by the master, as shown below.

# ls -lrth
total 29877000
-rw-r--r--   1 jboss  staff   1.9K Apr 14 11:00 _429o.fnm
-rw-r--r--   1 jboss  staff    28M Apr 14 12:18 _429o.fdx
-rw-r--r--   1 jboss  staff   9.8G Apr 14 12:18 _429o.fdt
-rw-r--r--   1 jboss  staff   501M Apr 14 12:43 _429o.tis
-rw-r--r--   1 jboss  staff   6.5M Apr 14 12:43 _429o.tii
-rw-r--r--   1 jboss  staff   2.4G Apr 14 12:43 _429o.prx
-rw-r--r--   1 jboss  staff   1.2G Apr 14 12:43 _429o.frq
-rw-r--r--   1 jboss  staff     44 Apr 14 12:44 segments_53gt
-rw-r--r--   1 jboss  staff     20 Apr 14 12:44 segments.gen
-rw-r--r--   1 jboss  staff   351M Apr 14 12:44 _429o.nrm   <--- This is the missing file in the snapshot!

# cd snapshot.20090414124449/
# ls -lrth
total 29157784
-rw-r--r--   1 jboss  staff     20 Apr 14 12:44 segments.gen
-rw-r--r--   1 jboss  staff   1.9K Apr 14 12:44 _429o.fnm
-rw-r--r--   1 jboss  staff   9.8G Apr 14 13:03 _429o.fdt
-rw-r--r--   1 jboss  staff    28M Apr 14 13:03 _429o.fdx
-rw-r--r--   1 jboss  staff   1.2G Apr 14 13:06 _429o.frq
-rw-r--r--   1 jboss  staff   2.4G Apr 14 13:12 _429o.prx
-rw-r--r--   1 jboss  staff   501M Apr 14 13:13 _429o.tis
-rw-r--r--   1 jboss  staff   6.5M Apr 14 13:13 _429o.tii
-rw-r--r--   1 jboss  staff     44 Apr 14 13:13 segments_53gt
***This snapshot is missing _429o.nrm file!***
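
The comparison of the two listings above can be automated. This is a minimal Python sketch, assuming both the live index directory and the snapshot directory are readable from one machine; the function name is made up for this example and it is not part of the Solr distribution scripts.

```python
import os


def missing_from_snapshot(index_dir, snapshot_dir):
    """Return index file names that are absent from the snapshot directory."""
    return sorted(set(os.listdir(index_dir)) - set(os.listdir(snapshot_dir)))
```

A check like this, run right after snapshooter finishes, would flag an incomplete snapshot before the slaves try to pull it.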

Has anybody faced this issue (a file, perhaps specifically the .nrm file,
missing from a snapshot)?  Any insight into why this might happen is greatly
appreciated.

A couple of other questions:


1.   For an index of 15GB, how much headroom (both memory and disk space)
is required on the master and the slaves?

2.   Would the slaves pull a snapshot if any of the index files are missing
from it? (I guess snappuller would pull it but might not be able to install
it?)

3.   We are using Solr 1.2 and are considering an upgrade to 1.3.  What is
your experience with this upgrade path? Is it strongly recommended (based on
any critical bugs that were fixed)?

Thanks,
Santhosh.



RE: OutofMemory on Highlightling

2009-04-21 Thread Gargate, Siddharth
I tried disabling the documentCache but still the same issue. 

<documentCache
  class="solr.LRUCache"
  size="0"
  initialSize="0"
  autowarmCount="0"/>



-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] 
Sent: Monday, April 20, 2009 4:38 PM
To: solr-user@lucene.apache.org
Subject: Re: OutofMemory on Highlightling

Gargate, Siddharth wrote:
 Anybody facing the same issue? Following is my configuration
 ...
 <field name="content" type="text" indexed="true" stored="false"
 multiValued="true"/>
 <field name="teaser" type="text" indexed="false" stored="true"/>
 <copyField source="content" dest="teaser" maxChars="100" />
 ...

 ...
 <requestHandler name="standard" class="solr.SearchHandler"
 default="true">
  <lst name="defaults">
   <str name="echoParams">explicit</str>
   <int name="rows">500</int>
   <str name="hl">true</str>
   <str name="fl">id,score</str>
   <str name="hl.fl">teaser</str>
   <str name="hl.alternateField">teaser</str>
   <int name="hl.fragsize">200</int>
   <int name="hl.maxAlternateFieldLength">200</int>
   <int name="hl.maxAnalyzedChars">500</int>
  </lst>
 </requestHandler>
 ...

 Search works fine if I disable highlighting and it brings 500 results.
 But if I enable hightlighting and set the no. of rows to just 20 I get
 OOME.

   
How about switching documentCache off?

Koji




Re: Using Solr to index a database

2009-04-21 Thread Noble Paul നോബിള്‍ नोब्ळ्
The delta query is for incremental imports.

Use the 'query' attribute to import the data.
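
To illustrate the difference, the Python sketch below runs the two kinds of SQL against an in-memory SQLite table: one statement that fetches every row (the role of 'query') and one that fetches only the ids modified since a given time (the role of 'deltaQuery'). The table contents and function names are invented for the example; this is not how DataImportHandler is implemented.

```python
import sqlite3


def full_import(conn):
    """Counterpart of the 'query' attribute: fetch every row."""
    return conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()


def delta_import(conn, last_index_time):
    """Counterpart of 'deltaQuery': ids changed since the last import."""
    rows = conn.execute(
        "SELECT id FROM item WHERE last_modified > ? ORDER BY id",
        (last_index_time,),
    )
    return [row[0] for row in rows]


def demo():
    """Build a two-row sample table and run both kinds of import against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT)"
    )
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?)",
        [(1, "old", "2009-04-01 00:00:00"), (2, "new", "2009-04-21 12:00:00")],
    )
    return full_import(conn), delta_import(conn, "2009-04-20 00:00:00")
```

So if you only ever want to index all rows, the 'query' attribute alone is enough and the deltaQuery can be left out.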


On Tue, Apr 21, 2009 at 7:35 PM, ahammad ahmed.ham...@gmail.com wrote:

 Thanks for the link...

 I'm still a bit unclear as to how it goes. For example, let's say I have a
 table called PRODUCTS, and within that table, I have the following columns:
 NUMBER (product number)
 NAME (product name)
 PRICE

 How would I index all this information? Here is an example (from the links
 you provided) of xml that confuses me:

             <entity name="item" pk="ID" query="select * from item"
     --->    deltaQuery="select id from item where last_modified >
 '${dataimporter.last_index_time}'">
             <field column="NAME" name="name" />
             <field column="NAME" name="nameSort" />
             <field column="NAME" name="alphaNameSort" />

 What is that deltaQuery (or even if it was a regular query expression)
 line for? It seems to me like a sort of filter. What if I don't want to
 filter anything and just want to index all the rows?

 Cheers




 Noble Paul നോബിള്‍  नोब्ळ् wrote:

 On Mon, Apr 20, 2009 at 7:15 PM, ahammad ahmed.ham...@gmail.com wrote:

 Hello,

 I've never used Solr before, but I believe that it will suit my current
 needs with indexing information from a database.

 I downloaded and extracted Solr 1.3 to play around with it. I've been
 looking at the following tutorials:
 http://www.ibm.com/developerworks/java/library/j-solr-update/index.html
 http://wiki.apache.org/solr/DataImportHandler

 There are a few things I don't understand. For example, the IBM article
 sometimes refers to directories that aren't there, or are a little different
 from what I have in my extracted copy of Solr (i.e.
 solr-dw/rss/conf/solrconfig.xml). I tried to follow the steps as best I
 can,
 but as soon as I put the following in solrconfig.xml, the whole thing
 breaks:

 <requestHandler name="/dataimport"
  class="org.apache.solr.handler.dataimport.DataImportHandler">
 <lst name="defaults">
  <str name="config">rss-data-config.xml</str>
 </lst>
 </requestHandler>

 Obviously I replace it with my own info... One thing I don't quite get is the
 data-config.xml file. What exactly is it? I've seen examples of what it
 contains but since I don't know enough, I couldn't really adjust it. In
 any
 case, this is the error I get, which may be because of a misconfigured
 data-config.xml...

 The data-config.xml describes how to fetch data from various data
 sources and index it into Solr.

 The stacktrace says that your xml is invalid.

 The best bet is to take one of the sample dataconfig xml files and make
 changes.

 http://svn.apache.org/viewvc/lucene/solr/trunk/example/example-DIH/solr/db/conf/db-data-config.xml?revision=691151&view=markup

 http://svn.apache.org/viewvc/lucene/solr/trunk/example/example-DIH/solr/rss/conf/rss-data-config.xml?revision=691151&view=markup



 org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
     at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:165)
     at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:99)
     at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:96)
     at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:388)
     at org.apache.solr.core.SolrCore.init(SolrCore.java:571)
     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:122)
     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
     at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
     at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
     at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:78)
     at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
     at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
     at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
     at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:740)
     at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
     at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:831)
     at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:720)
     at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:490)
     at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1150)
     at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
     at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
     at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
     at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
     at
 

RE: OutofMemory on Highlightling

2009-04-21 Thread Gargate, Siddharth
Here is the stack trace

SEVERE: java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
    at java.lang.StringCoding.decode(StringCoding.java:173)
    at java.lang.String.<init>(String.java:444)
    at org.apache.lucene.store.IndexInput.readString(IndexInput.java:125)
    at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:390)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:230)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:892)
    at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:277)
    at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:176)
    at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:457)
    at org.apache.solr.search.SolrIndexSearcher.readDocs(SolrIndexSearcher.java:482)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:253)
    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)


