EmbeddedSolrServer and missing/unfound core

2012-08-13 Thread deniz
Hi all,

I have been playing with Solr 4.0 and was trying to run some tutorials. 

From the EmbeddedSolrServer example here http://wiki.apache.org/solr/Solrj,
I was trying to get something similar

Below is my code for that:

package solrj.embedded;

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class Embedding {

    public static void main(String[] args) throws Exception {
        System.setProperty("solr.solr.home",
                "/Users/deniz/ServiceTeam/UserSearch/solr-4.0.0-ALPHA/example/solr");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer coreContainer = initializer.initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");

        SolrInputDocument doc = new SolrInputDocument();
        String docID = "111221";
        doc.addField("id", docID, 1.0f);
        doc.addField("name", "my name1", 1.0f);

        System.out.println(server.getCoreContainer().getDefaultCoreName());
        UpdateResponse upres = server.add(doc);

        System.out.println(upres.getStatus());
    }

}


and this is the stack trace:

Exception in thread "main" org.apache.solr.common.SolrException: No such
core: 
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107)
at solrj.embedded.Embedding.main(Embedding.java:22)



I don't know which part I am ruining... does anyone have any ideas? 

P.S. I am running the default configs, locally, for the Solr 4.0 alpha. 
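For what it's worth, the empty name after "No such core: " in the stack trace points at the empty second argument of the EmbeddedSolrServer constructor. A hedged sketch of the usual fix against the 4.0-ALPHA API (not a complete program; asking the container for its default core name avoids hard-coding one):

```java
// Sketch only: requires solr-solrj and solr-core 4.0-ALPHA on the classpath.
CoreContainer coreContainer = new CoreContainer.Initializer().initialize();
// Pass a real core name instead of an empty string:
EmbeddedSolrServer server =
        new EmbeddedSolrServer(coreContainer, coreContainer.getDefaultCoreName());
```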




-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-and-missing-unfound-core-tp4000744.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: multi-searching problem

2012-08-13 Thread Videnova, Svetlana
I added this in schema.xml

<schema>
<fields>
...
<dynamicField name="attr_*" type="text" indexed="true" stored="true" 
multiValued="true"/> 
   &defType=edismax &qf=article_id article_nom   
 </fields>

<uniqueKey>article_id</uniqueKey>

<solrQueryParser defaultOperator="OR"/>
 
</schema>


But i have this error: 

###
org.xml.sax.SAXParseException: The reference to entity "defType" must end with 
the ';' delimiter.
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
	at ...
###

Is this a syntax problem? Am I writing defType wrong?


Thank you

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Friday, August 10, 2012 16:22
To: solr-user@lucene.apache.org
Subject: RE: multi-searching problem

 It seems more complicated than I need.
 I just want, if the user specifies nothing, to search in all the fields 
 that I declared in my schema.xml, like this:
 <defaultSearchField>article_nom</defaultSearchField>
 but not only article_nom; all fields.
 There should be some simple way to do that without using all of 
 this..?
 Or am I wrong? 

It is not that complicated. Just list your fields in the qf parameter, that's all. 
defType=edismax&qf=field1 field2 field3 ... 


Think green - keep it on the screen.

This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. It may contain proprietary material, confidential 
information and/or be subject to legal privilege. It should not be copied, 
disclosed to, retained or used by, any other party. If you are not an intended 
recipient then please promptly delete this e-mail and any attachment and all 
copies and inform the sender. Thank you.




RE: multi-searching problem

2012-08-13 Thread deniz
Well, I don't know much about dismax, but for making a search default on
multiple fields you can use copyField, which is simpler than dismax (though
performance could be affected, I am not so sure).
Basically, you can copy the other fields into one field and make it your
default search field, and you are done... I have done a similar thing to
provide a universal search, where all of the fields on a document are
checked by default.
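A minimal schema.xml sketch of that copyField approach (field names are borrowed from this thread; using a tokenized "text" type for the catch-all field is an assumption, so that term searches actually match):

```xml
<!-- Catch-all field populated at index time from the other fields -->
<field name="all" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- Note the capital F: the element is copyField and is case-sensitive -->
<copyField source="article_nom" dest="all"/>
<copyField source="article_id" dest="all"/>

<!-- Make the catch-all field the default search field -->
<defaultSearchField>all</defaultSearchField>
```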



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/multi-searching-problem-tp4000433p4000748.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: multi-searching problem

2012-08-13 Thread Videnova, Svetlana
I already tried this in my schema.xml: 

###
<field name="all" type="string" indexed="true" stored="true" 
multiValued="true"/>
<copyfield source="article_id" dest="all"/> 
<copyfield source="article_nom" dest="all"/>
</fields>

<uniqueKey>article_id</uniqueKey>
<defaultSearchField>all</defaultSearchField>
###


I have no errors with that code, but when I search for a term that is present in 
article_nom it gives me 0 results... I don't know why or where I'm going 
wrong :s


Thank you for your help



-Original Message-
From: deniz [mailto:denizdurmu...@gmail.com] 
Sent: Monday, August 13, 2012 08:54
To: solr-user@lucene.apache.org
Subject: RE: multi-searching problem

Well, I don't know much about dismax, but for making a search default on 
multiple fields you can use copyField, which is simpler than dismax (though 
performance could be affected, I am not so sure). Basically, you can copy the 
other fields into one field and make it your default search field, and you are 
done... I have done a similar thing to provide a universal search, where 
all of the fields on a document are checked by default.



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/multi-searching-problem-tp4000433p4000748.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: AW: Indexing wildcard patterns

2012-08-13 Thread Tomas Zerolo
On Fri, Aug 10, 2012 at 12:38:46PM -0400, Jack Krupansky wrote:
 "Doc1 has the pattern AB%CD% associated with it (somehow?!)."
 
 You need to clarify what you mean by that.

I'm not the OP, but I think (s)he means the patterns are in the
database and the string to match is given in the query. Perhaps
this inversion is a bit unusual, and most optimizers aren't
prepared for that, but still reasonable, IMHO.

 To be clear, Solr support for wildcards is a superset of the SQL
 LIKE operator, and the patterns used in the LIKE operator are NOT
 stored in the table data, but used at query time

I don't know about others, but PostgreSQL copes just fine:

 | tomas@rasputin:~$ psql template1
 | psql (9.1.2)
 | Type "help" for help.
 | 
 | template1=# create database test;
 | CREATE DATABASE
 | template1=# create table foo (
 | template1(#   pattern VARCHAR
 | template1(# );
 | CREATE TABLE
 | template1=# insert into foo values('%blah');
 | INSERT 0 1
 | template1=# insert into foo values('blah%');
 | INSERT 0 1
 | template1=# insert into foo values('%bloh%');
 | INSERT 0 1
 | template1=# select * from foo where 'blahblah' like pattern;
 |  pattern 
 | ---------
 |  %blah
 |  blah%
 | (2 rows)

Now don't ask whether the optimizer has a fair chance at this. Dunno
what happens when we have, say, 10^7 patterns... but the OP's pattern
set seems to be reasonably small.

  - same with Solr.
 In SQL you do not associate patterns with table data, but rather
 you query data using a pattern.

I'd guess that the above trick might be doable in SOLR as well, as
other posts in this thread seem to suggest. But I'm not that proficient
in SOLR, that's why I'm lurking here ;-)
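The inverted match (the data holds the pattern, the query holds the literal) can also be sketched outside any database, e.g. in plain Java, by translating each stored LIKE pattern into a regex. This only illustrates the semantics a Solr-side solution would have to emulate; it is not a Solr feature:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LikeInversion {
    // Translate a SQL LIKE pattern into an equivalent regex:
    // '%' -> '.*' (zero or more chars), '_' -> '.' (exactly one char),
    // everything else is matched literally.
    static Pattern likeToRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') sb.append(".*");
            else if (c == '_') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        // Same data as the psql session above.
        List<String> storedPatterns = Arrays.asList("%blah", "blah%", "%bloh%");
        String query = "blahblah";
        for (String p : storedPatterns) {
            if (likeToRegex(p).matcher(query).matches()) {
                System.out.println(p);   // prints %blah and blah%
            }
        }
    }
}
```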

tomás
-- 
Tomás Zerolo
Axel Springer AG
Axel Springer media Systems
BILD Produktionssysteme
Axel-Springer-Straße 65
10888 Berlin
Tel.: +49 (30) 2591-72875
tomas.zer...@axelspringer.de
www.axelspringer.de

Axel Springer AG, Sitz Berlin, Amtsgericht Charlottenburg, HRB 4998
Vorsitzender des Aufsichtsrats: Dr. Giuseppe Vita
Vorstand: Dr. Mathias Döpfner (Vorsitzender)
Jan Bayer, Ralph Büchi, Lothar Lanz, Dr. Andreas Wiele


AW: AW: Indexing wildcard patterns

2012-08-13 Thread Lochschmied, Alexander
Here is what we do in SQL:

mysql> select * from _tbl;
+++
| id | field  |
+++
|  1 | plain text |
|  2 | wil_c% |
+++
2 rows in set (0.14 sec)

mysql> SELECT * FROM _TBL WHERE 'wildcard' LIKE FIELD;
+++
| id | field  |
+++
|  2 | wil_c% |
+++
1 row in set (0.12 sec)

So the patterns are associated with the actual documents in the database. We 
use those fields as a means to manually customize some searches.

Thanks,
Alexander

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Friday, August 10, 2012 18:39
To: solr-user@lucene.apache.org
Subject: Re: AW: Indexing wildcard patterns

"Doc1 has the pattern AB%CD% associated with it (somehow?!)."

You need to clarify what you mean by that.

To be clear, Solr support for wildcards is a superset of the SQL LIKE operator, 
and the patterns used in the LIKE operator are NOT stored in the table data, 
but used at query time - same with Solr. In SQL you do not associate patterns 
with table data, but rather you query data using a pattern.

Step back and describe the problem you are trying to solve rather than 
prematurely jumping into a proposed solution.

So, if there is something you already do in SQL and now wish to do it in Solr, 
please tell us about it.

-- Jack Krupansky

-Original Message-
From: Lochschmied, Alexander
Sent: Friday, August 10, 2012 5:25 AM
To: solr-user@lucene.apache.org
Subject: AW: Indexing wildcard patterns

I thought my question might be confusing...

I know about Solr providing wildcards in queries, but my problem is different.

I have those patterns associated with my searchable documents before any actual 
search is done.
I need Solr to return the document which is associated with matching patterns. 
User does not enter the wildcard pattern; wildcard pattern must be tested by 
Solr automatically.

So in the example I provided below, a user might enter "ABCDXYZ" and I need 
Solr to return Doc1, as Doc1 has the pattern "AB%CD%" associated with it 
(somehow?!).

Thanks,
Alexander


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Friday, August 10, 2012 10:34
To: solr-user@lucene.apache.org
Subject: Re: Indexing wildcard patterns



--- On Fri, 8/10/12, Lochschmied, Alexander alexander.lochschm...@vishay.com 
wrote:

 From: Lochschmied, Alexander alexander.lochschm...@vishay.com
 Subject: Indexing wildcard patterns
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Friday, August 10, 2012, 11:07 AM

 Coming from a SQL database based search system, we already have a set of
 defined patterns associated with our searchable documents.

 % matches zero or more characters; _ matches exactly one character

 Example:
 Doc 1: 'AB%CD', 'AB%CD%'
 Doc 2: 'AB_CD'
 ...

 Thus Doc 1 matches
 ABXYZCD
 ABCD
 ABCDXYZ
 ...

 Whereas Doc 2 matches only
 ABXCD
 ABYCD
 ABZCD
 ...

 This can be achieved in SQL WHERE statements using the LIKE operator.

 Is there a (similar) way to this in Solr?

Yes: wildcard search in Solr.

* matches zero or more characters; ? matches exactly one character

http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Wildcard%20Searches 
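As a rough mapping for the example patterns in this thread (the field name is illustrative, and note that wildcard terms normally bypass text analysis at query time):

```
SQL:   WHERE code LIKE 'AB%CD%'    -- % = zero or more, _ = exactly one
Solr:  q=code:AB*CD*               -- * = zero or more, ? = exactly one

SQL:   WHERE code LIKE 'AB_CD'
Solr:  q=code:AB?CD
```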



RE: multi-searching problem

2012-08-13 Thread Videnova, Svetlana
I followed this example 
https://github.com/boonious/misc/blob/master/fedora-solr-integration-conf/schema.xml
but still no results. 

-Original Message-
From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] 
Sent: Monday, August 13, 2012 08:59
To: solr-user@lucene.apache.org
Subject: RE: multi-searching problem

I already tried this in my schema.xml: 

###
<field name="all" type="string" indexed="true" stored="true" 
multiValued="true"/>
<copyfield source="article_id" dest="all"/> 
<copyfield source="article_nom" dest="all"/>
</fields>

<uniqueKey>article_id</uniqueKey>
<defaultSearchField>all</defaultSearchField>
###


I have no errors with that code, but when I search for a term that is present in 
article_nom it gives me 0 results... I don't know why or where I'm going 
wrong :s


Thank you for your help



-Message d'origine-
De : deniz [mailto:denizdurmu...@gmail.com] Envoyé : lundi 13 août 2012 08:54 
À : solr-user@lucene.apache.org Objet : RE: multi-searching problem

Well, I don't know much about dismax, but for making a search default on 
multiple fields you can use copyField, which is simpler than dismax (though 
performance could be affected, I am not so sure). Basically, you can copy the 
other fields into one field and make it your default search field, and you are 
done... I have done a similar thing to provide a universal search, where 
all of the fields on a document are checked by default.



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/multi-searching-problem-tp4000433p4000748.html
Sent from the Solr - User mailing list archive at Nabble.com.










RE: solr indexing problem

2012-08-13 Thread Videnova, Svetlana
Some ideas?

-Original Message-
From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] 
Sent: Friday, August 10, 2012 11:05
To: solr-user@lucene.apache.org
Subject: RE: solr indexing problem


This is schema.xml:
###
<?xml version="1.0" ?>


<schema name="db" version="1.1">
  <types>
    <fieldType name="string"  class="solr.StrField"  sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField"    omitNorms="true"/>
    <fieldType name="long"    class="solr.LongField"   omitNorms="true"/>
    <fieldType name="float"   class="solr.FloatField"  omitNorms="true"/>
    <fieldType name="double"  class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="date"    class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sint"    class="solr.SortableIntField"    sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong"   class="solr.SortableLongField"   sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat"  class="solr.SortableFloatField"  sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="random"  class="solr.RandomSortField" indexed="true"/>
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>


    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--<tokenizer class="solr.LowerCaseTokenizerFactory"/>-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.LengthFilterFactory" min="4" max="100"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--<tokenizer class="solr.LowerCaseTokenizerFactory"/>-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.LengthFilterFactory" min="4" max="100"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField"/>
  </types>


  <!-- LOCAL CONFIG -->
  <fields>

    <!-- article -->
    <field name="article_id"                type="long"   indexed="true" stored="true" required="true"/>
    <field name="article_nom"               type="string" indexed="true" stored="true"/>
    <field name="article_auteur_id"         type="long"   indexed="true" stored="true"/>
    <field name="article_typeArt_id"        type="long"   indexed="true" stored="true"/>
    <field name="article_dateCreation"      type="string" indexed="true" stored="true"/>
    <field name="article_dateDerniereModif" type="string" indexed="true" stored="true"/>

    <!-- article_categorie -->
    <field name="article_categorie_articles_id"  type="long" indexed="true" stored="true"/>
    <field name="article_categorie_categorie_id" type="long" indexed="true" stored="true"/>

    <!-- article_groupe -->
    <field name="article_groupe_articles_id"      type="long" indexed="true" stored="true"/>
    <field name="article_groupe_groupeArticle_id" type="long" indexed="true" stored="true"/>
    <field name="article_groupe_Article_id"       type="long" indexed="true" stored="true"/>
    <field name="article_groupe_groupe_id"        type="long" indexed="true" stored="true"/>

    <!-- section -->
    <field name="section_id"         type="long" indexed="true" stored="true"/>
    <field name="section_article_id" type="long" indexed="true" 

Custom Geocoder with Solr and Autosuggest

2012-08-13 Thread Spadez
Hi,

I want to create a very simple geocoder for returning co-ordinates of a
place if a user enters in a town or city. There seems to be very little
information about doing it the way I suggest, so I hope I am on a good path.

My first decision was to divide Solr into two cores, since I am already
using Solr as my search server. One core would be for the main search of the
site and one for the geocoding.

My second decision is to store the name data in a normalised state, some
examples are shown below:
London, England
England
Swindon, Wiltshire, England

The third decision was to return “autosuggest” results, for example when the
user types “Lond” I would like to suggest “London, England”. For this to
work I think it makes sense to return up to 5 results via JSON based on
relevancy and have these displayed under the search box.

My fourth decision is that when the user actually hits the “search” button
on the location field, Solr is again queried and returns the most relevant
result, including the stored co-ordinates.
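To make the third and fourth decisions concrete, the two requests could be sketched like this (the core, field, and parameter names are assumptions, not an existing setup):

```
# Autosuggest: top 5 prefix matches for "Lond", returned as JSON
http://localhost:8983/solr/geo/select?q=name:Lond*&rows=5&fl=name,coords&wt=json

# Full search on submit: single most relevant match with its stored co-ordinates
http://localhost:8983/solr/geo/select?q=name:"London, England"&rows=1&fl=name,coords&wt=json
```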

Am I on a good path here? 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Geocoder-with-Solr-and-Autosuggest-tp4000791.html
Sent from the Solr - User mailing list archive at Nabble.com.


Custom Plugins for solr

2012-08-13 Thread Sujatha Arun
Hi ,

I would like to write a custom component for Solr to address a particular
issue.

What I have been doing so far is to write the custom code directly in the
downloaded code base, rebuild the war file, and deploy it. We currently
have multiple cores, hence I want to approach this in a core-specific way,
as opposed to affecting all the cores in the webapp.

If I have to write a plugin and move it to the lib directory of each core,
would I just need to add one single class file packed as a jar and make the
appropriate changes to the solrconfig file? When I reload the core, I am
assuming that, apart from the classes in the war file, this jar file in the
lib directory will be automatically referenced.

Would I need to restart the servlet container?
Would the other files that this custom class references need to be in the
custom jar file, or will that be taken care of automatically?
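For reference, a minimal sketch of the per-core wiring in solrconfig.xml (the directory, class, and handler names here are hypothetical):

```xml
<!-- Load jars from this core's lib directory (relative to the core's instanceDir) -->
<lib dir="./lib" />

<!-- Register the custom component packed in the jar -->
<searchComponent name="myComponent" class="com.example.solr.MyComponent"/>

<!-- Attach it to a request handler -->
<requestHandler name="/mysearch" class="solr.SearchHandler">
  <arr name="last-components">
    <str>myComponent</str>
  </arr>
</requestHandler>
```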

Regards
Sujatha


RE: multi-searching problem

2012-08-13 Thread Ahmet Arslan
 <schema>
 <fields>
 ...
 <dynamicField name="attr_*"
 type="text" indexed="true" stored="true"
 multiValued="true"/> 
    &defType=edismax &qf=article_id article_nom   
  </fields>
 
 <uniqueKey>article_id</uniqueKey>
 
 <solrQueryParser defaultOperator="OR"/>
  
 </schema>
 
 
 But I have this error: 
 
 ###
 org.xml.sax.SAXParseException: The reference to entity
 "defType" must end with the ';' delimiter. at

Hi Videnova,

defType=edismax&qf=article_id article_nom&start=0&rows=10 is meant to be appended 
to your search URL. Alternatively, you can set these parameters in the defaults 
section.  
 
These default definitions belong in solrconfig.xml (not schema.xml). Please 
see the example solrconfig and search for 'edismax':

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/solrconfig.xml

You should have something like that : 

 <requestHandler name="/search" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>

     <!-- Query settings -->
     <str name="defType">edismax</str>
     <str name="qf">article_id article_nom</str>
     <str name="df">text</str>
   </lst>


RE: multi-searching problem

2012-08-13 Thread Ahmet Arslan
 <field name="all" type="string" indexed="true" 
 stored="true" multiValued="true"/>
     <copyfield source="article_id"
 dest="all"/> 
     <copyfield source="article_nom"
 dest="all"/>
  </fields>
 
 <uniqueKey>article_id</uniqueKey>
 <defaultSearchField>all</defaultSearchField>

It is always a good idea to edit the example schema.xml according to your needs.
See its copyField declarations. 

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml

It is not copyfield but copyField; this stuff is case-sensitive. Also, copyField 
declarations are placed below the <uniqueKey>article_id</uniqueKey> definition 
in the example schema.xml. 

By the way, using edismax is more flexible than a catch-all field.


Re: Custom Plugins for solr

2012-08-13 Thread Michael Della Bitta
Hi Sujatha,

Are you adding a new class, or modifying one of the provided Solr classes?

Michael



Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Mon, Aug 13, 2012 at 7:18 AM, Sujatha Arun suja.a...@gmail.com wrote:
 Hi ,

 I would like to write a custom component for solr  to address a particular
 issue.

 This is what I have been doing ,write the custom code directly in the
 downloaded code base and rebuild the war file and deploy the same. We
 currently have multiple cores ,hence  I want to approach this in a core
 specific way as opposed to affecting all the cores in the webapp .

 If I have to write a plugin and move it to the lib directory of each core
 ,would I just need to add one single class file packed as a jar  and make
 appropriate changes to the solrconfig file .When I reload the core , I am
 assuming that apart from the  classes in the war file ,this jar file in the
 lib will be automatically referenced.

 Would I need to restart sevlet container?
 Would I need to have other files to which this custom class is referencing
 to in the custom jar file or will that be automatically taken care of?

 Regards
 Sujatha


Re: AW: AW: Indexing wildcard patterns

2012-08-13 Thread Jack Krupansky
Ah, okay, I see the usage now. In SQL the right operand of LIKE can be 
either a literal wildcard pattern or an expression which is evaluated 
per-row during the query. Solr/Lucene has the former, but not the latter. 
The wildcard pattern will be fixed at the start of the search.


-- Jack Krupansky

-Original Message- 
From: Lochschmied, Alexander

Sent: Monday, August 13, 2012 3:05 AM
To: solr-user@lucene.apache.org
Subject: AW: AW: Indexing wildcard patterns

Here is what we do in SQL:

mysql> select * from _tbl;
+++
| id | field  |
+++
|  1 | plain text |
|  2 | wil_c% |
+++
2 rows in set (0.14 sec)

mysql> SELECT * FROM _TBL WHERE 'wildcard' LIKE FIELD;
+++
| id | field  |
+++
|  2 | wil_c% |
+++
1 row in set (0.12 sec)

So the patterns are associated with the actual documents in the database. We 
use those fields as a means to manually customize some searches.


Thanks,
Alexander

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, August 10, 2012 18:39
To: solr-user@lucene.apache.org
Subject: Re: AW: Indexing wildcard patterns

"Doc1 has the pattern AB%CD% associated with it (somehow?!)."

You need to clarify what you mean by that.

To be clear, Solr support for wildcards is a superset of the SQL LIKE 
operator, and the patterns used in the LIKE operator are NOT stored in the 
table data, but used at query time - same with Solr. In SQL you do not 
associate patterns with table data, but rather you query data using a 
pattern.


Step back and describe the problem you are trying to solve rather than 
prematurely jumping into a proposed solution.


So, if there is something you already do in SQL and now wish to do it in 
Solr, please tell us about it.


-- Jack Krupansky

-Original Message-
From: Lochschmied, Alexander
Sent: Friday, August 10, 2012 5:25 AM
To: solr-user@lucene.apache.org
Subject: AW: Indexing wildcard patterns

I thought my question might be confusing...

I know about Solr providing wildcards in queries, but my problem is 
different.


I have those patterns associated with my searchable documents before any 
actual search is done.
I need Solr to return the document which is associated with matching 
patterns. User does not enter the wildcard pattern; wildcard pattern must be 
tested by Solr automatically.


So in the example I provided below, a user might enter "ABCDXYZ" and I 
need Solr to return Doc1, as Doc1 has the pattern "AB%CD%" associated with 
it (somehow?!).


Thanks,
Alexander


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Friday, August 10, 2012 10:34
To: solr-user@lucene.apache.org
Subject: Re: Indexing wildcard patterns



--- On Fri, 8/10/12, Lochschmied, Alexander 
alexander.lochschm...@vishay.com wrote:



From: Lochschmied, Alexander alexander.lochschm...@vishay.com
Subject: Indexing wildcard patterns
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Date: Friday, August 10, 2012, 11:07 AM

Coming from a SQL database based search system, we already have a set of
defined patterns associated with our searchable documents.

% matches zero or more characters; _ matches exactly one character

Example:
Doc 1: 'AB%CD', 'AB%CD%'
Doc 2: 'AB_CD'
...

Thus Doc 1 matches
ABXYZCD
ABCD
ABCDXYZ
...

Whereas Doc 2 matches only
ABXCD
ABYCD
ABZCD
...

This can be achieved in SQL WHERE statements using the LIKE operator.

Is there a (similar) way to this in Solr?


Yes: wildcard search in Solr.

* matches zero or more characters; ? matches exactly one character

http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Wildcard%20Searches 



RE: multi-searching problem

2012-08-13 Thread Videnova, Svetlana
Hi Arslan,

Thank you for your answer; in the end it was just my mix-up between copyfield 
and copyField. Now all is good.
I don't know exactly how copyField and edismax work, but can I do both? 
Currently I copied all fields into <defaultSearchField>all</defaultSearchField>.
So can I use edismax as well on the solrconfig.xml side?

Thank you!

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Monday, August 13, 2012 13:44
To: solr-user@lucene.apache.org
Subject: RE: multi-searching problem

 <field name="all" type="string" indexed="true" stored="true" 
 multiValued="true"/>
     <copyfield source="article_id"
 dest="all"/>
     <copyfield source="article_nom"
 dest="all"/>
  </fields>
 
 <uniqueKey>article_id</uniqueKey>
 <defaultSearchField>all</defaultSearchField>

It is always a good idea to edit the example schema.xml according to your needs.
See its copyField declarations. 

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml

It is not copyfield but copyField; this stuff is case-sensitive. Also, copyField 
declarations are placed below the <uniqueKey>article_id</uniqueKey> definition 
in the example schema.xml. 

By the way, using edismax is more flexible than a catch-all field. 






Setting metadata while indexing custom file

2012-08-13 Thread 122jxgcn
Hello,

I'd like to set the Content-Type of the file while I'm using the
ExtractingRequestHandler to pass files to Tika.
As I'm indexing a custom file type, it seems that Tika is not matching my file
to the right custom parser.
So I really need to explicitly declare the Content-Type of my custom file so
that it cannot miss the right parser.
So far, passing the filename via the resource.name variable has not worked for me.

How can I do this?
Thanks.
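One hedged idea, assuming the stock /update/extract endpoint: the Content-Type of the uploaded content stream is handed to Tika, so declaring the custom MIME type on the request itself may route the file to the right parser. The URL, document id, and MIME type below are illustrative, and the custom type must already have a registered Tika parser:

```
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
     -H "Content-Type: application/x-mycustomtype" \
     --data-binary @myfile.custom
```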



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-metadata-while-indexing-custom-file-tp4000781.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom Plugins for solr

2012-08-13 Thread Sujatha Arun
Adding a new class

Regards
Sujatha

On Mon, Aug 13, 2012 at 5:54 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Michael Della Bitta
 Hi Sujatha,

 Are you adding a new class, or modifying one of the provided Solr classes?

 Michael


 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
 www.appinions.com
 Where Influence Isn’t a Game


 On Mon, Aug 13, 2012 at 7:18 AM, Sujatha Arun suja.a...@gmail.com wrote:
  Hi ,
 
  I would like to write a custom component for solr  to address a
 particular
  issue.
 
  This is what I have been doing ,write the custom code directly in the
  downloaded code base and rebuild the war file and deploy the same. We
  currently have multiple cores ,hence  I want to approach this in a core
  specific way as opposed to affecting all the cores in the webapp .
 
  If I have to write a plugin and move it to the lib directory of each core
  ,would I just need to add one single class file packed as a jar  and make
  appropriate changes to the solrconfig file .When I reload the core , I am
  assuming that apart from the  classes in the war file ,this jar file in
 the
  lib will be automatically referenced.
 
  Would I need to restart sevlet container?
  Would I need to have other files to which this custom class is
 referencing
  to in the custom jar file or will that be automatically taken care of?
 
  Regards
  Sujatha



RE: solr indexing problem

2012-08-13 Thread Videnova, Svetlana
Finally I found it:

The SQL request in my dataconfig wasn't right.

-Original Message-
From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] 
Sent: Monday, August 13, 2012 10:41
To: solr-user@lucene.apache.org
Subject: RE: solr indexing problem

Some ideas?

-Message d'origine-
De : Videnova, Svetlana [mailto:svetlana.viden...@logica.com] 
Envoyé : vendredi 10 août 2012 11:05
À : solr-user@lucene.apache.org
Objet : RE: solr indexing problem


This is schema.xml
###
<?xml version="1.0" ?>


<schema name="db" version="1.1">
  <types>
    <fieldType name="string"  class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long"    class="solr.LongField" omitNorms="true"/>
    <fieldType name="float"   class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double"  class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="date"    class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sint"    class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong"   class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat"  class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="random"  class="solr.RandomSortField" indexed="true"/>
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>



    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- <tokenizer class="solr.LowerCaseTokenizerFactory"/> -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.LengthFilterFactory" min="4" max="100"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- <tokenizer class="solr.LowerCaseTokenizerFactory"/> -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
        <filter class="solr.LengthFilterFactory" min="4" max="100"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField"/>
  </types>


  <!-- LOCAL CONFIG -->
  <fields>

    <!-- article -->
    <field name="article_id"                type="long"   indexed="true" stored="true" required="true"/>
    <field name="article_nom"               type="string" indexed="true" stored="true"/>
    <field name="article_auteur_id"         type="long"   indexed="true" stored="true"/>
    <field name="article_typeArt_id"        type="long"   indexed="true" stored="true"/>
    <field name="article_dateCreation"      type="string" indexed="true" stored="true"/>
    <field name="article_dateDerniereModif" type="string" indexed="true" stored="true"/>

    <!-- article_categorie -->
    <field name="article_categorie_articles_id"  type="long" indexed="true" stored="true"/>
    <field name="article_categorie_categorie_id" type="long" indexed="true" stored="true"/>


    <!-- article_groupe -->
    <field name="article_groupe_articles_id"      type="long" indexed="true" stored="true"/>
    <field name="article_groupe_groupeArticle_id" type="long" indexed="true" stored="true"/>
    <field name="article_groupe_Article_id"       type="long" indexed="true" stored="true"/>
    <field name="article_groupe_groupe_id"        type="long"

luceneMatchVersion

2012-08-13 Thread Angelo Quaglia
Hi,

 

We have been using Apache Solr 1.4.1 since last year and we are very happy
with it.

 

We are now looking into the upgrade to Solr 3.6.1, but we have stumbled on
a critical (for us) issue, for which a workaround seems to be the use
of

 

<luceneMatchVersion>LUCENE_33</luceneMatchVersion>

 

in the Solr configuration.

 

The issue is documented here:

https://issues.apache.org/jira/browse/LUCENE-3668
https://issues.apache.org/jira/browse/SOLR-3390
http://lucene.472066.n3.nabble.com/Highlight-with-multi-word-synonyms-td3610466.html#a3644439

 

 

I have been unable to find official documentation about running a Solr
version with a prior Lucene match version.

 

Is this officially supported?

 

Are there any recommended alternatives?

 

Thanks in advance,

 

Angelo

 

 

Ing. Angelo Quaglia

External Consultant

European Commission, DG Joint Research Centre
Institute for Environment and Sustainability

Digital Earth and Reference Data Unit, T.P. 262 

Via E. Fermi, 2749. 
I-21027 Ispra (VA)
Italy 

Tel: +39 0332 78 5325
Fax: +39 0332 78 6325
e-mail: angelo.quag...@ext.jrc.ec.europa.eu

URL:
http://ies.jrc.ec.europa.eu/SDI/sdi-about-us/staff-profiles/angelo-quaglia.html

 

The views expressed are purely those of the writer and may not in any
circumstances be regarded as stating an official position of the European
Commission.

 



Index not loading

2012-08-13 Thread Jonatan Fournier
Hi,

I'm using Solr 4.0.0-ALPHA and the EmbeddedSolrServer.

Within my SolrJ application, the documents are added to the server
using the commitWithin parameter (in my case 60s). After 1 day my 125
million documents are all added to the server and I can see 89G of
index data files. I stop my SolrJ application and reload my Solr
instance in Tomcat.

From the Solr admin panel related to my Core (collection1) I see this info:


Last Modified:
Num Docs:0
Max Doc:0
Version:1
Segment Count:0
Optimized: (green check)
Current:  (green check)
Master: 
Version: 0
Gen: 1
Size: 88.14 GB


From the general Core Admin panel I see:

lastModified:
version:1
numDocs:0
maxDoc:0
optimized: (red circle)
current: (green check)
hasDeletions: (red circle)

If I query my index for *:* I get 0 results. If I trigger optimize, it
wipes ALL my data inside the index and resets it to empty. I played
around with my EmbeddedServer initially using autoCommit/softCommit and it
was working fine. Now that I've switched to commitWithin on the document
add query, it always does that! I'm never able to reload my index within
Tomcat/Solr.

Any idea?

Cheers,

/jonathan


Re: Setting metadata while indexing custom file

2012-08-13 Thread Jack Krupansky

The wiki page shows how to use the -H option of curl to set the Content-Type.

See:
http://wiki.apache.org/solr/ExtractingRequestHandler

SolrJ requires some extra coding.

-- Jack Krupansky

-Original Message- 
From: 122jxgcn

Sent: Monday, August 13, 2012 5:49 AM
To: solr-user@lucene.apache.org
Subject: Setting metadata while indexing custom file

Hello,

I'd like to set the Content-Type of the file while I'm using the
ExtractingRequestHandler to pass the file to Tika.
As I'm indexing a custom file type, it seems that Tika is not matching my file
to the right custom parser.
So I really need to explicitly declare the Content-Type of my custom file so
that it cannot miss the right parser.
Until now, passing the filename via the resource.name variable has not worked
for me.

How can I do this?
Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-metadata-while-indexing-custom-file-tp4000781.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Custom Plugins for solr

2012-08-13 Thread Michael Della Bitta
Then you're on the right track.

1. You'd either have to restart Tomcat or in the case of Multicore
setups, reload the core.
2. If the jar has dependencies outside of the Solr provided classes,
you'll have to include those as well. If it only depends on Solr stuff
or things that are in the servlet container's classpath, you should be
fine with just the one class.
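
For reference, a minimal sketch of the per-core wiring being described here. The jar name, package, and component name are made-up examples, and note that the lib directive in solrconfig.xml exists from Solr 1.4 onward, while in 1.3 the core's own lib directory is picked up automatically:

```xml
<!-- solrconfig.xml of each core: load the plugin jar from the core's lib dir -->
<lib dir="./lib" />
<!-- or point at one jar explicitly -->
<lib path="./lib/my-custom-component.jar" />

<!-- then register the class, e.g. as a search component -->
<searchComponent name="mycomponent" class="com.example.MyCustomComponent" />
```

After changing solrconfig.xml, reload the core (or restart the container) as discussed above.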

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Mon, Aug 13, 2012 at 10:36 AM, Sujatha Arun suja.a...@gmail.com wrote:
 Adding a new class

 Regards
 Sujatha

 On Mon, Aug 13, 2012 at 5:54 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:

 Michael Della Bitta
 Hi Sujatha,

 Are you adding a new class, or modifying one of the provided Solr classes?

 Michael


 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
 www.appinions.com
 Where Influence Isn’t a Game


 On Mon, Aug 13, 2012 at 7:18 AM, Sujatha Arun suja.a...@gmail.com wrote:
  Hi ,
 
  I would like to write a custom component for solr  to address a
 particular
  issue.
 
  This is what I have been doing ,write the custom code directly in the
  downloaded code base and rebuild the war file and deploy the same. We
  currently have multiple cores ,hence  I want to approach this in a core
  specific way as opposed to affecting all the cores in the webapp .
 
  If I have to write a plugin and move it to the lib directory of each core
  ,would I just need to add one single class file packed as a jar  and make
  appropriate changes to the solrconfig file .When I reload the core , I am
  assuming that apart from the  classes in the war file ,this jar file in
 the
  lib will be automatically referenced.
 
  Would I need to restart servlet container?
  Would I need to have other files to which this custom class is
 referencing
  to in the custom jar file or will that be automatically taken care of?
 
  Regards
  Sujatha



Solr4.0 Partially update document

2012-08-13 Thread Bing Hua
Hi,

Several days ago I came across some SolrJ test code on partially updating
document field values. Sadly I forgot where that was. In Solr 4.0, /update
is able to take in a document id and fields as hashmaps like

{"id": "doc1",
 "field1": {"set": "new_value"}}

Just trying to figure out what's the solrj client code that does this.

Thanks for any help on this,
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-0-Partially-update-document-tp4000875.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: multi-searching problem

2012-08-13 Thread Ahmet Arslan


--- On Mon, 8/13/12, Videnova, Svetlana svetlana.viden...@logica.com wrote:

 From: Videnova, Svetlana svetlana.viden...@logica.com
 Subject: RE: multi-searching problem
  Thank you for your answer, finally it was only my bad
 between copyfield and copyField. Now all good.
 I don't know how copyField and edismax working exactly, but
 can I do both? 
 Currently I copied all fields into
 <defaultSearchField>all</defaultSearchField>.
 So can I use edismax as well in the solrconfig.xml side?

(e)dismax is designed to search over multiple fields with different boosts
(article_id, article_norm, title, etc.).

Some advantages of (e)dismax over a catch-all field:
1) You can give different boosts to fields: qf=article_id^5 article_norm^3
2) If you want to add another search field you have to change your schema and
re-index; with dismax, no re-index is required.
3) With a catch-all field, you cannot use different fieldTypes. Say you have
two different field types for article_id and article_norm (you may want
different analysis for different fields). Once you copy them into the all
field, you will be using the fieldType of the 'all' field; copyField copies
raw content.
4) A catch-all field increases your index size.
5) There are lots of useful parameters with which you can fine-tune your
relevancy:
http://wiki.apache.org/solr/ExtendedDisMax
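
To make point 1 concrete, here is a hedged sketch in plain Java of composing such a request URL by hand (the host, core name, query text, and boost values are invented for illustration; real client code would typically use SolrJ's SolrQuery instead of raw strings):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EdismaxUrlSketch {

    // Compose a select URL with edismax and per-field boosts (qf).
    static String buildUrl() {
        try {
            String base = "http://localhost:8983/solr/collection1/select"; // example host/core
            String q = URLEncoder.encode("high test", "UTF-8");
            // Boosts as in point 1: article_id weighted higher than article_norm
            String qf = URLEncoder.encode("article_id^5 article_norm^3", "UTF-8");
            return base + "?defType=edismax&q=" + q + "&qf=" + qf + "&wt=xml";
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl());
        // http://localhost:8983/solr/collection1/select?defType=edismax&q=high+test&qf=article_id%5E5+article_norm%5E3&wt=xml
    }
}
```

The ^ boosts survive URL-encoding as %5E; the same qf string can also be set as a default in the handler's configuration rather than on every request.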


Re: Solr4.0 Partially update document

2012-08-13 Thread Bing Hua
Got it at 

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/test/org/apache/solr/client/solrj/SolrExampleTests.java

Problem solved.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-0-Partially-update-document-tp4000875p4000878.html
Sent from the Solr - User mailing list archive at Nabble.com.
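
For anyone landing on this thread from a search: the SolrJ pattern boils down to using a Map whose key names the update operation ("set") as the field value. A hedged sketch of just the payload shape in plain Java (plain maps stand in for SolrInputDocument so the example stays self-contained):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class AtomicUpdateSketch {

    // Build the map structure that an atomic "set" update serializes to.
    static Map<String, Object> buildPartialUpdate() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc1");
        Map<String, Object> op = new HashMap<>();
        op.put("set", "new_value"); // "set" replaces the field's value
        doc.put("field1", op);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(buildPartialUpdate());
        // {id=doc1, field1={set=new_value}}
    }
}
```

With real SolrJ you would put the same inner map into a SolrInputDocument via doc.addField("field1", op) and send it with server.add(doc).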


Solr 4.0.0, query, default port not changeable

2012-08-13 Thread Raghav Karol
Hello *,

Running Solr 4.0.0-ALPHA we have an issue with queries.

We would like to use multiple JVMs to host Solr cores but cannot, because the
queries ignore the jetty.port setting. The following is the query generated
using the admin interface; Solr is running in Jetty under port 8080.

http://solr-cluster-1.issuu.com:8983/solr/core0/select?q=*%3A*&wt=xml

Has anyone tried to deploy Solr using an external Jetty, i.e., not the
example start.jar, on a port other than 8983?

--
Raghav

Re: Solr 4.0.0, query, default port not changeable

2012-08-13 Thread Jack Krupansky

Did you try this:
http://lucene.472066.n3.nabble.com/How-to-change-a-port-td490375.html

-- Jack Krupansky

-Original Message- 
From: Raghav Karol

Sent: Monday, August 13, 2012 11:49 AM
To: solr-user@lucene.apache.org
Subject: Solr 4.0.0, query, default port not changeable

Hello *,

Running Solr 4.0.0-ALPHA we have an issue with queries.

We would like to use multiple JVMs to host Solr cores but cannot, because 
the queries ignore the jetty.port setting. The following is the query 
generated using the admin interface; Solr is running in Jetty under port 
8080.


http://solr-cluster-1.issuu.com:8983/solr/core0/select?q=*%3A*&wt=xml

Has anyone tried to deploy Solr using an external Jetty, i.e., not the 
example start.jar, on a port other than 8983?


--
Raghav



Re: Solr 4.0.0, query, default port not changeable

2012-08-13 Thread Chris Hostetter

: We would like to use multiple jvm's to host solr cores but can not 
: because the queries ignore the jetty.port settings. The following is 
: they query generated using the admin interface, solr is running in jetty 
: under port 8080.
: 
: http://solr-cluster-1.issuu.com:8983/solr/core0/select?q=*%3A*&wt=xml

can you please elaborate on what you mean...

how exactly are you running solr? 
how are you configuring jetty? 
how are you executing the query?
where did you see that URL?

It sounds like you are asking about the Query form in the Admin UI.
If I start Solr up in Jetty using port 8080, and load the Admin UI query 
form...

http://localhost:8080/solr/#/collection1/query

Then when I click Execute Query, the URL fetched by the UI is...

http://localhost:8080/solr/collection1/select?q=*%3A*&wt=xml


-Hoss


Are there any comparisons of Elastic Search specifically with SOLR 4?

2012-08-13 Thread Alexandre Rafalovitch
Hello,

Has anybody compared the feature set of SOLR 4 with Elastic Search? I saw
some earlier comparisons and they talked about sharding, distributed
service, etc. It seems to me that most of those are addressed in
version 4.

The only big issue I see is a better support from ES for nested items
and/or parent linking, which I guess on lucene level is done with some
sort of dynamic-style fields? Not sure what's SOLR's answer for that
(JOIN seems to be a little different).

I don't need that information for myself (I like SOLR well enough),
but I was asked to compare and am trying to actually understand the
difference. And, for our use-case, I think we can consider JSON vs.
QueryString and dynamic schema changes to be less important. I am
looking at non-reproducible difference between those two solutions
(given that they are both built on Lucene (4?) ).

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


How to configure Spell check component in SOLR 3.6.1

2012-08-13 Thread bbarani
Hi,

I am trying to configure the spell check component in SOLR. I just want to
confirm that I am on the right path.

I have a text field, name_spell (no analyzers, using the solr.TextField
field type). This field will be used for building terms for spell check. I
have copied the necessary data (for building the spell check index) from
other fields into this field.

I configured the spell check component to include the above field type and
field name (is this correct?):

<str name="queryAnalyzerFieldType">spell</str>

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">name_spell</str>
  <str name="spellcheckIndexDir">spellchecker</str>
</lst>

Now I built the spellcheck index via a URL command (spellcheck.build=true)
and it seems to work fine for a few keywords but doesn't seem to work for
others, so I am not sure whether I have configured the spell check
component properly. It would be great if someone could confirm.

Thanks a lot!!!

Thanks,
BB



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-configure-Spell-check-component-in-SOLR-3-6-1-tp4000893.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom Plugins for solr

2012-08-13 Thread Sujatha Arun
Thanks, I am going to try this on the solr 1.3 version. Would the approach
be any different for the recent solr versions?

Regards
Sujatha

On Mon, Aug 13, 2012 at 8:53 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Then you're on the right track.

 1. You'd either have to restart Tomcat or in the case of Multicore
 setups, reload the core.
 2. If the jar has dependencies outside of the Solr provided classes,
 you'll have to include those as well. If it only depends on Solr stuff
 or things that are in the servlet container's classpath, you should be
 fine with just the one class.

 Michael Della Bitta

 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
 www.appinions.com
 Where Influence Isn’t a Game


 On Mon, Aug 13, 2012 at 10:36 AM, Sujatha Arun suja.a...@gmail.com
 wrote:
  Adding a new class
 
  Regards
  Sujatha
 
  On Mon, Aug 13, 2012 at 5:54 PM, Michael Della Bitta 
  michael.della.bi...@appinions.com wrote:
 
  Michael Della Bitta
  Hi Sujatha,
 
  Are you adding a new class, or modifying one of the provided Solr
 classes?
 
  Michael
 
 
  
  Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
  www.appinions.com
  Where Influence Isn’t a Game
 
 
  On Mon, Aug 13, 2012 at 7:18 AM, Sujatha Arun suja.a...@gmail.com
 wrote:
   Hi ,
  
   I would like to write a custom component for solr  to address a
  particular
   issue.
  
   This is what I have been doing ,write the custom code directly in the
   downloaded code base and rebuild the war file and deploy the same. We
   currently have multiple cores ,hence  I want to approach this in a
 core
   specific way as opposed to affecting all the cores in the webapp .
  
   If I have to write a plugin and move it to the lib directory of each
 core
   ,would I just need to add one single class file packed as a jar  and
 make
   appropriate changes to the solrconfig file .When I reload the core ,
 I am
   assuming that apart from the  classes in the war file ,this jar file
 in
  the
   lib will be automatically referenced.
  
   Would I need to restart servlet container?
   Would I need to have other files to which this custom class is
  referencing
   to in the custom jar file or will that be automatically taken care of?
  
   Regards
   Sujatha
 



Re: Custom Plugins for solr

2012-08-13 Thread Michael Della Bitta
No, the jar would be exactly the same, with the caveat that you'd have
to build against the newer Solr version of course.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Mon, Aug 13, 2012 at 12:55 PM, Sujatha Arun suja.a...@gmail.com wrote:
 Thanks ,I am going to try this on solr 1.3 version .Would the approach be
 any different for the recent solr versions?

 Regards
 Sujatha

 On Mon, Aug 13, 2012 at 8:53 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:

 Then you're on the right track.

 1. You'd either have to restart Tomcat or in the case of Multicore
 setups, reload the core.
 2. If the jar has dependencies outside of the Solr provided classes,
 you'll have to include those as well. If it only depends on Solr stuff
 or things that are in the servlet container's classpath, you should be
 fine with just the one class.

 Michael Della Bitta

 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
 www.appinions.com
 Where Influence Isn’t a Game


 On Mon, Aug 13, 2012 at 10:36 AM, Sujatha Arun suja.a...@gmail.com
 wrote:
  Adding a new class
 
  Regards
  Sujatha
 
  On Mon, Aug 13, 2012 at 5:54 PM, Michael Della Bitta 
  michael.della.bi...@appinions.com wrote:
 
  Michael Della Bitta
  Hi Sujatha,
 
  Are you adding a new class, or modifying one of the provided Solr
 classes?
 
  Michael
 
 
  
  Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
  www.appinions.com
  Where Influence Isn’t a Game
 
 
  On Mon, Aug 13, 2012 at 7:18 AM, Sujatha Arun suja.a...@gmail.com
 wrote:
   Hi ,
  
   I would like to write a custom component for solr  to address a
  particular
   issue.
  
   This is what I have been doing ,write the custom code directly in the
   downloaded code base and rebuild the war file and deploy the same. We
   currently have multiple cores ,hence  I want to approach this in a
 core
   specific way as opposed to affecting all the cores in the webapp .
  
   If I have to write a plugin and move it to the lib directory of each
 core
   ,would I just need to add one single class file packed as a jar  and
 make
   appropriate changes to the solrconfig file .When I reload the core ,
 I am
   assuming that apart from the  classes in the war file ,this jar file
 in
  the
   lib will be automatically referenced.
  
   Would I need to restart servlet container?
   Would I need to have other files to which this custom class is
  referencing
   to in the custom jar file or will that be automatically taken care of?
  
   Regards
   Sujatha
 



Confused with suggestion / collate suggest - spell check component

2012-08-13 Thread bbarani
Hi,

I am trying to figure out if this is the expected behaviour of spell check
component. (when using collate=true)

I am searching for the keyword 'high tet'; the suggester returns the expected
result 'test', but I expected the collated result to be 'high test' (using the
corrected word returned by the suggester), whereas it returns 'high tet'.
Does the collation return proper suggested results only if both words are
contiguous in a document?

<lst name="suggestions">
  <lst name="tet">
    <int name="numFound">1</int>
    <int name="startOffset">5</int>
    <int name="endOffset">8</int>
    <int name="origFreq">0</int>
    <arr name="suggestion">
      <lst>
        <str name="word">test</str>
        <int name="freq">35</int>
      </lst>
    </arr>
  </lst>
  <bool name="correctlySpelled">false</bool>
  <str name="collation">high tet</str>
</lst>
</lst>
</response>

Thanks in advance

Thanks,
BB



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Confused-with-suggestion-collate-suggest-spell-check-component-tp4000903.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom Plugins for solr

2012-08-13 Thread Sujatha Arun
What I would be doing is this:

Create a custom class that refers to the org.apache.* classes it needs
(import statements); the custom file's location is independent of the solr
core class files.
Compile this separately.
Package this as a jar.
Move this to the lib dir of each solr core.
Refer to this in the lib directive of solrconfig.xml.
Reload the core.

I am assuming that I am not directly handling the solr download src files
or the war files. Is this correct? Do I have to be concerned with build
files etc.? How does the approach differ in the later versions?

Regards
Sujatha









On Mon, Aug 13, 2012 at 10:30 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 No, the jar would be exactly the same, with the caveat that you'd have
 to build against the newer Solr version of course.

 Michael Della Bitta

 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
 www.appinions.com
 Where Influence Isn’t a Game


 On Mon, Aug 13, 2012 at 12:55 PM, Sujatha Arun suja.a...@gmail.com
 wrote:
  Thanks ,I am going to try this on solr 1.3 version .Would the approach be
  any different for the recent solr versions?
 
  Regards
  Sujatha
 
  On Mon, Aug 13, 2012 at 8:53 PM, Michael Della Bitta 
  michael.della.bi...@appinions.com wrote:
 
  Then you're on the right track.
 
  1. You'd either have to restart Tomcat or in the case of Multicore
  setups, reload the core.
  2. If the jar has dependencies outside of the Solr provided classes,
  you'll have to include those as well. If it only depends on Solr stuff
  or things that are in the servlet container's classpath, you should be
  fine with just the one class.
 
  Michael Della Bitta
 
  
  Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
  www.appinions.com
  Where Influence Isn’t a Game
 
 
  On Mon, Aug 13, 2012 at 10:36 AM, Sujatha Arun suja.a...@gmail.com
  wrote:
   Adding a new class
  
   Regards
   Sujatha
  
   On Mon, Aug 13, 2012 at 5:54 PM, Michael Della Bitta 
   michael.della.bi...@appinions.com wrote:
  
   Michael Della Bitta
   Hi Sujatha,
  
   Are you adding a new class, or modifying one of the provided Solr
  classes?
  
   Michael
  
  
   
   Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
   www.appinions.com
   Where Influence Isn’t a Game
  
  
   On Mon, Aug 13, 2012 at 7:18 AM, Sujatha Arun suja.a...@gmail.com
  wrote:
Hi ,
   
I would like to write a custom component for solr  to address a
   particular
issue.
   
This is what I have been doing ,write the custom code directly in
 the
downloaded code base and rebuild the war file and deploy the same.
 We
currently have multiple cores ,hence  I want to approach this in a
  core
specific way as opposed to affecting all the cores in the webapp .
   
If I have to write a plugin and move it to the lib directory of
 each
  core
,would I just need to add one single class file packed as a jar
  and
  make
appropriate changes to the solrconfig file .When I reload the core
 ,
  I am
assuming that apart from the  classes in the war file ,this jar
 file
  in
   the
lib will be automatically referenced.
   
 Would I need to restart servlet container?
Would I need to have other files to which this custom class is
   referencing
to in the custom jar file or will that be automatically taken care
 of?
   
Regards
Sujatha
  
 



Re: Custom Plugins for solr

2012-08-13 Thread Michael Della Bitta
Sujatha,

As the API of the classes you're compiling against may have changed
with a different Solr version, it's always a good idea to build
against the new version of Solr, otherwise you might see weird issues
at runtime.

You wouldn't have to do anything special other than to drop your src
file into the new Solr project as you have with this one, recompile,
and rebuild your jar.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Mon, Aug 13, 2012 at 1:11 PM, Sujatha Arun suja.a...@gmail.com wrote:
 What I would be doing is this ..

 Create a custom class that refer to all  org,apache.* classes (import stt)
 ,the custom file's  location is  independent of the solr core class files.
 compile this separately
 package this as a jar
 move this to lib dir of each solr core
 refer to this in lib directory of solrconfig.xml
 reload the core.

 I am assuming that I am not directly handling the solr download src files
 or the war files ,Is this correct?do I have to be concerned with build
 files etc? How then does the approach differ in the later versions?

 Regards
 Sujatha









 On Mon, Aug 13, 2012 at 10:30 PM, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:

 No, the jar would be exactly the same, with the caveat that you'd have
 to build against the newer Solr version of course.

 Michael Della Bitta

 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
 www.appinions.com
 Where Influence Isn’t a Game


 On Mon, Aug 13, 2012 at 12:55 PM, Sujatha Arun suja.a...@gmail.com
 wrote:
  Thanks ,I am going to try this on solr 1.3 version .Would the approach be
  any different for the recent solr versions?
 
  Regards
  Sujatha
 
  On Mon, Aug 13, 2012 at 8:53 PM, Michael Della Bitta 
  michael.della.bi...@appinions.com wrote:
 
  Then you're on the right track.
 
  1. You'd either have to restart Tomcat or in the case of Multicore
  setups, reload the core.
  2. If the jar has dependencies outside of the Solr provided classes,
  you'll have to include those as well. If it only depends on Solr stuff
  or things that are in the servlet container's classpath, you should be
  fine with just the one class.
 
  Michael Della Bitta
 
  
  Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
  www.appinions.com
  Where Influence Isn’t a Game
 
 
  On Mon, Aug 13, 2012 at 10:36 AM, Sujatha Arun suja.a...@gmail.com
  wrote:
   Adding a new class
  
   Regards
   Sujatha
  
   On Mon, Aug 13, 2012 at 5:54 PM, Michael Della Bitta 
   michael.della.bi...@appinions.com wrote:
  
   Michael Della Bitta
   Hi Sujatha,
  
   Are you adding a new class, or modifying one of the provided Solr
  classes?
  
   Michael
  
  
   
   Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
   www.appinions.com
   Where Influence Isn’t a Game
  
  
   On Mon, Aug 13, 2012 at 7:18 AM, Sujatha Arun suja.a...@gmail.com
  wrote:
Hi ,
   
I would like to write a custom component for solr  to address a
   particular
issue.
   
This is what I have been doing ,write the custom code directly in
 the
downloaded code base and rebuild the war file and deploy the same.
 We
currently have multiple cores ,hence  I want to approach this in a
  core
specific way as opposed to affecting all the cores in the webapp .
   
If I have to write a plugin and move it to the lib directory of
 each
  core
,would I just need to add one single class file packed as a jar
  and
  make
appropriate changes to the solrconfig file .When I reload the core
 ,
  I am
assuming that apart from the  classes in the war file ,this jar
 file
  in
   the
lib will be automatically referenced.
   
 Would I need to restart servlet container?
Would I need to have other files to which this custom class is
   referencing
to in the custom jar file or will that be automatically taken care
 of?
   
Regards
Sujatha
  
 



Re: Running out of memory

2012-08-13 Thread Jon Drukman
On Sun, Aug 12, 2012 at 12:31 PM, Alexey Serba ase...@gmail.com wrote:

  It would be vastly preferable if Solr could just exit when it gets a
 memory
  error, because we have it running under daemontools, and that would cause
  an automatic restart.
 -XX:OnOutOfMemoryError=cmd args; cmd args
 Run user-defined commands when an OutOfMemoryError is first thrown.

  Does Solr require the entire index to fit in memory at all times?
 No.

 But it's hard to say about your particular problem without additional
 information. How often do you commit? Do you use faceting? Do you sort
 by Solr fields and if yes what are those fields? And you should also
 check caches.


I upgraded to solr-3.6.1 and an extra-large Amazon instance (15GB RAM), so
we'll see if that helps. So far no out of memory errors.


Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

2012-08-13 Thread Fuad Efendi
SOLR-4.0

I am trying to implement this; funny idea to share:

1. http://wiki.apache.org/solr/HierarchicalFaceting
unfortunately it does not support date ranges. However, a workaround: use the
String type instead of *_tdt and define fields such as
published_hour
published_day
published_week
…

Of course you will need to stick with a timezone; but you can add an index
(or indexes) for each timezone. And most important, string facets are much
faster than Date Trie ranges.
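
A hedged sketch of precomputing those per-granularity strings at index time, in plain Java (java.time stands in for whatever date library you use; the UTC zone choice and the exact bucket formats are assumptions, not from the original post):

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.IsoFields;

public class FacetBuckets {

    // Derive string facet values for one timestamp; one set per timezone in practice.
    static String hourBucket(ZonedDateTime t) {
        return t.format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH")); // published_hour
    }

    static String dayBucket(ZonedDateTime t) {
        return t.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));      // published_day
    }

    static String weekBucket(ZonedDateTime t) {
        // ISO week of the week-based year, e.g. 2012-W33 for mid-August 2012
        return t.get(IsoFields.WEEK_BASED_YEAR) + "-W" + t.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    public static void main(String[] args) {
        ZonedDateTime t = ZonedDateTime.of(2012, 8, 13, 14, 30, 0, 0, ZoneOffset.UTC);
        System.out.println(hourBucket(t)); // 2012-08-13T14
        System.out.println(dayBucket(t));  // 2012-08-13
        System.out.println(weekBucket(t)); // 2012-W33
    }
}
```

Each document then carries these strings in published_hour/published_day/published_week, and faceting on them is a plain (fast) string facet rather than a date range facet.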



2. Our index is over 100 million documents (from social networks) and grows
rapidly (millions a day); cache warm-up takes a few minutes; Near-Real-Time
does not work with faceting.

However… another workaround: we can have a Daily Core (optimized at
midnight), plus a Current Core (only today's data, optimized), plus a Last
Hour Core (near real time).

The Last Hour data is small enough that we can use facets with the Near Real
Time feature.

The service layer will accumulate search results from the three layers, so it
will be near real time.



Any thoughts? Thanks,




-- 
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
http://www.tokenizer.ca
http://www.linkedin.com/in/lucene





Solr Index linear growth - Performance degradation.

2012-08-13 Thread feroz_kh
We have 4 shards with a 14GB index on each of them.
Each shard has a master and 3 slaves (each of them with 32GB RAM).

We're expecting the index size to grow to double or triple in the near
future, so we thought of merging our indexes so that each shard has a
28GB index, and we also increased the RAM on each slave to 48GB.

We made these changes locally and tested the servers by sending the same 10K
realistic queries to each server with the 14GB and 28GB indexes, and found that:
1. For the server with the 14GB index (48GB RAM): search time was 480ms, number
of index hits: 3.8GB
2. For the server with the 28GB index (48GB RAM): search time was 900ms, number
of index hits: 7.2GB.

So we saw that having the whole index in RAM doesn't help in sustaining
performance in terms of search time: search time doubled, linearly, when the
index size was doubled.

We were thinking of keeping the 4-shard configuration, but it looks like
now we have to add another shard, or another slave to each shard.

Is there a way we can configure our servers so that the performance isn't
affected even when the index size doubles or triples?







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Index linear growth - Performance degradation.

2012-08-13 Thread Lance Norskog
How much ram do you assign to the JVM? The JVM should be allocated
maybe 1/2 gb more than it needs to run comfortably. Also, how large
are your caches?

How large are the documents? How many search terms are there? If you
add more documents are there new search terms?

On Mon, Aug 13, 2012 at 11:17 AM, feroz_kh feroz...@yahoo.com wrote:
 We have 4 shards with 14GB index on each of them
 Each shard has a master and 3 slaves(each of them with 32GB RAM)

 We're expecting that the index size will grow to double or triple in near
 future.
 So we thought of merging our indexes to 28GB index so that each shard has
 28GB index and also increased our RAM on each slave to 48GB.

 We made this changes locally and tested the server by sending same 10K
 realistic queries to each server with 14GB and 28GB index, we found that
 1. For server with 14GB index(48GB RAM): search time was 480ms, number of
 index hits: 3.8GB
 2. For server with 28GB index(48GB RAM): search time was 900ms, number of
 index hits: 7.2GB.

 So we saw that having the whole index in RAM doesn't help in sustaining the
 performance in terms of search time . Search time increased linearly to
 double when the index size was doubled.

 We were thinking of keeping only 4 shards configuration but it looks like
 now we have to add another shard or another slave to each shard.

 Is there way we can configure our servers so that the performance isn't
 affected even when index size doubles or triples ?







 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com


Getting Suggestions without Search Results

2012-08-13 Thread Bing Hua
Hi,

I have a spellcheck component that does auto-complete suggestions. It
is part of the last-components of my /select search handler, so apart from
normal search results I also get a list of suggestions.

Now I want to split things up. Is there a way I can get only the
suggestions for a query, without the normal search results? I may need
to create a new handler for this. Can anyone please give me some ideas on
that?

Thanks,
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-Suggestions-without-Search-Results-tp4000968.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting Suggestions without Search Results

2012-08-13 Thread Michael Della Bitta
Does querying with rows=0 work?

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Mon, Aug 13, 2012 at 4:21 PM, Bing Hua bh...@cornell.edu wrote:


 Now I want to split things up. Is there a way that I can only
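A hedged sketch of Michael's suggestion (the host, core URL, and query term are placeholders, not from the thread):

```shell
# rows=0 suppresses the document list; the spellcheck component attached
# to /select still runs, so only the suggestions section comes back.
curl 'http://localhost:8983/solr/select?q=ipod&rows=0&spellcheck=true'
```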


RE: Getting Suggestions without Search Results

2012-08-13 Thread Markus Jelsma
I haven't tried it, but I'd try to use spellcheck.q as input and specify the
spellcheck component in the components section, not the last-components section,
because last-components (IIRC) appends to the five default components: query,
debug, mlt, highlighter and facet.
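A sketch of what a suggestions-only handler could look like in solrconfig.xml (the handler name /suggest and the component name spellcheck are assumptions; the searchComponent itself must already be defined elsewhere in the config):

```xml
<!-- Hypothetical handler that runs only the spellcheck component,
     so no normal search results are computed at all. -->
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <!-- "components" replaces the default component list entirely,
       unlike "last-components", which appends to it. -->
  <arr name="components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```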
 
 
-Original message-
 From:Michael Della Bitta michael.della.bi...@appinions.com
 Sent: Mon 13-Aug-2012 22:33
 To: solr-user@lucene.apache.org
 Subject: Re: Getting Suggestions without Search Results
 
 Does querying with rows=0 work?
 
 Michael Della Bitta
 
 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
 www.appinions.com
 Where Influence Isn’t a Game
 
 
 On Mon, Aug 13, 2012 at 4:21 PM, Bing Hua bh...@cornell.edu wrote:
 
 
  Now I want to split things up. Is there a way that I can only
 


Re: Getting Suggestions without Search Results

2012-08-13 Thread Ahmet Arslan
 Now I want to split things up. Is there a way that I can
 only get
 suggestions of a query without getting the normal search
 results? I may need
 to create a new handler for this. Can anyone please give me
 some ideas on
 that?

Appending query=false disables the QueryComponent. I am not sure if the
spellcheck component uses/needs the results of the QueryComponent; the
FacetComponent, for example, does use/need the results of the QueryComponent.
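A quick sketch of Ahmet's suggestion, untested here (the claim that query=false disables the QueryComponent is his; the host and query term are placeholders):

```shell
# Per Ahmet: disable the query component and keep only spellcheck output.
curl 'http://localhost:8983/solr/select?q=ipod&query=false&spellcheck=true'
```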


Re: Are there any comparisons of Elastic Search specifically with SOLR 4?

2012-08-13 Thread Otis Gospodnetic
Hi,

I saw some old posts on the Solr vs. ES topic, but they were about 
performance/benchmarks only, and even those were not done correctly.

We have a couple of posts on that topic pending over on Sematext Blog.  We can 
publish them next week, so keep an eye on http://blog.sematext.com/ and/or 
http://twitter.com/sematext .

Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 




 From: Alexandre Rafalovitch arafa...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Monday, August 13, 2012 12:22 PM
Subject: Are there any comparisons of Elastic Search specifically with SOLR 4?
 
Hello,

Has anybody compared feature set of SOLR 4 with Elastic Search? I saw
some earlier comparisons and they talked about sharding and
distributed service, etc. Seems to me, most of those are addressed in
version 4.

The only big issue I see is better support in ES for nested items
and/or parent linking, which I guess at the Lucene level is done with some
sort of dynamic-style fields? Not sure what SOLR's answer for that is
(JOIN seems to be a little different).

I don't need that information for myself (I like SOLR well enough),
but I was asked to compare and am trying to actually understand the
difference. And, for our use-case, I think we can consider JSON vs.
QueryString and dynamic schema changes to be less important. I am
looking at non-reproducible difference between those two solutions
(given that they are both built on Lucene (4?) ).

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)




Re: Solr Index linear growth - Performance degradation.

2012-08-13 Thread feroz_kh
Here are a few of the queries:
---
parallel zur xml beschreibungsdatei gibt es eine
die verbindung zwischen beiden sei ten geschieht
die owner klasse muss sich aus der
benutzer ein oder mehrere lieblingsfarben ausw hlen kann
found sample questions at http bjs ojp
but more important parents need to keep
---
Here's the JVM RAM assignment:
-Xms24576m -Xmx24576m -XX:NewSize=6168m -XX:MaxNewSize=6168m
-XX:MaxPermSize=1024m
I believe that's enough assigned there...
-
I am not dealing with adding new documents here;
I'm just testing Solr index search - I just have the indexes.
For the 14GB index, the RAM cache gets filled with around 14GB.
For the 28GB index, the RAM cache gets filled with around 28GB.
The document cache size is 200MB max, 20MB initial.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934p4001011.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr with UIMA

2012-08-13 Thread introfini

Rahul Warawdekar wrote
 
 Hi Divakar,
 
 Try making your updateRequestProcessorChain the default. Simply add
 default="true" as follows and check if that works:
 
 <updateRequestProcessorChain name="uima" default="true">
 
 

Rahul,

This fixed my problem, you saved my week!

I was following the README.txt instructions and they didn't work; after
adding default="true" it immediately started working.

Maybe that should go into the README.txt?

Thank you.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p4001014.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Index linear growth - Performance degradation.

2012-08-13 Thread Erick Erickson
Instant reactions:

1) That's probably too much memory. Try, as Lance said, 1/2 of your
memory. Uwe Schindler wrote an excellent blog about this issue as it
relates to MMapDirectory:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

2) You've doubled the number of docs on the server and you're seeing a
doubling of the response time, right? On average, the number of
documents that have to be scored has also doubled, so I'm not entirely
surprised. If everything is in memory (which it sounds like it may
well be) then this isn't particularly surprising.

3) One note of caution: saying "a 14GB index" (or 28GB index) isn't
very meaningful. The *.fdt and *.fdx files in your index directory are
where the verbatim copy of the data is stored for those fields where
stored="true" in your schema. The contents of these files are almost
totally irrelevant to the memory requirements for searching. I've seen
these files range from under 5% of the index to over 80%.

Best
Erick

On Mon, Aug 13, 2012 at 4:40 PM, feroz_kh feroz...@yahoo.com wrote:
 Here's few list of queries
 ---
 parallel zur xml beschreibungsdatei gibt es eine
 die verbindung zwischen beiden sei ten geschieht
 die owner klasse muss sich aus der
 benutzer ein oder mehrere lieblingsfarben ausw hlen kann
 found sample questions at http bjs ojp
 but more important parents need to keep
 ---
 Here's the jvm ram assignment
 -Xms24576m -Xmx24576m -XX:NewSize=6168m -XX:MaxNewSize=6168m
 -XX:MaxPermSize=1024m
 I believe that's enough assigned there...
 -
 I am not dealing with adding new documents here
 Just testing the solr index search - i just the have the indexes.
 For 14GB index the RAM cache gets filled with 14 GB around
 For 28GB index the RAM cache gets filled with 28GB around
 The Document cache size is 200MB max and initial 20MB.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934p4001011.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Index linear growth - Performance degradation.

2012-08-13 Thread feroz_kh
1. So we have 24.5GB assigned to the JVM, which is half of the total memory of
48GB RAM (if that's what you meant, and if I am getting that right?).
2. The sizes of *.fdt and *.fdx are around 300MB and 50MB respectively, so
that's definitely less than 5%.
Do you see a problem there?

Is there a way we can force or tune things so that the response time remains
constant, or at least doesn't degrade a lot (i.e. almost double), when the
index size is doubled?
Or can we not do anything about it?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934p4001034.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query facet count and its matching documents

2012-08-13 Thread Chris Hostetter

: We're passing multiple Arbitrary Faceting Query (facet.query) to get the
: number of matching documents (the facet count) evaluated over the search
: results in a *single* Solr query.  My use case demands the actual matching
: facet results/documents/fields also along with facet count.
: 
: My question is, is it possible to get facet query matching results along
: with facet count in a single Solr query call?

nope .. there's nothing in Solr to give you these results in a single 
query -- you could write a component to do something like this, but you'd 
have to think about how you want to deal with the fl/start/rows of these 
extra queries.

Note: one thing you can do with the pseudo-fields feature of Solr 4.0-ALPHA 
is include a pseudo-field with each document in the main result 
set indicating its score from arbitrary queries -- including those queries 
you are using in facet.query.

Something like this works with the Solr 4.0-ALPHA example data...

http://localhost:8983/solr/select
?q=*:*
&apple=name:apple
&electronics=cat:electronics
&facet=true
&facet.query={!key=apple v=$apple}
&facet.query={!key=electronics v=$electronics}
&fl=id,electronics:query($electronics),apple:query($apple)






-Hoss


Re: Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

2012-08-13 Thread Mark Miller
There is a per-segment faceting option - but I think it is just for
single-valued fields right now?


On Mon, Aug 13, 2012 at 2:38 PM, Fuad Efendi f...@efendi.ca wrote:

 SOLR-4.0

 I am trying to implement this; funny idea to share:

 1. http://wiki.apache.org/solr/HierarchicalFaceting
 unfortunately it does not support date ranges. However, workaround: use
 String type instead of *_tdt and define fields such as
 published_hour
 published_day
 published_week
 …

 Of course you will need to stick with timezone; but you can add an
 index(es)
 for each timezone. And most important, string facets are much faster than
 Date Trie ranges.



 2. Our index is overs 100 millions (from social networks) and rapidly grows
 (millions a day); cache warm up takes few minutes; Near-Real-Time does not
 work with faceting.

 However… another workaround: we can have a Daily Core (optimized at
 midnight),
 plus Current Core (only today's data, optimized), plus Last Hour Core (near
 real time)

 Last Hour Data is small enough and we can use Facets with Near Real Time
 feature

 Service layer will accumulate search results from three layers, it will be
 near real time.



 Any thoughts? Thanks,




 --
 Fuad Efendi
 416-993-2060
 Tokenizer Inc., Canada
 http://www.tokenizer.ca
 http://www.linkedin.com/in/lucene






-- 
- Mark

http://www.lucidimagination.com


Indexing thousands file on solr

2012-08-13 Thread troya
HI All,

I have thousands of files in a folder which I want to index using Solr.
At first I had only 9 to 20 files, so I uploaded them into Solr manually
using curl.

But now I have thousands of files; how can I index them using Solr? Should I
upload them one by one?

I've tried using a command like the one below:

java -Durl="http://localhost:8906/solr/update/extract?literal.id=PPN"
-Dtype=text/html -jar post.jar *.htm

But when I search, only one file appears, not all of them.

Help me to solve this.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-thousands-file-on-solr-tp4001050.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR3.6:Field Collapsing/Grouping throws OOM

2012-08-13 Thread Tirthankar Chatterjee
Hi,
I have a beefy box with 24GB RAM (12GB for Tomcat 7, which houses Solr 3.6), 2
Intel Xeon 64-bit server processors, a 30TB HDD, and JDK 1.7.0_03 x64.


Data Index Dir Size: 400GB
Metadata of files is stored in it. I have around 15 schema fields.
Total number of items: 150 million approx.

I have a scenario which I will try to explain to the best of my knowledge here:

Let us consider the fields I am interested in

Url: the entire path of a file in the Windows file system, including the
filename, e.g. C:\Documents\A.txt
mtm: modified time of the file
Jid: job ID
conv_sort: a string field where the filename is stored.

I run a job where the following gets inserted

Total Items:2
Url:C:\personal\A1.txt
mtm:08/14/2012 12:00:00
Jid:1
Conv_sort:A1.txt
---
Url:C:\personal\B1.txt
mtm:08/14/2012 12:01:00
Jid:1
Conv_sort:B1.txt
In the second run only one item changes:

Url:C:\personal\A1.txt
mtm:08/15/2012 1:00:00
Jid:2
Conv_sort=A1.txt

When queried, I would like to return the latest A1.txt and B1.txt back to the
end user. I am trying to use grouping, with no luck: it keeps throwing OOM. Can
someone please help, as it is critical for my project.

The query I am trying: under one folder there are 1000 files, and I am also
putting in a filter query param, asking it to group by filename or URL, and
none of them work. What am I doing wrong here?


http://172.19.108.78:8080/solr/select/?q=*:*&version=2.2&start=0&rows=10&indent=on&group.query=filefolder:E\:\\pd_dst\\646c6907-a948-4b83-ac1d-d44742bb0307smb://pd_dst//646c6907-a948-4b83-ac1d-d44742bb0307&group=true&group.limit=1&group.field=conv_sort&group.ngroup=true


The stack trace:


SEVERE: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.<init>(Unknown Source)
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:882)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:233)
at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:856)
at org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.setNextReader(TermFirstPassGroupingCollector.java:74)
at org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:576)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:364)
at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:376)
at org.apache.solr.search.Grouping.execute(Grouping.java:298)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:372)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1770)




Re: Solr Index linear growth - Performance degradation.

2012-08-13 Thread Lance Norskog
How many documents does each search find? What does this mean: "number
of index hits: 7.2GB"?

Above a threshold, the more memory you give Java, the more time it
spends collecting. You want to start with very little memory and
gradually increase memory size until the program stops using it all,
and then add maybe 10%. The operating system is better at managing
memory than Java, and it is faster to leave the full index data in the
OS disk buffers. It is counterintuitive, but is true.

Another problem you will find is 'Large Pages'. This is an OS tuning
parameter, not a Java or Solr tuning. You did not say which OS you
use, but here is an explanation for Linux:
http://lwn.net/Articles/423584/
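A hedged sketch of how to inspect those settings on Linux (file locations vary by distro and kernel; these are common defaults, not taken from the thread):

```shell
# Huge-page counters, and the transparent-hugepage mode if present.
grep -i huge /proc/meminfo
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null
```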

On Mon, Aug 13, 2012 at 6:16 PM, feroz_kh feroz...@yahoo.com wrote:
 1. So we have 24.5GB assigned to jvm which is half of the total memory, which
 is 48GB RAM.(If that's what you meant, and if i am getting that right ?)
 2. Size of *.fdt and *fdx is around 300m and 50m respectively.So that's
 definitely less that 5%.
 Do you see a problem there ?

 Is there a way that we can force or tune in such a way that the response
 time remains constant or doesn't degrade a lot(i.e. almost doubling) when
 the index size is doubled ?
 Or we cannot do anything about it ?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934p4001034.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com


Re: Solr Index linear growth - Performance degradation.

2012-08-13 Thread feroz_kh
It looks like reducing the jvm heap allocation did help in lowering the
response time to some extent.
Thanks for the pointer.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934p4001056.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Index linear growth - Performance degradation.

2012-08-13 Thread feroz_kh
index hits == total number of documents found by search query.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934p4001063.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Index linear growth - Performance degradation.

2012-08-13 Thread feroz_kh
It's 7.2 gig hits ("GB" was a typo).
This is the total number of index hits, calculated by summing each
numFound attribute from the Solr query responses.
We are on the RHEL Tikanga release.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934p4001061.html
Sent from the Solr - User mailing list archive at Nabble.com.