date:20150618

I had the very same issue, because I had some documents with a redundant
field, and I was using the Infix Suggester as well.

Because the Infix Suggester returns the whole field content, if the same
field value occurs in several docs, you will see duplicate suggestions.

Do you have any intermediate API in your application? If so, you can
modify the API to collect the suggestions in a Collection that prevents
duplicates, and return them from there.
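
A minimal sketch of that intermediate-API idea, assuming the application already holds the suggested terms as a list of strings. The class and method names are made up for illustration and are not part of Solr:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class SuggestionDedup {
    // Removes repeated terms while keeping the order in which they first appeared.
    public static List<String> dedupe(List<String> terms) {
        return new ArrayList<>(new LinkedHashSet<>(terms));
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
            "18:34:12", "18:34:12", "18:35:12", "18:35:12", "18:35:12", "12:06:02");
        System.out.println(dedupe(raw)); // [18:34:12, 18:35:12, 12:06:02]
    }
}
```

A LinkedHashSet keeps the suggester's ranking order, which a plain HashSet would not. Since deduplication shrinks the list, it may be worth requesting a larger suggest.count than the number of suggestions finally shown.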

If you want it directly from Solr, I assume it is a bug.
I think the suggester should return no duplicates by default (because
the only information returned is the field value and not the document id).
In any case, it could be a nice parameter for getting better suggestions
(sending an avoidDuplicate parameter to the suggester).

Cheers

2015-06-18 10:48 GMT+01:00 jon kerling jonkerl...@yahoo.com.invalid:

 Hi,
 I am using Solr 5.1. I'm getting duplicate suggestions when using my
 Solr suggester. I'm using AnalyzingInfixLookupFactory with
 DocumentDictionaryFactory. Can I configure it to return only distinct
 suggestions?

 here are details about my configuration:

 from solrconfig.xml:

 <searchComponent name="suggest" class="solr.SuggestComponent">
   <lst name="suggester">
     <str name="name">mySuggester1a</str>
     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
     <str name="indexPath">suggester_infix_dir1a</str>
     <str name="allTermsRequired">true</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">f1</str>
     <str name="weightField">weightField</str>
     <str name="suggestAnalyzerFieldType">text_general</str>
     <str name="buildOnStartup">false</str>
   </lst>

   <lst name="suggester">
     <str name="name">mySuggester2a</str>
     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
     <str name="indexPath">suggester_infix_dir2a</str>
     <str name="allTermsRequired">true</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">f2</str>
     <str name="weightField">weightField</str>
     <str name="suggestAnalyzerFieldType">text_general</str>
     <str name="buildOnStartup">false</str>
   </lst>
 </searchComponent>

 <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
   <lst name="defaults">
     <str name="suggest">true</str>
     <str name="suggest.count">6</str>
     <str name="suggest.dictionary">mySuggester1a</str>
     <str name="suggest.dictionary">mySuggester2a</str>
   </lst>
   <arr name="components">
     <str>suggest</str>
   </arr>
 </requestHandler>

 from schema.xml:

 <field name="f1" type="string" indexed="true" stored="true" required="false" multiValued="false" />
 <field name="f2" type="string" indexed="true" stored="true" required="false" multiValued="false" />
 <field name="weightField" type="float" indexed="true" stored="true"/>

 ** I ignore weightField; I'm not adding any values to it at all.

 document example:

 <doc>
   <str name="f1">2015-04-01</str>
   <str name="f2">12:06:00</str>
   <str name="f3">BOOO</str>
   <str name="f4"/>
   <str name="f5">7.52.11.212</str>
   <str name="f6">7.52.11.213</str>
   <str name="OID">52358424</str>
 </doc>

 After I build the suggester, I try to get suggestions like this:
 http://localhost/solr/core1/suggest?suggest=true&suggest.q=12

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">62</int>
   </lst>
   <lst name="suggest">
     <lst name="mySuggester2a">
       <lst name="12">
         <int name="numFound">6</int>
         <arr name="suggestions">
           <lst>
             <str name="term">18:34:&lt;b&gt;12&lt;/b&gt;</str>
             <long name="weight">0</long>
             <str name="payload" />
           </lst>
           <lst>
             <str name="term">18:34:&lt;b&gt;12&lt;/b&gt;</str>
             <long name="weight">0</long>
             <str name="payload" />
           </lst>
           <lst>
             <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str>
             <long name="weight">0</long>
             <str name="payload" />
           </lst>
           <lst>
             <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str>
             <long name="weight">0</long>
             <str name="payload" />
           </lst>
           <lst>
             <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str>
             <long name="weight">0</long>
             <str name="payload" />
           </lst>
           <lst>
             <str name="term">&lt;b&gt;12&lt;/b&gt;:06:02</str>
             <long name="weight">0</long>
             <str name="payload" />
           </lst>
         </arr>
       </lst>
     </lst>
     <lst name="mySuggester1a">
       <lst name="12">
         <int name="numFound">0</int>
         <arr name="suggestions" />
       </lst>
     </lst>
   </lst>
 </response>

 I would like to get this kind of suggester response ( no duplicates ):

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">62</int>
   </lst>
   <lst name="suggest">
     <lst name="mySuggester2a">
  lst

Solr 4.10.4: Could not create instance of 'SolrInputDocument'

2015-06-18 Thread Paul Revere

Our web site is created using PaperThin's CommonSpot CMS in a ColdFusion 10 and 
Windows Server 2008 R2 environment, using Apache Solr 4.10.4 instead of CF 
Solr. We create collections through the CMS interface and they do appear in 
both the CMS and the Solr dashboard when created. However, when we try indexing 
our collections through the CMS interface, our CMS error logs show the entry 
'Could not create instance of 'SolrInputDocument'' for each member of the 
collection. This is not a fatal error, as the indexing appears to cycle through 
all members, but each member errors out with its own log entry.
I've Googled this error message without success. What might this error message 
indicate?
Paul

Help: Problem in customized token filter

Hi,

I created a token concat filter to concatenate all the tokens from the token
stream. It creates the concatenated token as expected.

But when I post an XML file containing more than 30,000 documents, only the
first document has data in that field.

Schema:

<field name="titlex" type="text" indexed="true" stored="false"
       required="false" omitNorms="false" multiValued="false" />

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
            outputUnigrams="true" tokenSeparator=""/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
    <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
            expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
    <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
  </analyzer>
</fieldType>


Please help me. The code for the filter follows; please take a look.

Here is a picture of what the filter is doing:
http://i.imgur.com/THCsYtG.png?1

The code of the concat filter is:

package com.xyz.analysis.concat;

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class ConcatenateWordsFilter extends TokenFilter {

  private CharTermAttribute charTermAttribute =
      addAttribute(CharTermAttribute.class);
  private OffsetAttribute offsetAttribute =
      addAttribute(OffsetAttribute.class);
  PositionIncrementAttribute posIncr =
      addAttribute(PositionIncrementAttribute.class);
  TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);

  private StringBuilder stringBuilder = new StringBuilder();
  private boolean exhausted = false;

  /**
   * Creates a new ConcatenateWordsFilter
   * @param input TokenStream that will be filtered
   */
  public ConcatenateWordsFilter(TokenStream input) {
    super(input);
  }

  /**
   * {@inheritDoc}
   */
  @Override
  public final boolean incrementToken() throws IOException {
    while (!exhausted && input.incrementToken()) {
      char terms[] = charTermAttribute.buffer();
      int termLength = charTermAttribute.length();
      if (typeAtrr.type().equals("<ALPHANUM>")) {
        stringBuilder.append(terms, 0, termLength);
      }
      charTermAttribute.copyBuffer(terms, 0, termLength);
      return true;
    }

    if (!exhausted) {
      exhausted = true;
      String sb = stringBuilder.toString();
      System.err.println("The Data got is " + sb);
      int sbLength = sb.length();
      //posIncr.setPositionIncrement(0);
      charTermAttribute.copyBuffer(sb.toCharArray(), 0, sbLength);
      offsetAttribute.setOffset(offsetAttribute.startOffset(),
          offsetAttribute.startOffset() + sbLength);
      stringBuilder.setLength(0);
      //typeAtrr.setType("CONCATENATED");
      return true;
    }
    return false;
  }
}
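
One thing that may be worth checking with a filter like this: Lucene reuses TokenStream instances across documents and calls reset() before each one, so per-document state such as the exhausted flag and the StringBuilder normally has to be cleared in an overridden reset() that also calls super.reset(). The class above never sets exhausted back to false, which would be consistent with only the first document getting data. The following is a simplified, Lucene-free sketch of that pattern. ReusableConcat, setInput, nextToken and analyze are hypothetical stand-ins, not Lucene APIs:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReusableConcat {
    private Iterator<String> input = List.<String>of().iterator();
    private final StringBuilder buffer = new StringBuilder();
    private boolean exhausted = false;

    // Hypothetical stand-in for the analyzer handing the same filter a new document.
    public void setInput(List<String> tokens) {
        this.input = tokens.iterator();
    }

    // Clears per-document state; in a real TokenFilter this would be an
    // overridden reset() that also calls super.reset().
    public void reset() {
        exhausted = false;
        buffer.setLength(0);
    }

    // Returns each input token, then one extra concatenated token, then null.
    public String nextToken() {
        if (!exhausted && input.hasNext()) {
            String token = input.next();
            buffer.append(token);
            return token;
        }
        if (!exhausted) {
            exhausted = true;
            return buffer.toString();
        }
        return null;
    }

    // Drains the filter for one document and returns every emitted token.
    public static List<String> analyze(ReusableConcat filter, List<String> doc, boolean callReset) {
        filter.setInput(doc);
        if (callReset) {
            filter.reset();
        }
        List<String> out = new ArrayList<>();
        for (String t = filter.nextToken(); t != null; t = filter.nextToken()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        ReusableConcat filter = new ReusableConcat();
        System.out.println(analyze(filter, List.of("red", "shoe"), true));  // [red, shoe, redshoe]
        System.out.println(analyze(filter, List.of("blue", "hat"), false)); // []
        System.out.println(analyze(filter, List.of("blue", "hat"), true));  // [blue, hat, bluehat]
    }
}
```

The second call mimics a reused filter whose state was never cleared: it emits nothing, because exhausted is still true from the previous document.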



With Regards
Aman Tandon

RE: XPathentity processor on CLOB field

2015-06-18 Thread Pattabiraman, Meenakshisundaram


This is the error cause reported.  I also see that it has been reported earlier 
(http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201103.mbox/%3cd0f0d26c-3ac0-4982-9e2b-09dc96937...@535consulting.com%3E)
 but could not find a solution.


I am nesting the FieldReaderDataSource within the Entity definition that has a 
CLOB field. With this it fails only after transforming the clob. 
If I do not nest, I get this error when the FieldReaderDataSource is 
initialized thus failing even before the SQL is executed.
Either case, the error is happening at the same place. 


Caused by: java.sql.SQLException: SQL statement to execute cannot be empty or null
    at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
    at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:112)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:173)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:229)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:403)
    at oracle.jdbc.driver.OracleSql.initialize(OracleSql.java:110)
    at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1761)
    at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1739)
    at oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:298)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:314)
    ... 14 more






Pattabi Meenakshisundaram



-Original Message-
From: Pattabiraman, Meenakshisundaram 
[mailto:pattabiraman.meenakshisunda...@aig.com] 
Sent: Wednesday, June 17, 2015 9:33 PM
To: 'solr-user@lucene.apache.org'
Subject: XPathentity processor on CLOB field

My requirement is to read the XML from a CLOB field and parse it to get the 
entity.

The data config is as shown below. I am trying to map two fields 'event' and 
'policyNumber' for the entity 'catreport'.


<dataSource name="mbdev" driver="oracle.jdbc.driver.OracleDriver"
            url="jdbc:oracle:thin:@localhost:1521:orcl" user="xyz" password="xyz"/>
<document name="insight">
  <entity name="input" query="select * from test" logLevel="debug"
          datasource="mbdev" transformer="ClobTransformer, script:toDate">
    <field column="LOAD_DATE" name="load_date" />

    <field column="RESPONSE_XML" name="RESPONSE_XML" clob="true" />
    <dataSource name="xmldata" type="FieldReaderDataSource"/>

    <entity name="catReport" dataSource="xmldata"
            dataField="input.RESPONSE_XML" processor="XPathEntityProcessor"
            forEach="/*:DecisionServiceRs" rootEntity="true" logLevel="debug">
      <field column="event"
             xpath="/dec:DecisionServiceRs/@event"/>


I am getting this error


Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: null Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:70)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:321)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:278)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:53)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)

I see that the Clob is getting converted to String correctly and the log has 
this entry where xml is printed Exception while processing: input document : 
SolrInputDocument(fields: [RESPONSE_XML=dec:Deci


I do not know why the error is thrown at Jdbc when the Clob is converted to 
string and passed to the FieldReader and do not know how to make this work.

Thanks
Pattabi

Re: Suggester for text array

Hi Advait,
First of all, I suggest you study Solr a little bit [1], because your
requirements are actually really simple:

1) You can simply use more than one suggest dictionary if you care about
keeping the suggestions separated (keeping track of whether a term comes
from the name or from the category).

If you don't care about keeping them separated, simply use a copy field to
copy both fields into.

2) Solr has supported multi-valued fields since the beginning.
I really suggest you split by comma in your indexer application,
providing Solr with the multiple values already separated,
because they are multiple values for the category field (so it's not the
analysis chain's responsibility to split them).
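
A minimal sketch of the split-in-the-indexer step from point 2, assuming the categories arrive as one comma separated string. CategorySplitter and its method are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class CategorySplitter {
    // Splits a comma separated category string into trimmed, non-empty values.
    public static List<String> split(String raw) {
        List<String> values = new ArrayList<>();
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // [Hair Care Products, Personal Care Products]
        System.out.println(split("Hair Care Products, Personal Care Products"));
    }
}
```

Each resulting value would then be added to the document as its own value of a multi-valued field (for example one addField call per value in SolrJ, assuming a field named category), so that "Personal Care Products" can be suggested on its own.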

Cheers

[1]
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

2015-06-18 13:43 GMT+01:00 Advait Suhas Pandit adv...@retailwave.com:

 Hi,

 We run an ecommerce company and would like to use SOLR for our product
 database searches.

 We have products along with the categories that they belong to. In case
 the product belongs to more than 1 category, we have a comma separated
 field of categories.

 How do we do auto complete on -
 1. Multiple fields - product name, category
 2. On categories which are not first in the list in the case of the comma
 separated values
 E.g. If a product belongs to Hair Care Products, Personal Care Products
 how do we ensure that the suggester will even suggest if someone starts
 typing in Personal Care. Also, how do we show only Personal Care in the
 auto complete and not as Hair Care Products, Personal Care Products.

 Thanks,
 Advait




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England

Suggester for text array

2015-06-18 Thread Advait Suhas Pandit

Hi,

We run an ecommerce company and would like to use SOLR for our product database 
searches.

We have products along with the categories that they belong to. In case the 
product belongs to more than 1 category, we have a comma separated field of 
categories. 

How do we do auto complete on -
1. Multiple fields - product name, category
2. On categories which are not first in the list in the case of the comma 
separated values
E.g. If a product belongs to Hair Care Products, Personal Care Products how do 
we ensure that the suggester will even suggest if someone starts typing in 
Personal Care. Also, how do we show only Personal Care in the auto complete and 
not as Hair Care Products, Personal Care Products.

Thanks,
Advait

Re: Solr 5.2.1 on Solaris

2015-06-18 Thread Shawn Heisey

On 6/18/2015 8:05 AM, Bence Vass wrote:
 Is there any documentation on how to start Solr 5.2.1 on Solaris (Solaris
 10)? The script (solr start) doesn't work out of the box, is anyone running
 Solr 5.x on Solaris?

I think the biggest problem on Solaris will be the options used on the
ps command.  The ps usage in the solr script appears to be formulated
for the version of ps found on Linux and other free UNIX-like operating
systems, and I know from experience that those options don't work on
Solaris.

The solr script also uses lsof, which I don't think is normally
installed on Solaris.  I'm not sure whether lsof is actually required,
or if the script will work without it.

I won't have time right away, but I will be able to look into this at
some point in the next few days and come up with a patch to make the
script work on Solaris.  If anybody else has the time and skill to do so
immediately, feel free to step in.

Thanks,
Shawn

Re: Help: Problem in customized token filter

Please help. What am I doing wrong here? Please guide me.

With Regards
Aman Tandon

On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com
wrote:


Solr 5.2.1 on Solaris

2015-06-18 Thread Bence Vass

Hello,

Is there any documentation on how to start Solr 5.2.1 on Solaris (Solaris
10)? The script (solr start) doesn't work out of the box, is anyone running
Solr 5.x on Solaris?

- Thanks

Re: Error when submitting PDF to Solr w/text fields using SolrJ

We would like more information, but the first thing I notice is that it hardly
makes any sense to use a string type for file content.

Can you give more details about the exception?
Have you debugged a little bit?
What does the Solr input document look like before it is sent to Solr?

Furthermore, please give us the full stack trace. The message you posted is
almost useless without all the details...

2015-06-18 15:39 GMT+01:00 Paden rumsey...@gmail.com:

 Hello,

 I'm using Solr to pull information from a Database and a file system
 simultaneously. The database houses the file path of the file in the file
 system. It pulls all of those just fine. In fact, it combines the metadata
 from the database and the metadata from the file system great. The problem
 occurs when I try to index the text. The error does not occur at the point
 when it tries to add the field text to the document. The error occurs
 when
 I try to submit that document to Solr. It gives me this error,


 org.apache.solr.common.SolrException: Exception writing document id
 /some/filepath to the index; possible analysis error.


 This is how the field is defined in schema:

 <field name="text" type="string" indexed="true" stored="false"
        required="false" multiValued="true" />

 and this is the code I use to add it to the document:

 File file = new File(filepath);
 ContentHandler textHandler = new BodyContentHandler();
 Metadata metadata = new Metadata();
 ParseContext context = new ParseContext();
 InputStream input = new FileInputStream(file);

 try {
   autoParser.parse(input, textHandler, metadata, context);
 } catch (Exception e) {
   // prints out error message
   continue;
 }

 if (textHandler != null) {
   doc.addField("text", textHandler.toString());
 }

 try {
   server.add(doc);
 } catch (Exception ex) {
   // log message
   continue;
 }

 I think it has something to do with how the field is defined in schema but
 I
 don't know. All the files that get error messages are PDF's if that helps.
 There are .doc s in the file system but they don't error out.






 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti


Solr Logging

2015-06-18 Thread rbkumar88

Hi,

I want to log Solr search queries/response times and Solr indexing
separately, in different sets of log files.
Is there any convenient framework/way to do it?

Thanks
Bharath



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Logging-tp4212730.html
Sent from the Solr - User mailing list archive at Nabble.com.

Managed schema and schema.xml file

2015-06-18 Thread Steven White

Hi everyone,

I just upgraded from 5.1.0 to 5.2.1 and noticed a behavior change which I
consider a bug.

In my solrconfig.xml, I have the following:

   <!-- <schemaFactory class="ClassicIndexSchemaFactory"/> -->
   <schemaFactory class="ManagedIndexSchemaFactory">
     <bool name="mutable">true</bool>
     <str name="managedSchemaResourceName">my-schema.xml</str>
   </schemaFactory>

In 5.1.0 (and maybe prior ver.?) when I enable managed schema per the
above, the existing schema.xml file is left as-is, a copy of it is created
as schema.xml.bak and a new one is created based on the name I gave it
my-schema.xml.

With 5.2.1, schema.xml is renamed to schema.xml.bak and my-schema.xml is
created (i.e.: schema.xml is deleted).

Is this an expected behavior or is this a bug?  I see it as a bug because
if I revert the change I made in my solrconfig.xml back to (i.e.: not
managed schema any more):

  <schemaFactory class="ClassicIndexSchemaFactory"/>

Solr will not restart because it cannot find schema.xml.

Thanks

Steve

Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Paden

Hello, 

I'm using Solr to pull information from a Database and a file system
simultaneously. The database houses the file path of the file in the file
system. It pulls all of those just fine. In fact, it combines the metadata
from the database and the metadata from the file system great. The problem
occurs when I try to index the text. The error does not occur at the point
when it tries to add the field text to the document. The error occurs when
I try to submit that document to Solr. It gives me this error, 


org.apache.solr.common.SolrException: Exception writing document id
/some/filepath to the index; possible analysis error. 


This is how the field is defined in schema:

<field name="text" type="string" indexed="true" stored="false"
       required="false" multiValued="true" />

and this is the code I use to add it to the document:

File file = new File(filepath);
ContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
InputStream input = new FileInputStream(file);

try {
  autoParser.parse(input, textHandler, metadata, context);
} catch (Exception e) {
  // prints out error message
  continue;
}

if (textHandler != null) {
  doc.addField("text", textHandler.toString());
}

try {
  server.add(doc);
} catch (Exception ex) {
  // log message
  continue;
}

I think it has something to do with how the field is defined in schema but I
don't know. All the files that get error messages are PDF's if that helps.
There are .doc s in the file system but they don't error out. 






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Managed schema and schema.xml file

2015-06-18 Thread Shawn Heisey

On 6/18/2015 8:10 AM, Steven White wrote:
 In 5.1.0 (and maybe prior ver.?) when I enable managed schema per the
 above, the existing schema.xml file is left as-is, a copy of it is created
 as schema.xml.bak and a new one is created based on the name I gave it
 my-schema.xml.
 
 With 5.2.1, schema.xml is renamed to schema.xml.bak and my-schema.xml is
 created (i.e.: schema.xml is deleted).
 
 Is this an expected behavior or is this a bug?  I see it as a bug because
 if I revert the change I made in my solrconfig.xml back to (i.e.: not
 managed schema any more):
 
   <schemaFactory class="ClassicIndexSchemaFactory"/>
 
 Solr will not restart because it cannot find schema.xml

As I understand it, the managed schema system will complain if it sees a
file named schema.xml -- having both the managed schema file and
schema.xml is confusing, so if the classic file exists, it's an error.

Because of that, if you switch your config from managed to classic
schema, you must also create the schema.xml file (or rename the managed
version).  Neither factory is aware of the other, so there's no
automated way to handle that.

Thanks,
Shawn

Re: Dedupe in a SolrCloud

2015-06-18 Thread Markus Mirsberger

Thanks :) 
exactly what I was looking for...as I only need to create the signature once 
this works perfect for me:)

Cheers,
Markus 


Sent from my iPhone

 On 17.06.2015, at 20:32, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
 
 Comments inline:
 
 On Wed, Jun 17, 2015 at 3:18 PM, Markus.Mirsberger
 markus.mirsber...@gmx.de wrote:
 Hi,
 
 I am trying to use the dedupe feature to detect and mark near-duplicate
 content in my collections.
 I don't want to prevent duplicate content. I would like to detect it and keep
 it for further processing. That's why I'm using an extra field and not the
 document's unique field.
 
 Here is how I added it to the solrConfig.xml :
 
 <requestHandler name="/update" class="solr.UpdateRequestHandler">
   <lst name="defaults">
     <str name="update.chain">fill_signature</str>
   </lst>
 </requestHandler>
 
 <updateRequestProcessorChain name="fill_signature" processor="signature">
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>
 
 <updateProcessor class="solr.processor.SignatureUpdateProcessorFactory"
                  name="signature">
   <bool name="enabled">true</bool>
   <str name="signatureField">signature</str>
   <bool name="overwriteDupes">false</bool>
   <str name="fields">content</str>
   <str name="signatureClass">solr.processor.TextProfileSignature</str>
   <str name="quantRate">.2</str>
   <str name="minTokenLen">3</str>
 </updateProcessor>
 
 When I initially add the documents to the cloud, everything works as
 expected: the documents are added and the signature is created and
 added. Perfect :)
 The problem occurs when I want to update an existing document. In that
 case the update.chain=fill_signature parameter will of course be set too, and
 I get a bad request error.
 
 I found this solr issue: https://issues.apache.org/jira/browse/SOLR-3473
 
 Is it that problem I am running into?
 
 You haven't pasted the complete error response so I am guessing a bit
 here. It is possible that you are running into the same problem i.e.
 the signature is being calculated again and the signature field not
 multi-valued, causes an error.
 
 Is it somehow possible to add parameters or set a specific update Handler
 when Im adding documents to the cloud using solrJ?
 
 Yes, any custom parameter can be added to a SolrJ request. There is a
 setParam(String param, String value) method available in
 AbstractUpdateRequest which can be used to set a custom update.chain
 for each SolrJ request.
 
 In that case I could ether set the update.chain manually and remove it from
 the request handler or write a second request Handler which I only use if I
 want set the signature field.
 I know I can do that manually when Im using eg curl but is it also possible
 with SolrJ? :)
 
 
 Thanks,
 Markus
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.

Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Paden

USING Solr 5.1.0

This is the schema file

<?xml version="1.0" encoding="UTF-8" ?>

<schema name="example" version="1.5">

   <field name="_version_" type="long" indexed="true" stored="true"/>

   <field name="_root_" type="string" indexed="true" stored="false"/>

   <field name="id" type="string" indexed="true" stored="true"
          required="false" multiValued="false" />
   <field name="filepath" type="string" indexed="true" stored="true"
          required="false" multiValued="false" />
   <field name="title" type="string" indexed="true" stored="true"
          required="false" multiValued="false" />
   <field name="author" type="string" indexed="true" stored="true"
          required="false" multiValued="false" />
   <field name="text" type="string" indexed="true" stored="false"
          required="false" multiValued="true" />
   <field name="key" type="string" indexed="true" stored="false"
          required="false" multiValued="false" />

   <dynamicField name="*_name" type="text_general" multiValued="false"
                 indexed="true" stored="true" />

   <dynamicField name="*_i"   type="int"          indexed="true" stored="true"/>
   <dynamicField name="*_is"  type="int"          indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_s"   type="string"       indexed="true" stored="true" />
   <dynamicField name="*_ss"  type="string"       indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_l"   type="long"         indexed="true" stored="true"/>
   <dynamicField name="*_ls"  type="long"         indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_t"   type="text_general" indexed="true" stored="true"/>
   <dynamicField name="*_txt" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_en"  type="text_en"      indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_b"   type="boolean"      indexed="true" stored="true"/>
   <dynamicField name="*_bs"  type="boolean"      indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_f"   type="float"        indexed="true" stored="true"/>
   <dynamicField name="*_fs"  type="float"        indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_d"   type="double"       indexed="true" stored="true"/>
   <dynamicField name="*_ds"  type="double"       indexed="true" stored="true" multiValued="true"/>

   <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false" />

   <dynamicField name="*_dt"  type="date"     indexed="true" stored="true"/>
   <dynamicField name="*_dts" type="date"     indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_p"   type="location" indexed="true" stored="true"/>

   <dynamicField name="*_ti"  type="tint"    indexed="true" stored="true"/>
   <dynamicField name="*_tl"  type="tlong"   indexed="true" stored="true"/>
   <dynamicField name="*_tf"  type="tfloat"  indexed="true" stored="true"/>
   <dynamicField name="*_td"  type="tdouble" indexed="true" stored="true"/>
   <dynamicField name="*_tdt" type="tdate"   indexed="true" stored="true"/>

   <dynamicField name="*_c" type="currency" indexed="true" stored="true"/>

   <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
   <dynamicField name="attr_*" type="text_general" indexed="true"
                 stored="true" multiValued="true"/>

   <dynamicField name="random_*" type="random" />

 <uniqueKey>filepath</uniqueKey>

<fieldType name="string" class="solr.StrField" sortMissingLast="true" />

<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>

<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>

<fieldType name="binary" class="solr.BinaryField"/>

<fieldType name="random" class="solr.RandomSortField" indexed="true" />

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Re: Collections API and adding new boxes

See particularly the ADDREPLICA command and the
node parameter. You might not even need the node
parameter since when you add a replica Solr does its
best to put the new replica on an underutilized node.
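
As a rough illustration of the ADDREPLICA call mentioned above, the sketch below only assembles the Collections API request URL; it does not contact Solr. The base URL, collection name, shard name and node name are made-up placeholders, and AddReplicaUrl is a hypothetical helper, not something shipped with Solr or SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Solr or SolrJ): it only assembles the URL.
class AddReplicaUrl {

    // Builds a Collections API ADDREPLICA request URL.
    // Pass node == null to leave the placement decision to Solr.
    static String build(String solrBaseUrl, String collection, String shard, String node) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "ADDREPLICA");
        params.put("collection", collection);
        params.put("shard", shard);
        if (node != null) {
            params.put("node", node);
        }
        StringBuilder url = new StringBuilder(solrBaseUrl).append("/admin/collections");
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator).append(p.getKey()).append('=').append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your own host, collection, shard and node.
        System.out.println(build("http://localhost:8983/solr", "mycollection", "shard1",
                "192.168.1.21:8983_solr"));
        System.out.println(build("http://localhost:8983/solr", "mycollection", "shard1", null));
    }
}
```

The printed URL can be pasted into a browser or passed to curl. Leaving out the node parameter (the second call) corresponds to letting Solr choose where the new replica goes.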

Best,
Erick

On Thu, Jun 18, 2015 at 2:58 PM, Shawn Heisey apa...@elyograg.org wrote:
 On 6/18/2015 3:23 PM, Jim.Musil wrote:
 Let's say I have a zookeeper ensemble with several Solr nodes connected to 
 it. I've created a collection successfully and all is well.

 What happens when I want to add another solr node?

 I've tried spinning one up and connecting it to zookeeper, but the new node 
 doesn't join the collection.  What's the expected next step?

 This is Solr 5.1.

 The new node will be part of the cloud as soon as it starts, but until
 you take action with the Collections API, it will not have any indexes
 on it.  SolrCloud does not automatically create replicas except in a
 very specific set of circumstances that I do not think are very common.

 You'll need to either create a new collection or take steps to modify
 your current collection(s) so that one or more shard replicas are
 located on the new node.

 https://cwiki.apache.org/confluence/display/solr/Collections+API

 Thanks,
 Shawn

Re: Error when submitting PDF to Solr w/text fields using SolrJ

The stack trace is what gets returned to the client, right? It's often
much more informative to look at the Solr log output; the error message
there is usually more helpful. By the time the exception bubbles up
through the various layers, vital information is sometimes missing from
the error message returned to the client.

One precaution I would take, since you've changed the schema, is to
_completely_ remove the index:
1. Shut down Solr.
2. rm -rf coreX/data
3. Restart Solr.
4. Try it again.

Lucene doesn't really care whether a field gets indexed one way in one
document and another way in the next, but having a field indexed in
different ways (string and text) in different documents at the same time
occasionally confuses things.

Best,
Erick

On Thu, Jun 18, 2015 at 10:31 AM, Paden rumsey...@gmail.com wrote:
Just rolling out a little bit more information as it is coming. I changed the
field type in the schema to text_general and that didn't change a thing.

Another thing is that it's consistently submitting/not submitting the same
documents. I will run over it one time and it won't index a set of
documents. When I clear the index and run the program again it
submits/doesn't submit the same documents.

It will index certain PDFs but not others, which is weird because I
printed the strings that are submitted to Solr, and the ones that get
submitted are really similar to the ones that aren't.

I can't post the actual strings for sensitivity reasons.


Re: Solr 4.10.4: Could not create instance of 'SolrInputDocument'

No clue whatsoever; you haven't provided nearly enough detail. I rather
doubt that many people on this list really understand the interactions of
that technology stack. I certainly don't.

I'd ask on the ColdFusion list, as they're (apparently) the ones who've
integrated a Solr connector of sorts. What evidence do you have that using
a stock Solr is even possible? For all I know, the Solr provided with CF
has some kind of customization (maybe a plugin?) that is required.

Best,
Erick

On Thu, Jun 18, 2015 at 5:22 AM, Paul Revere pere...@mail.iad.gov wrote:
 Our web site is created using PaperThin's CommonSpot CMS in a ColdFusion 10 
 and Windows Server 2008 R2 environment, using Apache Solr 4.10.4 instead of 
 CF Solr. We create collections through the CMS interface and they do appear 
 in both the CMS and the Solr dashboard when created. However, when we try 
 indexing our collections through the CMS interface, our CMS error logs show 
 the entry 'Could not create instance of 'SolrInputDocument'' for each member 
 of the collection. This is not a fatal error, as the indexing appears to 
 cycle through all members, but each member errors out with log entries for 
 each member.  I've Googled this error message without success. What might 
 this error message indicate please??
 Paul

Re: Help: Problem in customized token filter

Hi Steve,


  you never set exhausted to false, and when the filter got reused, *it
 incorrectly carried state from the previous document.*


Thanks for replying, but I am not able to understand this.

With Regards
Aman Tandon

Re: Help: Problem in customized token filter

Aman,

My version won’t produce anything at all, since incrementToken() always returns 
false…

I updated the gist (at the same URL) to fix the problem by returning true from 
incrementToken() once and then false until reset() is called.  It also handles 
the case when the concatenated token is zero length by not emitting a token.
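
The gist itself is not reproduced in this thread, so the following is only a plain-Java model of the behaviour described above: emit the concatenated token once, return false until the next reset, and emit nothing when the concatenation is empty. ConcatOnceModel and its methods are hypothetical names; unlike Lucene's TokenStream.reset(), this reset takes the next document's tokens as an argument to keep the sketch free of Lucene dependencies.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Plain-Java stand-in for a concatenating token filter (hypothetical names,
// no Lucene dependency).
class ConcatOnceModel {
    private Iterator<String> input = Collections.emptyIterator();
    private final StringBuilder buffer = new StringBuilder();
    private boolean done = false;
    private String term = null;

    // Called before each new document; clears all per-document state.
    void reset(List<String> tokens) {
        input = tokens.iterator();
        buffer.setLength(0);
        done = false;
        term = null;
    }

    // Returns true at most once per document, then false until reset().
    boolean incrementToken() {
        if (done) {
            return false;
        }
        while (input.hasNext()) {
            buffer.append(input.next());
        }
        done = true;
        if (buffer.length() == 0) {
            return false; // zero-length concatenation: emit no token at all
        }
        term = buffer.toString();
        return true;
    }

    String term() {
        return term;
    }

    public static void main(String[] args) {
        ConcatOnceModel filter = new ConcatOnceModel();
        filter.reset(Arrays.asList("mp3", "player"));
        System.out.println(filter.incrementToken() + " " + filter.term()); // true mp3player
        System.out.println(filter.incrementToken());                       // false
        filter.reset(Collections.<String>emptyList());
        System.out.println(filter.incrementToken());                       // false
    }
}
```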

Steve
www.lucidworks.com

Auto-suggest in Solr

2015-06-18 Thread Zheng Lin Edwin Yeo

I'm implementing an auto-suggest feature in Solr, and I'd like to achieve
the following:

For example, if the user enters "mp3", Solr might suggest "mp3 player",
"mp3 nano" and "mp3 music".
When the user enters "mp3 p", the suggestions should narrow down to "mp3
player".

Currently, when I type "mp3 p", the suggester returns only words that
start with the letter "p", so I'm getting results like "plan" and
"production", and it does not take the "mp3" token into consideration.

I'm using Solr 5.1 and below is my configuration:

In solrconfig.xml:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">

    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="indexPath">suggester_freetext_dir</str>

    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">Suggestion</str>
    <str name="weightField">Project</str>
    <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
    <int name="ngrams">5</int>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>


In schema.xml:

<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="6" outputUnigrams="false"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="6" outputUnigrams="true"/>
  </analyzer>
</fieldType>


Is there anything that I configured wrongly?


Regards,
Edwin

Re: Help: Problem in customized token filter

Hi Aman,

The admin UI screenshot you linked to is from an older version of Solr - what 
version are you using?

Lots of extraneous angle brackets and asterisks got into your email and made 
for a bunch of cleanup work before I could read or edit it.  In the future, 
please put your code somewhere people can easily read it and copy/paste it into 
an editor: into a github gist or on a paste service, etc.

Looks to me like your use of “exhausted” is unnecessary, and is likely the 
cause of the problem you saw (only one document getting processed): you never 
set exhausted to false, and when the filter got reused, it incorrectly carried 
state from the previous document.

Here’s a simpler version that’s hopefully more correct and more efficient (2 
fewer copies from the StringBuilder to the final token).  Note: I didn’t test 
it:

https://gist.github.com/sarowe/9b9a52b683869ced3a17

Steve
www.lucidworks.com

 On Jun 18, 2015, at 11:33 AM, Aman Tandon amantandon...@gmail.com wrote:
 
 Please help, what wrong I am doing here. please guide me.
 
 With Regards
 Aman Tandon
 
 On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com
 wrote:
 
 Hi,
 
 I created a token concat filter to concatenate all the tokens from the
 token stream. It creates the concatenated token as expected.

 But when I post XML containing more than 30,000 documents, only the
 first document has data in that field.
 
 Schema:

 <field name="titlex" type="text" indexed="true" stored="false"
        required="false" omitNorms="false" multiValued="false" />

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
             outputUnigrams="true" tokenSeparator=""/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
     <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
             expand="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
     <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 Please help me. The code for the filter is as follows; please take a look.

 Here is a picture of what the filter is doing:
 http://i.imgur.com/THCsYtG.png?1

 The code of the concat filter is:
 
 package com.xyz.analysis.concat;

 import java.io.IOException;

 import org.apache.lucene.analysis.TokenFilter;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
 import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
 import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
 import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

 public class ConcatenateWordsFilter extends TokenFilter {

   private CharTermAttribute charTermAttribute =
       addAttribute(CharTermAttribute.class);
   private OffsetAttribute offsetAttribute =
       addAttribute(OffsetAttribute.class);
   PositionIncrementAttribute posIncr =
       addAttribute(PositionIncrementAttribute.class);
   TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);

   private StringBuilder stringBuilder = new StringBuilder();
   private boolean exhausted = false;

   /**
    * Creates a new ConcatenateWordsFilter
    * @param input TokenStream that will be filtered
    */
   public ConcatenateWordsFilter(TokenStream input) {
     super(input);
   }

   /**
    * {@inheritDoc}
    */
   @Override
   public final boolean incrementToken() throws IOException {
     while (!exhausted && input.incrementToken()) {
       char terms[] = charTermAttribute.buffer();
       int termLength = charTermAttribute.length();
       if (typeAtrr.type().equals("<ALPHANUM>")) {
         stringBuilder.append(terms, 0,

How to append new data to index in Solr?

2015-06-18 Thread ??????

Hello,
 I'm a Solr user with a question. I want to append new data to an
existing index. Does Solr support appending new data to an index?
 Thanks for any reply.
Best wishes,
Jason

Re: Help: Problem in customized token filter

Yes I just saw.

With Regards
Aman Tandon

Re: Help: Problem in customized token filter

Aman,

Solr reuses the same token filter instances over and over, calling reset()
before sending each document through.  Your code sets “exhausted” to true and
then never sets it back to false, so the next time the token filter instance
is used, its “exhausted” value is still true, and no input stream tokens are
ever concatenated again.
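
To make the reuse problem concrete, here is a small plain-Java model (not Lucene code, and not the original filter): one filter instance is reused for two documents, once with a flag that is never cleared and once with a reset that clears it. ReusedFilterDemo, ConcatModel and the sample tokens are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of a reused filter instance (hypothetical names, no Lucene).
class ReusedFilterDemo {

    static class ConcatModel {
        private final boolean clearFlagOnReset;
        private Iterator<String> input;
        private final StringBuilder buffer = new StringBuilder();
        private boolean exhausted = false;

        ConcatModel(boolean clearFlagOnReset) {
            this.clearFlagOnReset = clearFlagOnReset;
        }

        // Called before every document, as the indexing chain does with reset().
        void reset(List<String> tokens) {
            input = tokens.iterator();
            buffer.setLength(0);
            if (clearFlagOnReset) {
                exhausted = false; // the step the buggy variant is missing
            }
        }

        // Returns the concatenated token, or null when nothing is emitted.
        String nextToken() {
            while (!exhausted && input.hasNext()) {
                buffer.append(input.next());
            }
            if (exhausted) {
                return null; // stale flag from the previous document
            }
            exhausted = true;
            return buffer.toString();
        }
    }

    // Runs every document through the SAME filter instance.
    static List<String> index(ConcatModel filter, List<List<String>> docs) {
        List<String> out = new ArrayList<>();
        for (List<String> doc : docs) {
            filter.reset(doc);
            out.add(filter.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("red", "shoes"),
                Arrays.asList("blue", "hat"));
        System.out.println(index(new ConcatModel(false), docs)); // [redshoes, null]
        System.out.println(index(new ConcatModel(true), docs));  // [redshoes, bluehat]
    }
}
```

With the flag left set, only the first document yields a token, which matches the symptom reported at the start of the thread.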

Does that make sense?

Steve
www.lucidworks.com

 On Jun 19, 2015, at 1:10 AM, Aman Tandon amantandon...@gmail.com wrote:
 
 Hi Steve,
 
 
 you never set exhausted to false, and when the filter got reused, *it
 incorrectly carried state from the previous document.*
 
 
 Thanks for replying, but I am not able to understand this.
 
 With Regards
 Aman Tandon
 
 On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe sar...@gmail.com wrote:
 
 Hi Aman,
 
 The admin UI screenshot you linked to is from an older version of Solr -
 what version are you using?
 
 Lots of extraneous angle brackets and asterisks got into your email and
 made for a bunch of cleanup work before I could read or edit it.  In the
 future, please put your code somewhere people can easily read it and
 copy/paste it into an editor: into a github gist or on a paste service, etc.
 
 Looks to me like your use of “exhausted” is unnecessary, and is likely the
 cause of the problem you saw (only one document getting processed): you
 never set exhausted to false, and when the filter got reused, it
 incorrectly carried state from the previous document.
 
 Here’s a simpler version that’s hopefully more correct and more efficient
 (2 fewer copies from the StringBuilder to the final token).  Note: I didn’t
 test it:
 
https://gist.github.com/sarowe/9b9a52b683869ced3a17
 
 Steve
 www.lucidworks.com
 
 On Jun 18, 2015, at 11:33 AM, Aman Tandon amantandon...@gmail.com
 wrote:
 
 Please help: what am I doing wrong here? Please guide me.
 
 With Regards
 Aman Tandon
 
 On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon amantandon...@gmail.com
 wrote:
 
 Hi,
 
 I created a *token concat filter* to concatenate all the tokens from the token
 stream. It creates the concatenated token as expected.
 
 But when I post an XML file containing more than 30,000 documents, only the
 first document has data in that field.
 
 *Schema:*
 
 <field name="titlex" type="text" indexed="true" stored="false"
        required="false" omitNorms="false" multiValued="false" />
 
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
             outputUnigrams="true" tokenSeparator=""/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
     <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="stemmed_synonyms_text_prime_ex_index.txt" ignoreCase="true"
             expand="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords_text_prime_search.txt"
             enablePositionIncrements="true" />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory"
             language="English" protected="protwords.txt"/>
     <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 Please help me, The code for the filter is as follows, please take a
 look.
 
 Here is the picture of what filter is doing
 http://i.imgur.com/THCsYtG.png?1
 
 The code of concat filter is :
 
 package com.xyz.analysis.concat;
 
 import java.io.IOException;
 
 import org.apache.lucene.analysis.TokenFilter;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
 import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
 import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
 import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
 
 public class ConcatenateWordsFilter extends TokenFilter {
 
   private CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
 
   private OffsetAttribute offsetAttribute = addAttribute(OffsetAttribute.class);
 
   PositionIncrementAttribute posIncr =

Re: How to do a Data sharding for data in a database table

You've repeated your original statement. Shawn's
observation is that 10M docs is a very small corpus
by Solr standards. You either have very demanding
document/search combinations or you have a poorly
tuned Solr installation.

On reasonable hardware I expect 25-50M documents to have
sub-second response time.

So what we're trying to do is be sure this isn't
an XY problem, from Hossman's apache page:

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue. Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341

So again, how would you characterize your documents? How many
fields? What do queries look like? How much physical memory on the
machine? How much memory have you allocated to the JVM?

You might review:
http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Thu, Jun 18, 2015 at 3:23 PM, wwang525 wwang...@gmail.com wrote:
The query without load is still under 1 second. But under load, response time
can be much longer because queries queue up.

We would like to shard the data to something like 6M records per shard, which
should still give a sub-second response time under load.

What are some best practices for sharding the data? For example, we could shard
the data by date range, but that is pretty dynamic. We could also shard the data
by some other property, but if the data is not evenly distributed, you may
not be able to shard it anymore.

--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4212803.html
Sent from the Solr - User mailing list archive at Nabble.com.

How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525

Hi,

We probably would like to shard the data since the response time for
demanding queries at  10M records is getting  1 second in a single request
scenario.

I have not done any data sharding before. What are some recommended ways to
do data sharding? For example, maybe by a criterion with a list of specific
values?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: MappingCharFilterFactory and start and end offsets

Hi Dmitry,

It’s weird that start and end offsets are the same - what do you see for the 
start/end of ‘$’, i.e. if you take out MCFF?  (I think it should be start:5, 
end:6.)

As far as offsets “respecting the remapped token”, are you asking for offsets 
to be set as if ‘dollarsign’ were part of the original text?  If so, there is 
no setting that would do that - the intent is for offsets to map to the 
*original* text.  You can work around this by performing the substitution prior 
to Solr analysis, e.g. in an update processor like RegexReplaceProcessorFactory.

Steve
www.lucidworks.com

 On Jun 18, 2015, at 3:07 AM, Dmitry Kan solrexp...@gmail.com wrote:
 
 Hi,
 
 It looks like MappingCharFilter sets the start and end offsets to the same
 value. Can this be changed by some setting?
 
 For the string "test $ test2" and the mapping "$" => " dollarsign " (we insert
 extra spaces to separate $ into its own token)
 
 we get: http://snag.gy/eJT1H.jpg
 
 Ideally, we would like to have start and end offset respecting the remapped
 token. Can this be achieved with settings?
 
 -- 
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info

[ANN] Solr in Action book release (Solr 4.7)

2015-06-18 Thread Roy Silva





Re: Error when submitting PDF to Solr w/text fields using SolrJ

2015-06-18 Thread Paden

Just rolling out a bit more information as it comes in. I changed the
field type in the schema to text_general and that didn't change a thing.

Another thing is that it consistently submits, or fails to submit, the same
documents. I run over the set once and it doesn't index a certain subset of
documents. When I clear the index and run the program again, it
submits and skips the same documents.

It will index certain PDFs but not others, which is weird
because I printed the strings that are submitted to Solr, and the ones that
get indexed are really similar to the ones that aren't.

I can't post the actual strings for sensitivity reasons.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704p4212757.html
Sent from the Solr - User mailing list archive at Nabble.com.

Collections API and adding new boxes

2015-06-18 Thread Jim . Musil

Hi,

Let's say I have a zookeeper ensemble with several Solr nodes connected to it. 
I've created a collection successfully and all is well.

What happens when I want to add another solr node?

I've tried spinning one up and connecting it to zookeeper, but the new node 
doesn't join the collection.  What's the expected next step?

This is Solr 5.1.

Thanks!
Jim Musil

Re: How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525

The query without load is still under 1 second. But under load, response time
can be much longer because queries queue up.

We would like to shard the data to something like 6M records per shard, which
should still give a sub-second response time under load.

What are some best practices for sharding the data? For example, we could shard
the data by date range, but that is pretty dynamic. We could also shard the data
by some other property, but if the data is not evenly distributed, you may
not be able to shard it anymore.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4212803.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to do a Data sharding for data in a database table

2015-06-18 Thread Jack Krupansky

10M doesn't sound too demanding.

How complex are your queries?

How complex is your data - like number of fields and size, like very large
documents?

Are you sure you have enough RAM to fully cache your index?

Are your queries compute-bound or I/O bound? If I/O-bound, get more RAM. If
compute-bound, sharding may help, but have to examine query complexity
first.


-- Jack Krupansky

On Thu, Jun 18, 2015 at 2:05 PM, wwang525 wwang...@gmail.com wrote:

 Hi,

 We probably would like to shard the data since the response time for
 demanding queries at  10M records is getting  1 second in a single
 request
 scenario.

 I have not done any data sharding before. What are some recommended ways to
 do data sharding? For example, maybe by a criterion with a list of specific
 values?





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Collections API and adding new boxes

2015-06-18 Thread Shawn Heisey

On 6/18/2015 3:23 PM, Jim.Musil wrote:
 Let's say I have a zookeeper ensemble with several Solr nodes connected to 
 it. I've created a collection successfully and all is well.

 What happens when I want to add another solr node?

 I've tried spinning one up and connecting it to zookeeper, but the new node 
 doesn't join the collection.  What's the expected next step?

 This is Solr 5.1.

The new node will be part of the cloud as soon as it starts, but until
you take action with the Collections API, it will not have any indexes
on it.  SolrCloud does not automatically create replicas except in a
very specific set of circumstances that I do not think are very common.

You'll need to either create a new collection or take steps to modify
your current collection(s) so that one or more shard replicas are
located on the new node.

https://cwiki.apache.org/confluence/display/solr/Collections+API
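
As a rough sketch of the second option, the Collections API has an ADDREPLICA action that takes collection, shard and node parameters. The helper below only builds the request URL; the collection name, shard name and node name used in main are made-up placeholders, so substitute the values shown for your own cluster:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a Collections API ADDREPLICA request URL. All concrete names in
// main() are hypothetical placeholders, not values from this thread.
class AddReplicaUrl {
    static String build(String solrBaseUrl, String collection, String shard, String node) {
        return solrBaseUrl + "/admin/collections?action=ADDREPLICA"
            + "&collection=" + enc(collection)
            + "&shard=" + enc(shard)
            + "&node=" + enc(node);
    }

    // URL-encodes one parameter value (requires Java 10+ for the Charset overload).
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr",
            "mycollection", "shard1", "newhost:8983_solr"));
    }
}
```

The resulting URL can be requested with curl or a browser; one request is needed per shard that should get a replica on the new node.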

Thanks,
Shawn

Re: How to create concatenated token

Hi Erick,

In the issue you forwarded to me, they want to make one token from all the
tokens received from the token stream, but in my case I want to keep the tokens
the same and create an extra new token which is the concatenation of all the tokens.
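
To make the intended output concrete, here is a small plain-Java sketch that works on a list of strings rather than a real token stream. The class and method names are hypothetical; in an actual filter the extra token would presumably be emitted with a position increment of 0 so that it sits at the same position as the last token:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Keeps the incoming tokens unchanged and appends one extra token that is
// the concatenation of all of them. Illustration only, not a Lucene filter.
class ConcatExtraToken {
    static List<String> withConcatenation(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens);
        // With a single token the concatenation would just duplicate it.
        if (tokens.size() > 1) {
            out.add(String.join("", tokens));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [solr, train, solrtrain]
        System.out.println(withConcatenation(Arrays.asList("solr", "train")));
    }
}
```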


 I'd guess, is the case
 here. I mean do you really want to concatenate 50 tokens?

We are applying it to the *title field* of products, so the maximum length would be
about 10 tokens, I guess, and even that will be rare.

With Regards
Aman Tandon

On Wed, Jun 17, 2015 at 7:16 PM, Erick Erickson erickerick...@gmail.com
wrote:

 If you used the JIRA I linked, vote for it, add any improvements etc.
 Anyone can attach a patch to a JIRA, you just have to create a login.

 That said, this may be too rare a use-case to deal with. I just thought
 of shingling which I should have suggested before that will work for
 concatenating small numbers of tokens which, I'd guess, is the case
 here. I mean do you really want to concatenate 50 tokens?

 Best,
 Erick

 On Wed, Jun 17, 2015 at 12:07 AM, Aman Tandon amantandon...@gmail.com
 wrote:
  Dear Erick,
 
  e.g. Solr training
  *Porter:-*         solr   train
  Position           1      2
  *Concatenated :-*  solr   train
                            solrtrain
  Position           1      2
 
 
  I have implemented the filter as per my requirement. Thank you so much for
  your help and guidance. How can I contribute it to Solr?
 
  With Regards
  Aman Tandon
 
  On Wed, Jun 17, 2015 at 10:14 AM, Aman Tandon amantandon...@gmail.com
  wrote:
 
  Hi Erick,
 
  Thank you so much, it will be helpful for me to learn how to save the
  state of a token. I had no idea how to save the state of previous tokens;
  because of this it was difficult to generate a concatenated token at the end.
 
  So is there anything I should read to learn more about it?
 
  With Regards
  Aman Tandon
 
  On Wed, Jun 17, 2015 at 9:20 AM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
  I really question the premise, but have a look at:
  https://issues.apache.org/jira/browse/SOLR-7193
 
  Note that this is not committed and I haven't reviewed
  it so I don't have anything to say about that. And you'd
  have to implement it as a custom Filter.
 
  Best,
  Erick
 
  On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon amantandon...@gmail.com
  wrote:
   Hi,
  
   Any guesses, how could I achieve this behaviour.
  
   With Regards
   Aman Tandon
  
   On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon 
 amantandon...@gmail.com
   wrote:
  
   e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)
  
  
   typo error
   e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training)
  
   With Regards
   Aman Tandon
  
   On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon 
 amantandon...@gmail.com
   wrote:
  
   We have some business logic to search for the user query in the user intent
   or to find the exactly matching products.
  
   e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)
  
   As we can see, it is a phrase query, so it will take more time than the
   single stemmed-token query. There are also 5-7 word phrase queries, so we
   want to reduce the search time by implementing this feature.
  
   With Regards
   Aman Tandon
  
   On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti 
   benedetti.ale...@gmail.com wrote:
  
   Can I ask why you need to concatenate the tokens? Maybe we can find a
   better solution to concatenate all the tokens into one single big token.
   I find it difficult to understand the reasons behind tokenising, token
   filtering and then un-tokenising again :)
   It would be great if you could explain a little better what you would like
   to do!
  
  
   Cheers
  
   2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com:
  
Hi,
   
 I have a requirement to create a concatenated token from all the tokens
 produced by the last item of my analyzer chain.
    
 *Suppose my analyzer chain is:*
    
 <tokenizer class="solr.WhitespaceTokenizerFactory" />
 <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
         splitOnNumerics="1" preserveOriginal="1"/>
 <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"
         side="front" />
 <filter class="solr.PorterStemmerFilterFactory"/>
    
 I want to create a concatenated-token plugin that adds a concatenated token
 along with the last token.
   
 e.g. Solr training
    
 *Porter:-*         solr   train
 Position           1      2
    
 *Concatenated :-*  solr   train
                           solrtrain
 Position           1      2
    
 Please help me out. How do I create a custom filter for this requirement?
   
With Regards
Aman Tandon
   
  
  
  
   --
   --
  
   Benedetti Alessandro
   Visiting card : http://about.me/alessandro_benedetti

MappingCharFilterFactory and start and end offsets

2015-06-18 Thread Dmitry Kan

Hi,

It looks like MappingCharFilter sets the start and end offsets to the same
value. Can this be changed by some setting?

For the string "test $ test2" and the mapping "$" => " dollarsign " (we insert
extra spaces to separate $ into its own token)

we get: http://snag.gy/eJT1H.jpg

Ideally, we would like to have start and end offset respecting the remapped
token. Can this be achieved with settings?

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info

Contribute the Customized Phonetic Filter to Apache Solr

Hi,

We created a new phonetic filter. It is working great on our products;
most of our suppliers are Indian, so it is quite helpful for us in providing
the right results, e.g.

1) rikshaw: still able to find the suppliers of rickshaw
2) telefone: still able to find the suppliers of telephone

We also analyzed our search satisfaction feedback; it improved by 13 percentage
points (from 54% to 67%) just after implementing it.

We want to contribute it to Solr. How can I do that?

With Regards
Aman Tandon

Extended Dismax Query Parser with AND as default operator

2015-06-18 Thread Dirk Buchhorn

Hello,

I have a question about the extended dismax query parser. If the default operator 
is changed to AND (q.op=AND), the search results seem to be incorrect. I 
will explain with some examples. For this test I use Solr v5.1 and the tika 
core from the example directory.
== Preparation ==
Add the following lines to the schema.xml file
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>
Change the field text to stored="true"
Remove the multiValued attribute from the title and text fields (we don't need 
multivalued fields in our test)

Add test data (use curl or fiddler)
Url: http://localhost:8983/solr/tika/update/json?commit=true
Header: Content-type: application/json
[
  {"id":"1", "title":"green", "author":"Jon", "text":"blue"},
  {"id":"2", "title":"green", "author":"Jon Jessie", "text":"red"},
  {"id":"3", "title":"yellow", "author":"Jessie", "text":"blue"},
  {"id":"4", "title":"green", "author":"Jessie", "text":"blue"},
  {"id":"5", "title":"blue", "author":"Jon", "text":"yellow"},
  {"id":"6", "title":"red", "author":"Jon", "text":"green"}
]

== Test ==
The following parameter are always set.
default operator is AND: q.op=AND
use the extended dismax query parser: defType=edismax
set the default query fields to title and text: qf=title text
sort: id asc

=== #1 test ===
q=red green
response:
{ numFound:2,start:0,
  docs:[
{id:2,title:green,author:Jon Jessie,text:red},
{id:6,title:red,author:Jon,text:green}]
}
parsedquery_toString: +(((text:green | title:green) (text:red | title:red))~2)

This test works as expected.

=== #2 test ===
We use a group
q=(red green)
Same response as test one.
parsedquery_toString: +(((text:green | title:green) (text:red | title:red))~2)

This test works as expected.

=== #3 test ===
q=green red author:Jessie
response:
{ numFound:1,start:0,
  docs:[{id:2,title:green,author:Jon Jessie,text:red}]
}
parsedquery_toString: +(((text:green | title:green) (text:red | title:red) 
author:jessie)~3)

This test works as expected.

=== #4 test ===
q=(green red) author:Jessie
response:
{ numFound:2,start:0,
  docs:[
{id:2,title:green,author:Jon Jessie,text:red},
{id:4,title:green,author:Jessie,text:blue}]
}
parsedquery_toString: +((((text:green | title:green) (text:red | title:red)) 
author:jessie)~2)

The same result as in the 3rd test was expected. Why is AND not used for the query 
group?

=== #5 test ===
q=(+green +red) author:Jessie
response:
{ numFound:4,start:0,
  docs:[
{id:2,title:green,author:Jon Jessie,text:red},
{id:3,title:yellow,author:Jessie,text:blue},
{id:4,title:green,author:Jessie,text:blue},
{id:6,title:red,author:Jon,text:green}]
}
parsedquery_toString: +((+(text:green | title:green) +(text:red | title:red)) 
author:jessie)

Now AND is used for the group, but the author clause is combined with OR. Why?

=== #6 test ===
q=(+green +red) +author:Jessie
response:
{ numFound:3,start:0,
  docs:[
{id:2,title:green,author:Jon Jessie,text:red},
{id:3,title:yellow,author:Jessie,text:blue},
{id:4,title:green,author:Jessie,text:blue}]
}
parsedquery_toString: +((+(text:green | title:green) +(text:red | title:red)) 
+author:jessie)

Still not the expected result.

=== #7 test ===
q=+(+green +red) +author:Jessie
response:
{ numFound:1,start:0,
  docs:[{id:2,title:green,author:Jon Jessie,text:red}]
}
parsedquery_toString: +(+(+(text:green | title:green) +(text:red | title:red)) 
+author:jessie)

Now the result is OK. But if all operators must be given explicitly, then q.op=AND is 
useless.

=== #8 test ===
q=green author:(Jon Jessie)
Found four results; one was expected. The query must be changed to '+green 
+author:(+Jon +Jessie)' to get the expected result.

Is this a bug in the extended dismax parser, or what is the reason for not 
consistently applying q.op=AND to the query expression?

Kind regards

Dirk Buchhorn

Re: Contribute the Customized Phonetic Filter to Apache Solr

2015-06-18 Thread davidphilip cherian

Hi Aman,

https://wiki.apache.org/solr/HowToContribute

HTH

On Thu, Jun 18, 2015 at 12:11 PM, Aman Tandon amantandon...@gmail.com
wrote:

 Hi,

 We created a new phonetic filter. It is working great on our products;
 most of our suppliers are Indian, so it is quite helpful for us in providing
 the right results, e.g.

 1) rikshaw: still able to find the suppliers of rickshaw
 2) telefone: still able to find the suppliers of telephone

 We also analyzed our search satisfaction feedback; it improved by 13 percentage
 points (from 54% to 67%) just after implementing it.

 We want to contribute it to Solr. How can I do that?

 With Regards
 Aman Tandon

facet query is not working

2015-06-18 Thread Midas A

http://localhost:8983/solr/col/select?q=*:*&sfield=geolocation&pt=26.697,83.1876&facet.query={!frange%20l=0%20u=50}geodist()&facet.query={!frange%20l=50.001%20u=100}geodist()&wt=json


I am not getting facet results.

schema:
<field name="geolocation" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

Re: facet query is not working

2015-06-18 Thread Mikhail Khludnev

isn't facet=true necessary?
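
For illustration, a small Java sketch that assembles the same request with facet=true added and every parameter value URL-encoded. The class name is hypothetical, and the host, core and field names are simply taken from the URL in the question:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Assembles the select URL from the question, with facet=true switched on.
class FacetQueryUrl {
    static String build(String selectUrl) {
        List<String> params = new ArrayList<>();
        params.add(param("q", "*:*"));
        params.add(param("sfield", "geolocation"));
        params.add(param("pt", "26.697,83.1876"));
        params.add(param("facet", "true")); // facet.query is ignored without this
        params.add(param("facet.query", "{!frange l=0 u=50}geodist()"));
        params.add(param("facet.query", "{!frange l=50.001 u=100}geodist()"));
        params.add(param("wt", "json"));
        return selectUrl + "?" + String.join("&", params);
    }

    // Encodes one name=value pair (requires Java 10+ for the Charset overload).
    static String param(String name, String value) {
        return name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/col/select"));
    }
}
```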

On Thu, Jun 18, 2015 at 12:03 PM, Midas A test.mi...@gmail.com wrote:


 http://localhost:8983/solr/col/select?q=*:*&sfield=geolocation&pt=26.697,83.1876&facet.query={!frange%20l=0%20u=50}geodist()&facet.query={!frange%20l=50.001%20u=100}geodist()&wt=json


 I am not getting facet results.

 schema:
 <field name="geolocation" type="location" indexed="true" stored="true"/>
 <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com

Duplicate suggestions

2015-06-18 Thread jon kerling

Hi,
I am using Solr 5.1. I'm getting duplicate suggestions when using my 
Solr suggester. I'm using AnalyzingInfixLookupFactory and 
DocumentDictionaryFactory. Can I configure it to return only distinct 
suggestions?

here are details about my configuration:

from schema.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester1a</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_infix_dir1a</str>
    <str name="allTermsRequired">true</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">f1</str>
    <str name="weightField">weightField</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>

  <lst name="suggester">
    <str name="name">mySuggester2a</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggester_infix_dir2a</str>
    <str name="allTermsRequired">true</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">f2</str>
    <str name="weightField">weightField</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">6</str>
    <str name="suggest.dictionary">mySuggester1a</str>
    <str name="suggest.dictionary">mySuggester2a</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

from schema.xml:
<field name="f1" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="f2" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="weightField" type="float" indexed="true" stored="true"/>
** weightField is ignored by me, I'm not adding any values in it at all.

document example:
<doc>
  <str name="f1">2015-04-01</str>
  <str name="f2">12:06:00</str>
  <str name="f3">BOOO</str>
  <str name="f4"/>
  <str name="f5">7.52.11.212</str>
  <str name="f6">7.52.11.213</str>
  <str name="OID">52358424</str>
</doc>

After I build the suggester I try to get suggestions like this:
http://localhost/solr/core1/suggest?/suggest=true&suggest.q=12

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">62</int>
  </lst>
  <lst name="suggest">
    <lst name="mySuggester2a">
      <lst name="12">
        <int name="numFound">6</int>
        <arr name="suggestions">
          <lst>
            <str name="term">18:34:&lt;b&gt;12&lt;/b&gt;</str>
            <long name="weight">0</long>
            <str name="payload" />
          </lst>
          <lst>
            <str name="term">18:34:&lt;b&gt;12&lt;/b&gt;</str>
            <long name="weight">0</long>
            <str name="payload" />
          </lst>
          <lst>
            <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str>
            <long name="weight">0</long>
            <str name="payload" />
          </lst>
          <lst>
            <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str>
            <long name="weight">0</long>
            <str name="payload" />
          </lst>
          <lst>
            <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str>
            <long name="weight">0</long>
            <str name="payload" />
          </lst>
          <lst>
            <str name="term">&lt;b&gt;12&lt;/b&gt;:06:02</str>
            <long name="weight">0</long>
            <str name="payload" />
          </lst>
        </arr>
      </lst>
    </lst>
    <lst name="mySuggester1a">
      <lst name="12">
        <int name="numFound">0</int>
        <arr name="suggestions" />
      </lst>
    </lst>
  </lst>
</response>

I would like to get this kind of suggester response (no duplicates):

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">62</int>
  </lst>
  <lst name="suggest">
    <lst name="mySuggester2a">
      <lst name="12">
        <int name="numFound">3</int>
        <arr name="suggestions">
          <lst>
            <str name="term">18:34:&lt;b&gt;12&lt;/b&gt;</str>
            <long name="weight">0</long>
            <str name="payload" />
          </lst>
          <lst>
            <str name="term">18:35:&lt;b&gt;12&lt;/b&gt;</str>
            <long name="weight">0</long>
            <str name="payload" />
          </lst>
          <lst>
            <str name="term">&lt;b&gt;12&lt;/b&gt;:06:02</str>
            <long name="weight">0</long>
            <str name="payload" />
          </lst>
        </arr>
      </lst>
    </lst>
    <lst name="mySuggester1a">
      <lst name="12">
        <int name="numFound">0</int>
        <arr name="suggestions" />
      </lst>
    </lst>
  </lst>
</response>

Thank you.
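
One client-side workaround, assuming there is an application layer between Solr and the user, is to drop repeated terms after the response comes back. The sketch below is plain Java with a hypothetical class name; it works on the list of term strings already extracted from the response and keeps the order Solr returned:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Removes duplicate suggestion terms while preserving their original order.
class SuggestionDedup {
    static List<String> distinct(List<String> terms, int limit) {
        // LinkedHashSet drops repeats and remembers first-seen order.
        LinkedHashSet<String> seen = new LinkedHashSet<>(terms);
        List<String> out = new ArrayList<>();
        for (String term : seen) {
            if (out.size() == limit) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fromSolr = Arrays.asList(
            "18:34:<b>12</b>", "18:34:<b>12</b>",
            "18:35:<b>12</b>", "18:35:<b>12</b>", "18:35:<b>12</b>",
            "<b>12</b>:06:02");
        // Leaves the three distinct terms from the response above.
        System.out.println(distinct(fromSolr, 6));
    }
}
```

Because suggest.count is applied before this filtering, the client may need to request more suggestions than it intends to display.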

Re: facet query is not working