Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Aleksander Akerø
Tried the following config for setting the autoGeneratePhraseQueries but it
didn't seem to change anything. Tested both true and false.

<fieldType name="keyword" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

Still I do not get any matches when searching for FE 009 without quotes.

Set debugQuery to on and this is what it shows. Definitely looks like it
does this MultiPhraseQuery thing.
<lst name="debug">
  <str name="rawquerystring">FE 009</str>
  <str name="querystring">FE 009</str>
  <str name="parsedquery">
    (+(DisjunctionMaxQuery((number:FE)) DisjunctionMaxQuery((number:009))))/no_coord
  </str>
  <str name="parsedquery_toString">+((number:FE) (number:009))</str>
  <lst name="explain"/>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>

I also looked into these query parsers, but it looks like the splitting on
whitespace is something that is done by the dismax query parser before the
terms are passed to any analyzers. And it is vital to me that I can
differentiate this on a per-field basis.
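
A rough simulation of this behavior (illustrative Python, not Solr's actual
implementation) shows why a query-side KeywordTokenizer never gets a chance
to keep "FE 009" together under edismax:

```python
# Simplified illustration (not Solr's real code) of why a query-side
# KeywordTokenizer cannot preserve "FE 009" as one token under edismax.

def keyword_analyzer(text):
    # KeywordTokenizerFactory: the whole input becomes a single token.
    return [text]

def edismax_parse(raw_query, field):
    # The parser splits on whitespace BEFORE any per-field analysis...
    fragments = raw_query.split()
    clauses = []
    for frag in fragments:
        # ...so the analyzer only ever sees one fragment at a time.
        for token in keyword_analyzer(frag):
            clauses.append(f"{field}:{token}")
    return "+((" + ") (".join(clauses) + "))"

print(edismax_parse("FE 009", "number"))  # +((number:FE) (number:009))
```

The output matches the parsedquery_toString seen in the debug block above.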

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-29 Aleksander Akerø aleksan...@gurusoft.no

 Thanks a lot, I'll try the autoGeneratePhraseQueries property and see how
 that works.

 Regarding the reindexing tip, it's a good one, but due to my current
 on-the-fly setup on the servers at work I basically have to build a
 project with Maven and deploy to Tomcat, where the index lies, and I
 therefore have to reindex each time, otherwise the index would be empty.
 Also I usually use the clean parameter when testing with DIH, so that
 shouldn't be a problem.

 *Aleksander Akerø*
 Systemkonsulent
 Mobil: 944 89 054
 E-post: aleksan...@gurusoft.no

 *Gurusoft AS*
 Telefon: 92 44 09 99
 Østre Kullerød
 www.gurusoft.no


 2014-01-29 Alexandre Rafalovitch arafa...@gmail.com

 I think the whitespace might also be the issue. The query gets parsed
 by a standard component that splits it on space before passing the
 individual components into the field searches.

 Try enabling autoGeneratePhraseQueries on the field (or field type)
 and reindexing. See if that makes a difference.

 Regards,
   Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, Jan 29, 2014 at 9:55 PM, Aleksander Akerø
 aleksan...@gurusoft.no wrote:
  update:
 
  Guessing that this has nothing to do with the tokenizer. Tried to use
 the
  string fieldtype as well, but still the same results. So this must have
 to
  do with some other solr config.
 
  What confuses me is that when I search 1005 which is another valid
 value
  to search for, it works perfectly, but then again, this query contains
 no
  whitespace.
 
  Any ideas?
 
  *Aleksander Akerø*
  Systemkonsulent
  Mobil: 944 89 054
  E-post: aleksan...@gurusoft.no
 
  *Gurusoft AS*
  Telefon: 92 44 09 99
  Østre Kullerød
  www.gurusoft.no
 
 
  2014-01-29 Aleksander Akerø aleksan...@gurusoft.no
 
  Thanks for the quick answer, but it doesn't help if I remove the
 lowercase
  analyzer like so:
 
  <fieldType name="keyword" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>
 
   I still need to add quotes to the search query to get results. And the
  weird thing is that if I use the analyzer and put in FE 009 (again,
  without quotes) for both index and query values, it highlights the result
  as if to show a match, but when I search using the GUI it gives me no
  results. The same happens when posting directly to the /select
  requestHandler via GET.
 
  This is what I post using GET:
  http://mysite.com/solr/corename/select?q=number:FE%20009&qf=number
  = this does not work
  http://mysite.com/solr/corename/select?q=number:%22FE%20009%22&qf=number
  = this works
 
  Really starting to wonder if I am doing something terribly wrong
 somewhere.
 
  This is my requestHandler btw, pretty basic:
  <!-- Default handler -->
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="defType">edismax</str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
      <str name="qf">number</str>
    </lst>
  </requestHandler>
 
  *Aleksander Akerø*
  Systemkonsulent
  Mobil: 944 89 054
  E-post: 

Lucene Join

2014-01-30 Thread anand chandak

Hi,


I am trying to find out whether Lucene joins (not Solr joins) use any
filter cache. The API that Lucene uses for joining is
JoinUtil.createJoinQuery(). Where can I find the source code for this API?



Thanks in advance

Thanks,

Anand



ant eclipse hangs - branch_4x

2014-01-30 Thread Per Steffensen

Hi

Earlier I used to be able to successfully run "ant eclipse" from
branch_4x. With the newest code (tip of branch_4x today) I can't: "ant
eclipse" hangs forever at the point shown by the console output below. I
noticed that this problem has been around for a while - not something
that happened today. Any idea about what might be wrong? A solution?
Help to debug?


Regards Per Steffensen

--- console when running ant eclipse -

...

resolve:
 [echo] Building solr-example-DIH...

ivy-availability-check:
 [echo] Building solr-example-DIH...

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml


resolve:

resolve:
 [echo] Building solr-core...

ivy-availability-check:
 [echo] Building solr-core...

ivy-fail:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml


resolve:

HERE IT JUST HANGS FOREVER
-


Re: Lucene Join

2014-01-30 Thread Michael McCandless
Look in lucene's join module?

Mike McCandless

http://blog.mikemccandless.com
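
For reference, JoinUtil lives in Lucene's join module (lucene/join in the
source tree, package org.apache.lucene.search.join). A hedged sketch of how
createJoinQuery is typically used in 4.x (field names "type", "parentId"
and "id" are invented for illustration; this assumes the lucene-join jar is
on the classpath and two IndexSearchers are already open):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;

// Select the "from"-side documents...
Query fromQuery = new TermQuery(new Term("type", "parent"));

// ...and join their parentId values against the id field on the "to" side.
Query joinQuery = JoinUtil.createJoinQuery(
    "parentId",           // fromField
    false,                // multipleValuesPerDocument
    "id",                 // toField
    fromQuery,
    fromSearcher,         // IndexSearcher over the "from" index
    ScoreMode.None);      // no score propagation across the join

TopDocs hits = toSearcher.search(joinQuery, 10);
```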


On Thu, Jan 30, 2014 at 4:15 AM, anand chandak anand.chan...@oracle.com wrote:
 Hi,


 I am trying to find whether the lucene joins (not solr join) if they are
 using any filter cache. The API that lucene uses is for joining
 joinutil.createjoinquery(), where can I find the source code for this API.


 Thanks in advance

 Thanks,

 Anand



Concurrency handling in DataImportHandler

2014-01-30 Thread Dileepa Jayakody
Hi All,

Could someone please explain how concurrency is handled in the DIH?
What happens if multiple /dataimport requests are issued to the same
Datasource?

I'm doing some custom processing at the end of dataimport process as an
EventListener configured in the data-config.xml as below.
<document name="stanboldata"
    onImportEnd="com.solr.stanbol.processor.StanbolEventListener">

Will each DIH request create a new EventListener object?

I'm copying some field values from my custom processor configured in the
/dataimport request handler to a static Map in my StanbolEventListener
class.
I need to figure out how to handle concurrency when data is copied to my
EventListener object to perform the rest of my update process.

Thanks,
Dileepa


Re: Concurrency handling in DataImportHandler

2014-01-30 Thread Dileepa Jayakody
I would particularly like to know how DIH handles concurrency in JDBC
database connections during dataimport.

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost:3306/solrtest" user="usr1" password="123"
    batchSize="1" />

Thanks,
Dileepa


On Thu, Jan 30, 2014 at 4:05 PM, Dileepa Jayakody dileepajayak...@gmail.com
 wrote:

 Hi All,

 Can I please know about how concurrency is handled in the DIH?
 What happens if multiple /dataimport requests are issued to the same
 Datasource?

 I'm doing some custom processing at the end of dataimport process as an
 EventListener configured in the data-config.xml as below.
 <document name="stanboldata"
     onImportEnd="com.solr.stanbol.processor.StanbolEventListener">

 Will each DIH request create a new EventListener object?

 I'm copying some field values from my custom processor configured in the
 /dataimport request handler to a static Map in my StanbolEventListener
 class.
 I need to figure out how to handle concurrency when data is copied to my
 EvenetListener object to perform the rest of my update process.

 Thanks,
 Dileepa



Re: Concurrency handling in DataImportHandler

2014-01-30 Thread Dileepa Jayakody
Hi All,

I triggered a /dataimport for the first 100 rows from my database and,
while it was running, issued another import request for rows 101-200.

In my log I see the exception below; it seems multiple JDBC connections
cannot be opened. Does this mean concurrency is not supported in DIH for
JDBC datasources?

Please share your thoughts on how to tackle concurrency in dataimport..

[Thread-15] ERROR org.apache.solr.handler.dataimport.JdbcDataSource  -
Ignoring Error when closing connection
java.sql.SQLException: Streaming result set
com.mysql.jdbc.RowDataDynamic@1e820764 is still active. No statements may
be issued when any streaming result sets are open and in use on a given
connection. Ensure that you have called .close() on any active streaming
result sets before attempting more queries.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
at
com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3314)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2477)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2809)
at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:5165)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5048)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4654)
at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1630)
at
org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:410)
at
org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:395)
at
org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:284)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)


Thanks,
Dileepa


On Thu, Jan 30, 2014 at 4:13 PM, Dileepa Jayakody dileepajayak...@gmail.com
 wrote:

 I would particularly like to know how DIH handles concurrency in JDBC
 database connections during datamport..

 <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
     url="jdbc:mysql://localhost:3306/solrtest" user="usr1" password="123"
     batchSize="1" />

 Thanks,
 Dileepa


 On Thu, Jan 30, 2014 at 4:05 PM, Dileepa Jayakody 
 dileepajayak...@gmail.com wrote:

 Hi All,

 Can I please know about how concurrency is handled in the DIH?
 What happens if multiple /dataimport requests are issued to the same
 Datasource?

 I'm doing some custom processing at the end of dataimport process as an
 EventListener configured in the data-config.xml as below.
  <document name="stanboldata"
      onImportEnd="com.solr.stanbol.processor.StanbolEventListener">

 Will each DIH request create a new EventListener object?

 I'm copying some field values from my custom processor configured in the
 /dataimport request handler to a static Map in my StanbolEventListener
 class.
 I need to figure out how to handle concurrency when data is copied to my
 EvenetListener object to perform the rest of my update process.

 Thanks,
 Dileepa





Re: Use a field without predefining it it the schema

2014-01-30 Thread Hakim Benoudjit
Thanks, that's a good feature since I don't have to reindex all the data
or restart the Solr app.


2014-01-30 Steve Rowe sar...@gmail.com

 Hakim,

 All the fields you have added manually to the schema will be kept when you
 switch to using managed schema.

 From the managed schema page on the Solr Reference Guide you linked to
 (describing what happens after you add <schemaFactory
 class="ManagedIndexSchemaFactory">...</schemaFactory> to your solrconfig.xml,
 and then restart Solr in order for the change to take effect):

 "Once Solr is restarted, the existing schema.xml file is renamed to
 schema.xml.bak and the contents are written to a file with the name
 defined as the managedSchemaResourceName."

 Steve

 On Jan 29, 2014, at 7:15 PM, Hakim Benoudjit h.benoud...@gmail.com
 wrote:

  I have found this link
 
 https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig
  .
  I don't know if it's required to modify the schema (see the link) to make
  it editable by the REST API. I hope it doesn't clear all the fields
  that I have added manually to the schema.
 
 
  2014-01-30 Hakim Benoudjit h.benoud...@gmail.com
 
  Thanks Steve for the link.
  It seems very easy to create `new fields` in the `schema` using the
 `POST
  request`. But doest mean that I dont have to restart the `solr app`? Is
 so,
  is this feature available in latest solr version (`v4.6`)?
 
 
  2014-01-29 Alexandre Rafalovitch arafa...@gmail.com
 
  There is an example in the distribution that shows how new fields are
  auto-defined. I think it is example-schemaless. The secret is in the
  UpdateRequestProcessor chain that does cleanup and auto-mapping. Plus
  - I guess - automatically generated schema.
 
  Just remember that once the field is added the first time, it now
  exists. So careful not to send a date-looking thing into what should
  be a text field.
 
  Regards,
Alex.
  Personal website: http://www.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all
  at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
  book)
 
 
  On Wed, Jan 29, 2014 at 5:45 AM, Steve Rowe sar...@gmail.com wrote:
  Hi Hakim,
 
  Check out the section of the Solr Reference Guide on modifying the
  schema via REST API:
 
 
 
 https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-Modifytheschema
 
  Steve
 
  On Jan 28, 2014, at 5:00 PM, Hakim Benoudjit h.benoud...@gmail.com
  wrote:
 
   Hi guys,

   With the new version of solr (4.6), can I add a field to the index,
   knowing that this field doesn't appear (isn't predefined) in the schema?

   I ask this question because I've seen an issue (on jira) related to
   this.
 
  Thanks!
 
 
 
 




Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Srinivasa7
Hi,

I have a similar kind of problem, where I want to search for words with
spaces in them, and I want to search by stripping out all the spaces.

I have used following schema for that 

<fieldType name="nospaces" class="solr.TextField"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^\w]+" replacement="" replace="all"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^\w]+" replacement="" replace="all"/>
  </analyzer>
</fieldType>


And 


<field name="text_nospaces" type="nospaces" indexed="true" stored="true"
       omitNorms="true"/>
<copyField source="text" dest="text_nospaces"/>



But it is not searching the right terms. We are stripping the spaces and
indexing lowercase values when we do that.

Like: East Enders

When I search for the text 'east end ers', it does not return any values,
saying no document found.

I realised that Solr uses a QueryParser before passing the query string to
the query analyzer defined in the schema. The query parser tokenizes the
query string provided in the query, so it sends each token separately to
the query analyzer defined in the schema.

So is there any way that I can bypass this query parser, or use a correct
query parser which considers the entire string as a single phrase?

At the moment I am using the dismax query parser.
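
The index-side analysis chain in the config above can be simulated roughly
like this (illustrative Python, not Solr itself), which also shows why the
terms would match if only the whole query string reached the analyzer in one
piece:

```python
import re

def nospaces_analyze(text):
    # KeywordTokenizer keeps the whole input as one token; then
    # LowerCaseFilter and PatternReplaceFilter([^\w]+ -> "") apply.
    token = text.lower()
    return re.sub(r"[^\w]+", "", token)

# Index side: the token stored for the document field.
print(nospaces_analyze("East Enders"))    # eastenders

# Query side: the same chain WOULD produce a matching token -- but only
# if the whole query string reached the analyzer in one piece. dismax
# splits "east end ers" on whitespace first, so the analyzer instead
# sees three separate fragments: east, end, ers.
print(nospaces_analyze("east end ers"))   # eastenders
```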

Any suggestion would be much appreciated.

Thanks 
Srinivasa



--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-trouble-with-exact-matches-tp4114193p4114432.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Srinivasa7
Aleksander Akerø,
It would be great if you could share how you are handling this on a
per-field basis.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-trouble-with-exact-matches-tp4114193p4114435.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Jack Krupansky
The standard, keyword-oriented query parsers will all treat unquoted,
unescaped white space as term delimiters and ignore the white space. There
is no way to bypass that behavior. So, your regex will never even see the
white space - unless you enclose the text and white space in quotes or use
a backslash to escape each white space character.


You can use the "field" and "term" query parsers to pass a query string as
if it were fully enclosed in quotes, but that only handles a single term and
does not allow for multiple terms or any query operators. For example:


{!field f=myfield}Foo Bar

See:
http://wiki.apache.org/solr/QueryParser

You can also pre-configure the field query parser with the defType=field 
parameter.
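
For illustration, this is how such a request parameter might be URL-encoded
when sent to a /select handler (the field name is Jack's placeholder
"myfield"):

```python
from urllib.parse import urlencode

# Encode the {!field} local-params syntax as a query-string parameter.
# "myfield" is a placeholder; substitute your own field name.
params = {"q": "{!field f=myfield}Foo Bar", "wt": "json"}
query_string = urlencode(params)
print(query_string)
# q=%7B%21field+f%3Dmyfield%7DFoo+Bar&wt=json
```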


-- Jack Krupansky


-Original Message- 
From: Srinivasa7

Sent: Thursday, January 30, 2014 6:37 AM
To: solr-user@lucene.apache.org
Subject: Re: KeywordTokenizerFactory - trouble with exact matches

Hi,

I  have similar kind of problem  where I want search for a words with spaces
in that. And I wanted to search by stripping all the spaces .

I have used following schema for that

<fieldType name="nospaces" class="solr.TextField"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^\w]+" replacement="" replace="all"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^\w]+" replacement="" replace="all"/>
  </analyzer>
</fieldType>


And


<field name="text_nospaces" type="nospaces" indexed="true" stored="true"
       omitNorms="true"/>
<copyField source="text" dest="text_nospaces"/>



But it is not searching the right terms . we are stripping the spaces and
indexing lowercase values when we do that.


Like : East Enders

when I seach for   'east end ers'  text, its not returning any values saying
no document found.

I realised the solr uses QueryParser before passing query string to the
QueryAnalyzer in defined in schema.

And The Query parser is tokenizing the query string providing in query . So
it is sending each token to the QueryAnalyser that is defined in schema.


SO is there anyway that I can by pass this query parser or use a correct
query processor which can consider the entire string as single pharse.

At the moment I am using dismax query processor.

Any suggestion would be much appreciated.

Thanks
Srinivasa



--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-trouble-with-exact-matches-tp4114193p4114432.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: ant eclipse hangs - branch_4x

2014-01-30 Thread Steve Rowe
Hi Per,

You may be seeing the stale-Ivy-lock problem (see IVY-1388). LUCENE-4636 
upgraded the bootstrapped Ivy to 2.3.0 to reduce the likelihood of this 
problem, so the first thing is to make sure you have that version in 
~/.ant/lib/ - if not, remove the Ivy jar that’s there and run ‘ant 
ivy-bootstrap’ to download and put the 2.3.0 jar in place.

You should run the following and remove any files it finds:

find ~/.ivy2/cache -name '*.lck'

That should stop ‘ant resolve’ from hanging.
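
In shell form (demonstrated here against a scratch directory so it is safe
to run as-is; point the find at ~/.ivy2/cache for the real fix):

```shell
# Remove stale Ivy lock files left behind by interrupted resolves.
# Demonstrated on a scratch directory; substitute ~/.ivy2/cache for real use.
CACHE=./ivy-cache-demo
mkdir -p "$CACHE/org.example"
touch "$CACHE/org.example/artifact.jar.lck"

# Print and delete every *.lck file under the cache.
find "$CACHE" -name '*.lck' -print -delete

find "$CACHE" -name '*.lck' | wc -l   # 0 -- all lock files gone
```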

Steve 
 
On Jan 30, 2014, at 5:06 AM, Per Steffensen st...@designware.dk wrote:

 Hi
 
 Earlier in used to be able to successfully run ant eclipse from branch_4x. 
 With the newest code (tip of branch_4x today) I cant. ant eclipse hangs 
 forever at the point showed by console output below. I noticed that this 
 problem has been around for a while - not something that happened today. Any 
 idea about what might be wrong? A solution? Help to debug?
 
 Regards Per Steffensen
 
 --- console when running ant eclipse -
 
 ...
 
 resolve:
 [echo] Building solr-example-DIH...
 
 ivy-availability-check:
 [echo] Building solr-example-DIH...
 
 ivy-fail:
 
 ivy-configure:
 [ivy:configure] :: loading settings :: file = 
 /Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml
 
 resolve:
 
 resolve:
 [echo] Building solr-core...
 
 ivy-availability-check:
 [echo] Building solr-core...
 
 ivy-fail:
 
 ivy-fail:
 
 ivy-configure:
 [ivy:configure] :: loading settings :: file = 
 /Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml
 
 resolve:
 
 HERE IT JUST HANGS FOREVER
 -



Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Aleksander Akerø
Hi Srinivasa

Yes, I've come to understand that the analyzers will never see the
whitespace, thus no need for pattern replacement, like Jack points out. So
the solution would be to set which parser to use for the query. Also, Jack
has pointed out that the field query parser should work in this particular
setting - http://wiki.apache.org/solr/QueryParser

My problem was, though, that it was only one of the fields in the schema
that I needed this for; for all the other fields, e.g. name,
description etc., I would very much like to make use of the eDisMax
functionality. And it seems that there can only be defined one query parser
per query, in other words: for all fields. Jack, you may correct me if I'm
wrong here :)

This particular customer wanted a wildcard search at both ends of the
phrase, and that sort of complicated the problem. I therefore chose to
remove all whitespace for this field in SQL at index time, using the DIH,
and then use EdgeNGramFilterFactory on both sides of the keyword like the
config below, and that seemed to work pretty nicely.

<!-- WildCard search number -->
<fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="2" maxGramSize="25" side="front"/>
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="2" maxGramSize="25" side="back"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
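
To see roughly what those two edge n-gram filters produce at index time (a
simplified re-implementation for illustration only; Solr's actual filter
may differ in detail):

```python
def edge_ngrams(token, min_size=2, max_size=25, side="front"):
    # Rough stand-in for solr.EdgeNGramFilterFactory.
    out = []
    for n in range(min_size, min(max_size, len(token)) + 1):
        out.append(token[:n] if side == "front" else token[-n:])
    return out

# Index side: "FE 009" has its whitespace stripped beforehand (done in SQL
# via the DIH in the setup above), then lowercased to "fe009".
token = "fe009"
front = edge_ngrams(token, side="front")
back = [g for t in front for g in edge_ngrams(t, side="back")]
print(front)              # ['fe', 'fe0', 'fe00', 'fe009']
print("009" in back)      # True -> a query for "009" can match the field
```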

I also added a bit of extra weighting for the keyword field so that exact
matches received a higher score.

What this solution doesn't do is exclude values like EE 009 when
searching for FE 009, but they return far down the list, which for the
customer is OK, because usually these results are somewhat related or
within the same category.

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-30 Jack Krupansky j...@basetechnology.com

 The standard, keyword-oriented query parsers will all treat unquoted,
 unescaped white space as term delimiters and ignore the what space. There
 is no way to bypass that behavior. So, your regex will never even see the
 white space - unless you enclose the text and white space in quotes or use
 a backslash to quote each white space character.

 You can use the field and term query parsers to pass a query string as
 if it were fully enclosed in quotes, but that only handles a single term
 and does not allow for multiple terms or any query operators. For example:

 {!field f=myfield}Foo Bar

 See:
 http://wiki.apache.org/solr/QueryParser

 You can also pre-configure the field query parser with the defType=field
 parameter.

 -- Jack Krupansky


 -Original Message- From: Srinivasa7
 Sent: Thursday, January 30, 2014 6:37 AM

 To: solr-user@lucene.apache.org
 Subject: Re: KeywordTokenizerFactory - trouble with exact matches

 Hi,

 I  have similar kind of problem  where I want search for a words with
 spaces
 in that. And I wanted to search by stripping all the spaces .

 I have used following schema for that

 <fieldType name="nospaces" class="solr.TextField"
            autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="[^\w]+" replacement="" replace="all"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="[^\w]+" replacement="" replace="all"/>
   </analyzer>
 </fieldType>


 And


 <field name="text_nospaces" type="nospaces" indexed="true" stored="true"
        omitNorms="true"/>
 <copyField source="text" dest="text_nospaces"/>



 But it is not searching the right terms . we are stripping the spaces and
 indexing lowercase values when we do that.


 Like : East Enders

 when I seach for   'east end ers'  text, its not returning any values
 saying
 no document found.

 I realised the solr uses QueryParser before passing query string to the
 QueryAnalyzer in defined in schema.

 And The Query parser is tokenizing the query string providing in query . So
 it is sending each token to the QueryAnalyser that is defined in schema.


 SO is there anyway that I can by pass this query parser or use a correct
 query processor which can consider the entire string as single pharse.

 At the moment I am using dismax query processor.

 Any suggestion would be much appreciated.

 Thanks
 Srinivasa



 --
 View this message in context: http://lucene.472066.n3.nabble.com/
 

Re: Not finding part of fulltext field when word ends in dot

2014-01-30 Thread Jack Krupansky
The word delimiter filter will turn "26KA" into two tokens, as if you had
written "26 KA" without the quotes. The autoGeneratePhraseQueries option
will cause the multiple terms to be treated as if they actually were
enclosed within quotes; otherwise they will be treated as separate and
unquoted terms. If you do enclose "26KA" in quotes in your query then
autoGeneratePhraseQueries is not relevant.


Ah... maybe the problem is that you have preserveOriginal=true in your
query analyzer. Do you have your default query operator set to AND? If so,
it would treat "26KA" as 26 AND KA AND 26KA, which requires "26KA"
(without the trailing dot) to be in the index.


It seems counter-intuitive, but the attributes of the index and query word 
delimiter filters need to be slightly asymmetric.
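
A toy simulation of the behavior described (not the real
WordDelimiterFilter, which has many more rules):

```python
import re

def word_delimiter(token, preserve_original=True):
    # Split on letter/digit boundaries and drop trailing punctuation,
    # roughly what WordDelimiterFilterFactory does to "26KA.".
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    if preserve_original:
        parts.append(token)   # keeps the original "26KA." as a term too
    return parts

# With preserveOriginal on the QUERY side and a default operator of AND,
# the query demands all three terms -- including the original form, which
# may never have been indexed exactly like that.
print(word_delimiter("26KA."))   # ['26', 'KA', '26KA.']
```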


-- Jack Krupansky

-Original Message- 
From: Thomas Michael Engelke

Sent: Thursday, January 30, 2014 2:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Not finding part of fulltext field when word ends in dot

I'm not sure I got my problem across. If I understand the snippet of
documentation right, autoGeneratePhraseQueries only affects queries that
result in multiple tokens, which mine does not. The version also is
3.6.0.1, and we're not planning on upgrading to any 4.x version.


2014-01-29 Jack Krupansky j...@basetechnology.com


You might want to add autoGeneratePhraseQueries=true to your field
type, but I don't think that would cause a break when going from 3.6 to
4.x. The default for that attribute changed in Solr 3.5. What release was
your data indexed using? There may have been some subtle word delimiter
filter changes between 3.x and 4.x.

Read:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%
3CC0551C512C863540BC59694A118452AA0764A434@ITS-EMBX-03.
adsroot.itcs.umich.edu%3E



-Original Message- From: Thomas Michael Engelke
Sent: Wednesday, January 29, 2014 11:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Not finding part of fulltext field when word ends in dot


The fieldType definition is a tad on the longer side:

    <fieldType name="text" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <filter class="solr.WordDelimiterFilterFactory"
                catenateWords="1"
                catenateNumbers="1"
                generateNumberParts="1"
                splitOnCaseChange="1"
                generateWordParts="1"
                catenateAll="0"
                preserveOriginal="1"
                splitOnNumerics="0"
        />

        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="german/synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
                dictionary="german/german-common-nouns.txt"
                minWordSize="5"
                minSubwordSize="4"
                maxSubwordSize="15"
                onlyLongestMatch="true"
        />

        <filter class="solr.StopFilterFactory"
                words="german/stopwords.txt" ignoreCase="true"
                enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2"
                protected="german/protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <filter class="solr.WordDelimiterFilterFactory"
                catenateWords="0"
                catenateNumbers="0"
                generateWordParts="1"
                splitOnCaseChange="1"
                generateNumberParts="1"
                catenateAll="0"
                preserveOriginal="1"
                splitOnNumerics="0"
        />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory"
                words="german/stopwords.txt" ignoreCase="true"
                enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2"
                protected="german/protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Re: how to write an efficient query with a subquery to restrict the search space?

2014-01-30 Thread Jack Krupansky
Lucene's default scoring should give you much of what you want - ranking 
hits of low-frequency terms higher - without any special query syntax - just 
list out your terms and use OR as your default operator.
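
The intuition behind that (rare terms dominate the score) can be sketched
with a bare-bones idf computation. This is illustrative only - the document
frequencies below are invented, and Lucene's practical scoring also folds in
tf, field norms and coord:

```python
import math

# Toy corpus statistics: invented document frequency per term out of N docs.
N = 1_000_000_000
df = {"val1": 50_000_000, "val2": 1_200, "val3": 80_000_000, "val4": 900}

def idf(term):
    # Classic Lucene-style idf: rare terms score much higher.
    return 1 + math.log(N / (df[term] + 1))

scores = {t: idf(t) for t in df}
# A doc matching the rare terms val2/val4 outranks one matching only the
# common terms val1/val3, with no special query syntax needed.
print(scores["val2"] > scores["val1"])   # True
print(scores["val4"] > scores["val3"])   # True
```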


-- Jack Krupansky

-Original Message- 
From: svante karlsson

Sent: Thursday, January 23, 2014 6:42 AM
To: solr-user@lucene.apache.org
Subject: how to write an efficient query with a subquery to restrict the 
search space?


I have a solr db containing 1 billion records that I'm trying to use in a
NoSQL fashion.

What I want to do is find the best matches using all search terms but
restrict the search space to the most unique terms

In this example I know that val2 and val4 is rare terms and val1 and val3
are more common. In my real scenario I'll have 20 fields that I want to
include or exclude in the inner query depending on the uniqueness of the
requested value.


my first approach was:
q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2
OR field4:val4)&rows=100&fl=*

but what I think I get is
.  field4:val4 AND (field2:val2 OR field4:val4)   this result is then
OR'ed with the rest

if I write
q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND
(field2:val2 OR field4:val4)&rows=100&fl=*

then what I think I get is two sub-queries that is evaluated separately and
then joined - performance wise this is bad.

Whats the best way to write these types of queries?


Are there any performance issues when running it on several solrcloud nodes
vs a single instance or should it scale?



/svante 
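One pattern worth benchmarking here (my suggestion, not something confirmed in this thread) is to move the restrictive clause on the rare terms into a filter query. In Solr, fq restricts the document set before scoring and is cached independently, while q alone does the ranking:

```python
from urllib.parse import urlencode

# Rank on all four terms, but restrict the candidate set with a filter
# query on the rare terms only (field/value names from the example above).
params = {
    "q": "field1:val1 OR field2:val2 OR field3:val3 OR field4:val4",
    "fq": "field2:val2 OR field4:val4",  # restricts matches, does not affect the score
    "rows": 100,
    "fl": "*",
}
print(urlencode(params))
```

Whether this beats the explicit AND form depends on filterCache hit rates, so it is worth measuring both against the real index.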



SOLR suggester with highlighting

2014-01-30 Thread Jorge Sanchez
Hello,

I am trying to make a typeahead autocomplete with Solr using the suggester.

The search will be done for users and for group names which aggregate users,
matching on usernames, bio, web page and other fields. What I want to
achieve is a Facebook- or Twitter-like search. For this I need to enrich
the result from Solr with additional data (user type, profile URL, avatar
URL, etc.).

The user and group would have an ID field in Solr which corresponds to the
ID in the DB, used to fetch this information. I am stuck on how to do that.

Currently I have the suggester working, but it only returns the suggested
value; when I try to return some other attribute from the document it
doesn't work.

Here is the relevant part of solrconfig.xml:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <!-- configure the spellchecker used
       for autocomplete (dictionary) -->
  <lst name="spellchecker">
    <str name="name">suggester_dictionary</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
    <!-- The indexed field to derive suggestions from -->
    <str name="field">autocomplete</str>
    <!-- buildOnCommit must be set to true because
         suggester keeps data in memory -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>


<requestHandler class="solr.SearchHandler" name="/suggest">
  <lst name="defaults">
    <!-- by default use the suggester_dictionary -->
    <str name="spellcheck.dictionary">suggester_dictionary</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.onlyMorePopular">false</str>
  </lst>
  <lst name="invariants">
    <!-- always run the Suggester for queries to this handler -->
    <str name="spellcheck">true</str>
    <!-- collate not needed; the query is tokenized as keyword, we
         need only suggestions for that term -->
    <str name="spellcheck.collate">false</str>
  </lst>
  <!-- this handler uses only the needed components:
       suggest (defined above) and highlight -->
  <arr name="components">
    <str>suggest</str>
    <str>highlight</str>
  </arr>
</requestHandler>

and the schema:

<field name="groupid" type="int" indexed="true" stored="true"
       required="true" multiValued="false"/>
<field name="groupusername" type="text_general" indexed="true"
       stored="true" multiValued="true"/>
<field name="groupname" type="text_general" indexed="true"
       stored="true" multiValued="false"/>
<field name="grouporuser" type="boolean" indexed="true"
       stored="true" multiValued="false"/>

<field name="autocomplete" type="text_autocomplete"/>

<copyField source="groupusername" dest="autocomplete"/>
<copyField source="groupname" dest="autocomplete"/>

The query: http://gruppu.com:8983/solr/suggest?q=*:*
&spellcheck.q=jo&spellcheck=true&hl=on&hl.fl=groupid

The response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="jo">
        <int name="numFound">2</int>
        <int name="startOffset">0</int>
        <int name="endOffset">2</int>
        <arr name="suggestion">
          <str>jorge</str>
          <str>jorgen</str>
        </arr>
      </lst>
    </lst>
  </lst>
</response>

I would like to have the groupid and grouporuser fields returned ... No
luck so far.
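A common workaround (my sketch, not a confirmed recipe from this thread) is a two-step flow: take the terms the suggester returns, then issue a normal /select query against the stored fields to pick up groupid, grouporuser and the rest. The helper below only builds the second request's parameters; the field names come from the schema above:

```python
from urllib.parse import urlencode

def enrichment_params(suggestions, rows=5):
    """Build /select parameters that fetch stored metadata for the
    terms returned by the suggester (which returns terms only)."""
    clauses = []
    for term in suggestions:
        clauses.append('groupusername:"%s" OR groupname:"%s"' % (term, term))
    return urlencode({
        "q": " OR ".join(clauses),
        "fl": "groupid,groupname,grouporuser",
        "rows": rows,
    })

print(enrichment_params(["jorge", "jorgen"]))
```

The extra round trip is usually cheap compared to maintaining a second enriched suggester index, but either approach can work.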


Re: Solr middle-ware?

2014-01-30 Thread Jack Krupansky
It would be great if an example were available as part of the Solr release. 
Please file a Jira request. Maybe this could be one of the GSOC (Google 
Summer of Code) projects, or maybe somebody/everybody could submit their 
search middleware code as possible examples, attached to the Jira, so that 
even if these examples are not formally released, at least people can view 
and copy them.


-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Tuesday, January 21, 2014 8:00 AM
To: solr-user@lucene.apache.org
Subject: Solr middle-ware?

Hello,

All the Solr documents talk about not exposing Solr directly to the
cloud. But I see people keep asking for a thin secure layer in front
of Solr they can talk from JavaScript to, perhaps with some basic
extension options.

Has anybody actually written one? Open source or in a community part
of larger project? I would love to be able to point people at
something.

Is there something particularly difficult about writing one? Does
anybody have a story of an aborted attempt or mid-point reversal? I would
like to know.

Regards,
  Alex.
P.s. Personal context: I am thinking of doing a series of lightweight
examples of how to use Solr. Like I did for a book, but with a bit
more depth and something that can actually be exposed to the live web
with live data. I don't want to reinvent the wheel of the thin Solr
middleware.
P.p.s. Though I keep thinking that Dart could make an interesting
option for the middleware as it could have the same codebase on the
server and in the client. Like NodeJS, but with saner syntax.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book) 
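For what it's worth, here is a minimal sketch of one slice of what such a thin layer usually does (entirely my assumption of the requirements, not an existing project): the browser talks only to the middleware, and the middleware forwards nothing but known-safe Solr parameters. The whitelist itself is invented for illustration.

```python
# Whitelist of Solr request parameters a browser client may set; everything
# else (qt, shards, stream.*, commit, ...) is dropped before proxying.
ALLOWED_PARAMS = {"q", "fq", "fl", "sort", "rows", "start"}

def sanitize(params):
    """Return only the whitelisted parameters from a client request."""
    return {k: v for k, v in params.items() if k in ALLOWED_PARAMS}

print(sanitize({"q": "solr", "rows": "10", "stream.url": "http://attacker/"}))
```

A real middleware would add authentication and rate limiting on top, but parameter whitelisting is the part people most often get wrong.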



Re: Regarding Solr Faceting on the query response.

2014-01-30 Thread Alexei Martchenko
I believe it's not possible to facet over only the page you are on; faceting
is supposed to work only with the full result set. I've never tried, but
I've never seen a way this could be done.


alexei martchenko
Facebook http://www.facebook.com/alexeiramone |
LinkedIn http://br.linkedin.com/in/alexeimartchenko |
Steam http://steamcommunity.com/id/alexeiramone |
4sq https://pt.foursquare.com/alexeiramone | Skype: alexeiramone |
Github https://github.com/alexeiramone | (11) 9 7613.0966 |


2014-01-30 Mikhail Khludnev mkhlud...@griddynamics.com:

 Hello
 Do you mean setting
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount to 1 or
 you want to facet only returned page (rows) instead of full resultset
 (numFound) ?


 On Thu, Jan 30, 2014 at 6:24 AM, Nilesh Kuchekar
 kuchekar.nil...@gmail.comwrote:

  Yeah it's a typo... I meant company:Apple
 
  Thanks
  Nilesh
 
   On Jan 29, 2014, at 8:59 PM, Alexandre Rafalovitch arafa...@gmail.com
 
  wrote:
  
   On Thu, Jan 30, 2014 at 3:43 AM, Kuchekar kuchekar.nil...@gmail.com
  wrote:
   company=Apple
   Did you mean company:Apple ?
  
   Otherwise, that could be the issue.
  
   Regards,
 Alex.
  
  
   Personal website: http://www.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all
   at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
   book)
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com
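If the goal really is facet counts over just the returned page, one option (my sketch, not a Solr feature) is to count field values client-side over the rows you already have back:

```python
from collections import Counter

# Hypothetical page of documents returned by Solr with rows=3.
page_docs = [
    {"id": "1", "company": "Apple"},
    {"id": "2", "company": "Apple"},
    {"id": "3", "company": "Orange"},
]

# "Facet" over only the current page by counting field values client-side;
# Solr's own faceting always counts over the full result set (numFound).
page_facets = Counter(doc["company"] for doc in page_docs)
print(page_facets.most_common())  # -> [('Apple', 2), ('Orange', 1)]
```

This keeps Solr's faceting semantics intact while giving the per-page breakdown some UIs want.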



Re: Solr middle-ware?

2014-01-30 Thread Furkan KAMACI
Hi;

If you need such kind of thing and if you/we can define the requirements I
can contribute to Solr as a part of GSOC.

Thanks;
Furkan KAMACI



2014-01-30 Jack Krupansky j...@basetechnology.com:

 It would be great if an example were available as part of the Solr
 release. Please file a Jira request. Maybe this could be one of the GSOC
 (Google Summer of Code) projects, or maybe somebody/everybody could submit
 their search middleware code as possible examples, attached to the Jira, so
 that even if these examples are not formally released, at least people can
 view and copy them.

 -- Jack Krupansky

 -Original Message- From: Alexandre Rafalovitch
 Sent: Tuesday, January 21, 2014 8:00 AM

 To: solr-user@lucene.apache.org
 Subject: Solr middle-ware?

 Hello,

 All the Solr documents talk about not running Solr directly to the
 cloud. But I see people keep asking for a thin secure layer in front
 of Solr they can talk from JavaScript to, perhaps with some basic
 extension options.

 Has anybody actually written one? Open source or in a community part
 of larger project? I would love to be able to point people at
 something.

 Is there something particularly difficult about writing one? Does
 anybody has a story of aborted attempt or mid-point reversal? I would
 like to know.

 Regards,
   Alex.
 P.s. Personal context: I am thinking of doing a series of lightweight
 examples of how to use Solr. Like I did for a book, but with a bit
 more depth and something that can actually be exposed to the live web
 with live data. I don't want to reinvent the wheel of the thin Solr
 middleware.
 P.p.s. Though I keep thinking that Dart could make an interesting
 option for the middleware as it could have the same codebase on the
 server and in the client. Like NodeJS, but with saner syntax.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)



Re: high memory usage with small data set

2014-01-30 Thread Erick Erickson
Do your used entries in your caches increase in parallel? This would be the case
if you aren't updating your index and would explain it. BTW, take a look at your
cache statistics (from the admin page) and look at the cache hit ratios. If they
are very small (and my guess is that with 1,500 boolean operations, you aren't
getting significant re-use) then you're just wasting space, try the cache=false
option.

Also, how are you measuring memory? It's sometimes confusing that virtual
memory can be included, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Wed, Jan 29, 2014 at 7:49 AM, Johannes Siegert
johannes.sieg...@marktjagd.de wrote:
 Hi,

 we are using Apache Solr Cloud within a production environment. When the
 maximum heap space is reached, Solr access times slow down for a short
 period of time because the garbage collector is running.

 We use the following configuration:

 - Apache Tomcat as webserver to run the Solr web application
 - 13 indices with about 150 entries (300 MB)
 - 5 server with one replication per index (5 GB max heap-space)
 - All indices have the following caches
 - the largest document cache has 4096 entries; all other indices have
 between 64 and 1536 entries
 - the largest query cache has 1024 entries; all other indices have
 between 64 and 768
 - the largest filter cache has 1536 entries; all other indices have
 between 64 and 1024
 - the directory-factory-implementation is NRTCachingDirectoryFactory
 - the index is updated once per hour (no auto commit)
 - ca. 5000 requests per hour per server
 - large filter-queries (up to 15000 bytes and 1500 boolean operations)
 - many facet-queries (30%)

 Behaviour:

 Started with 512 MB heap space. Over several days the heap usage grew
 until the 5 GB maximum was reached. At that moment the described problem
 occurred. From then on the heap usage stays between 50 and 90 percent. No
 OutOfMemoryException occurs.

 Questions:


 1. Why does Solr use 5 GB of RAM with this small amount of data?
 2. What impact do the large filter queries have on RAM usage?

 Thanks!

 Johannes Siegert
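Erick's cache=false suggestion above refers to Solr's local-param syntax on a filter query. A sketch (the id values are invented) of what such a request could look like:

```python
from urllib.parse import urlencode

# For huge one-off filter queries that get no re-use, the {!cache=false}
# local param keeps them out of the filterCache entirely.
big_filter = "{!cache=false}" + " OR ".join("id:%d" % i for i in range(1500))
params = urlencode({"q": "*:*", "fq": big_filter})
print(len(params))  # the encoded request grows large, like the 15000-byte filters above
```

With cache hit rates as low as the thread suggests, skipping the cache trades a little CPU for noticeably less heap churn.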


Re: 4.6 Core Discovery coreRootDirectory not working

2014-01-30 Thread Erick Erickson
I'm traveling and can't pursue this right now, but a couple of questions:

/home/user1/solr/core.properties exists in all these cases, right?

Tangential, but I'd be very cautious about setting core root the way you are,
since it'll walk each and every directory under /home looking for cores. Perhaps
you're just caught in that file-traversal loop (guessing here).

Do the log files show anything interesting?

I'll be able to respond occasionally between now and next week, since
we're on the road...

Best
Erick

On Wed, Jan 29, 2014 at 3:41 PM, Sam Batschelet sbatsche...@mac.com wrote:
 On Jan 29, 2014, at 4:31 PM, Sam Batschelet wrote:

 Hello, this is my first post to your group. I am in the process of setting up a
 development environment using Solr. We will require multiple cores managed
 by multiple users in the following layout. I am running a fairly vanilla
 version of 4.6.

 solrHome
 /home/camp/example/solr/solr.xml

 cores
 /home/user1/solr/core.properties
 /home/user2/solr/core.properties

 If I manually add the core from admin everything works fine - I can index etc. -
 but when I kill the server the core information is no longer available. I
 need to delete the core.properties file and recreate the core from admin.

 I have since learned that this should be done with Core Discovery, mainly by
 setting coreRootDirectory, which logically in this case should be /home. But
 Solr is not finding the core even if I set the directory directly, i.e.
 /home/user1/solr/ or /home/user1/. I must be missing another config and was
 hoping for some insight.


 ## solr.xml
 <solr>
   <!-- <str name="coreRootDirectory">${coreRootDirectory:/home}</str> -->

 Just to point out the obvious before I get 20 responses saying as much: I did
 test this without the commenting :).
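As a reference point, a core-discovery layout along these lines (a sketch under the thread's assumptions, using the paths from the post above; the core name is invented) is what 4.x expects:

```
<!-- solrHome/solr.xml : point discovery at the core root -->
<solr>
  <str name="coreRootDirectory">/home</str>
</solr>

# /home/user1/solr/core.properties : an empty file is legal,
# but naming the core explicitly is easier to debug
name=user1core
```

Note that Erick's caution below still applies: with coreRootDirectory set to /home, discovery walks every directory under /home looking for core.properties files.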


Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Erick Erickson
Note, the comments about lowercasetokenizer were a red herring. You were
using LowerCaseFilterFactory. note Filter rather than Tokenizer. So it would
just do what you expected, lowercase the entire input. You would have used
LowerCaseTokenizerFactory in place of KeywordTokenizerFactory, not as a Filter.

As for the rest, I expect Jack is right, it's the query parsing above
the field input.

Best
Erick

On Thu, Jan 30, 2014 at 6:29 AM, Aleksander Akerø
aleksan...@gurusoft.no wrote:
 Hi Srinivasa

 Yes, I've come to understand that the analyzers will never see the
 whitespace, thus no need for pattern replacement, as Jack points out. So
 the solution would be to set which parser to use for the query. Also Jack
 has pointed out that the field query parser should work in this particular
 setting - http://wiki.apache.org/solr/QueryParser

 My problem was, though, that I needed this for only one of the fields in the
 schema; for all the other fields, e.g. name, description etc., I would very
 much like to make use of the eDisMax functionality. And it seems that only
 one query parser can be defined per query, in other words for all fields.
 Jack, you may correct me if I'm wrong here :)

 This particular customer wanted a wildcard search at both ends of the
 phrase, and that sort of complicated the problem. I therefore chose to
 replace all whitespace for this field in SQL at index time, using the DIH,
 and then use EdgeNGramFilterFactory on both sides of the keyword as in the
 config below, and that seemed to work pretty nicely.

 <!-- WildCard search number -->
 <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="front"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="back"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 I also added a bit of extra weighting for the keyword field so that exact
 matches received a higher score.

 What this solution doesn't do is exclude values like EE 009 when
 searching for FE 009, but those return far down the list, which for the
 customer is OK, because usually these results are somewhat related and
 within the same category.

 *Aleksander Akerø*
 Systemkonsulent
 Mobil: 944 89 054
 E-post: aleksan...@gurusoft.no

 *Gurusoft AS*
 Telefon: 92 44 09 99
 Østre Kullerød
 www.gurusoft.no


 2014-01-30 Jack Krupansky j...@basetechnology.com

 The standard, keyword-oriented query parsers will all treat unquoted,
 unescaped white space as term delimiters and ignore the white space. There
 is no way to bypass that behavior. So, your regex will never even see the
 white space - unless you enclose the text and white space in quotes or use
 a backslash to quote each white space character.

 You can use the field and term query parsers to pass a query string as
 if it were fully enclosed in quotes, but that only handles a single term
 and does not allow for multiple terms or any query operators. For example:

 {!field f=myfield}Foo Bar

 See:
 http://wiki.apache.org/solr/QueryParser

 You can also pre-configure the field query parser with the defType=field
 parameter.
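Concretely, Jack's field query parser suggestion corresponds to a request like this (my sketch; text_nospaces is the field from Srinivasa's schema below):

```python
from urllib.parse import urlencode

# The whole value after the local params is handed to the field's analyzer
# as one string; the whitespace never reaches the query parser's tokenizer.
print(urlencode({"q": "{!field f=text_nospaces}East Enders"}))
```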

 -- Jack Krupansky


 -Original Message- From: Srinivasa7
 Sent: Thursday, January 30, 2014 6:37 AM

 To: solr-user@lucene.apache.org
 Subject: Re: KeywordTokenizerFactory - trouble with exact matches

 Hi,

 I have a similar kind of problem where I want to search for words with
 spaces in them, and I want to search by stripping all the spaces.

 I have used the following schema for that:

 <fieldType name="nospaces" class="solr.TextField"
            autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="[^\w]+" replacement="" replace="all"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="[^\w]+" replacement="" replace="all"/>
   </analyzer>
 </fieldType>


 And


 <field name="text_nospaces" type="nospaces" indexed="true" stored="true"
        omitNorms="true"/>
 <copyField source="text" dest="text_nospaces"/>



 But it is not matching the right terms. We are stripping the spaces and
 indexing lowercase values when we do that.


 Like: East Enders

 When I search for the text 'east end ers', it does not return any
 values, saying no document found.

 I realised that Solr uses the QueryParser before passing the query string to
 the query analyzer defined in the schema.

 And The Query parser is 

Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Aleksander Akerø
Yes, I actually noted that about the filter vs. tokenizer. It's easy to get
confused if you don't have a good understanding of the differences between
tokenizers and filters.

As for the query parser problem, there's always a workaround, but it was
nice to be made aware of it. It sort of was a ghost-like problem before.
Although it would be great to have the opportunity to disable the
splitting on whitespace even for DisMax, I understand that it's probably not
the most wanted feature for the next Solr release :)

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-30 Erick Erickson erickerick...@gmail.com:

 Note, the comments about lowercasetokenizer were a red herring. You were
 using LowerCaseFilterFactory. note Filter rather than Tokenizer. So it
 would
 just do what you expected, lowercase the entire input. You would have used
 LowerCaseTokenizerFactory in place of KeywordTokenizerFactory, not as a
 Filter.

 As for the rest, I expect Jack is right, it's the query parsing above
 the field input.

 Best
 Erick

 On Thu, Jan 30, 2014 at 6:29 AM, Aleksander Akerø
 aleksan...@gurusoft.no wrote:
  Hi Srinivasa
 
  Yes I've come to understand that the analyzers will never see the
  whitespace, thus no need for patternreplacement, like Jack points out. So
  the solution would be to set wich parser to use for the query. Also Jack
  has pointed out that the field queryparser should work in this
 particular
  setting - http://wiki.apache.org/solr/QueryParser
 
  My problem was though, that it was only for one of the fields in the
 schema
  that i needed this for, but for all the other fields, e.g. name,
  description etc., I would very much like to make use of the eDisMax
  functionality. And it seems that there can only be defined one query
 parser
  per query. in other words: for all fields. Jack, you may correct me if
 I'm
  wrong here :)
 
  This particular customer wanted a wildcard search at both ends of the
  phrase, and that sort of ambiguated the problem. And therefore I chose to
  replace all whitespace for this field in sql at index time, using the
 DIH.
  And then using EdgeNGramFilterFactory on both sides of the keyword like
 the
  config below, and that seemed to work pretty nicely.
 
  !--  WildCard search number  -- fieldType name=keyword
 class=
  solr.TextField positionIncrementGap=100 analyzer type=index 
  tokenizer class=solr.KeywordTokenizerFactory/ filter class=
  solr.LowerCaseFilterFactory/ filter
 class=solr.EdgeNGramFilterFactory
  minGramSize=2 maxGramSize=25 side=front/ filter class=
  solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=25
 side=back/
  /analyzer analyzer type=query tokenizer class=
  solr.KeywordTokenizerFactory/ filter
 class=solr.LowerCaseFilterFactory
  / /analyzer /fieldType
 
  I also added a bit of extra weighting for the keyword field so that
 exact
  matches recieved a higher score.
 
  What this solution doesn't do is to exclude values like EE 009, when
  searching for FE 009, but they return far down on the list, which for
 the
  customer is ok, because usually these results are somewhat related og
  within the same category.
 
  *Aleksander Akerø*
  Systemkonsulent
  Mobil: 944 89 054
  E-post: aleksan...@gurusoft.no
 
  *Gurusoft AS*
  Telefon: 92 44 09 99
  Østre Kullerød
  www.gurusoft.no
 
 
  2014-01-30 Jack Krupansky j...@basetechnology.com
 
  The standard, keyword-oriented query parsers will all treat unquoted,
  unescaped white space as term delimiters and ignore the what space.
 There
  is no way to bypass that behavior. So, your regex will never even see
 the
  white space - unless you enclose the text and white space in quotes or
 use
  a backslash to quote each white space character.
 
  You can use the field and term query parsers to pass a query string
 as
  if it were fully enclosed in quotes, but that only handles a single term
  and does not allow for multiple terms or any query operators. For
 example:
 
  {!field f=myfield}Foo Bar
 
  See:
  http://wiki.apache.org/solr/QueryParser
 
  You can also pre-configure the field query parser with the defType=field
  parameter.
 
  -- Jack Krupansky
 
 
  -Original Message- From: Srinivasa7
  Sent: Thursday, January 30, 2014 6:37 AM
 
  To: solr-user@lucene.apache.org
  Subject: Re: KeywordTokenizerFactory - trouble with exact matches
 
  Hi,
 
  I  have similar kind of problem  where I want search for a words with
  spaces
  in that. And I wanted to search by stripping all the spaces .
 
  I have used following schema for that
 
  fieldType name=nospaces class=solr.TextField
  autoGeneratePhraseQueries=true  
 analyzer type=index
   tokenizer class=solr.KeywordTokenizerFactory/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.PatternReplaceFilterFactory
  pattern=[^\w]+  replacement= replace=all/
   

SolR performance problem

2014-01-30 Thread MayurPanchal
Hi, 

I am working on Solr 4.2.1 with Jetty and we are facing a performance issue
and a heap memory overflow issue as well. So I am searching for the actual
cause of these exceptions. I applied a load test with different Solr queries
and after a few minutes got the errors below.

WARN:oejs.Response:Committed before 500 {msg=Software caused connection
abort: socket write 

Caused by: java.net.SocketException: Software caused connection abort:
socket write error

SEVERE: null:org.eclipse.jetty.io.EofException


I also tried to set the maxIdleTime to 30 milliseconds, but I am still
getting the same error.

Any ideas? 
Please help, how to tackle this. 

Thanks,
Mayur



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolR-performance-problem-tp4114459.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Jack Krupansky
I vaguely recall that there was a Jira floating around for multi-word 
synonyms that dealt with parsing of spaces as well. And Robert Muir has 
(repeatedly) referred to this query parser feature as a bug. Somehow, 
eventually, I think it will be dealt with, but the difficulty remains for 
now.


-- Jack Krupansky

-Original Message- 
From: Aleksander Akerø

Sent: Thursday, January 30, 2014 9:31 AM
To: solr-user@lucene.apache.org
Subject: Re: KeywordTokenizerFactory - trouble with exact matches

Yes, I actually noted that about the filter vs. tokenizer. It's easy to get
confused if you don't have a good understanding of the differences between
tokenizers and filters.

As for the query parser problem, there's always a workaround, but it was
nice to be made aware of. It sort of was a ghost-like problem before.
Allthough it would be great to have the opportunity to disable the
splitting on whitespace even for DisMax, I understand that it probably not
the most wanted feature for next solr release :)

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-30 Erick Erickson erickerick...@gmail.com:


Note, the comments about lowercasetokenizer were a red herring. You were
using LowerCaseFilterFactory. note Filter rather than Tokenizer. So it
would
just do what you expected, lowercase the entire input. You would have used
LowerCaseTokenizerFactory in place of KeywordTokenizerFactory, not as a
Filter.

As for the rest, I expect Jack is right, it's the query parsing above
the field input.

Best
Erick

On Thu, Jan 30, 2014 at 6:29 AM, Aleksander Akerø
aleksan...@gurusoft.no wrote:
 Hi Srinivasa

 Yes I've come to understand that the analyzers will never see the
 whitespace, thus no need for patternreplacement, like Jack points out. 
 So

 the solution would be to set wich parser to use for the query. Also Jack
 has pointed out that the field queryparser should work in this
particular
 setting - http://wiki.apache.org/solr/QueryParser

 My problem was though, that it was only for one of the fields in the
schema
 that i needed this for, but for all the other fields, e.g. name,
 description etc., I would very much like to make use of the eDisMax
 functionality. And it seems that there can only be defined one query
parser
 per query. in other words: for all fields. Jack, you may correct me if
I'm
 wrong here :)

 This particular customer wanted a wildcard search at both ends of the
 phrase, and that sort of ambiguated the problem. And therefore I chose 
 to

 replace all whitespace for this field in sql at index time, using the
DIH.
 And then using EdgeNGramFilterFactory on both sides of the keyword like
the
 config below, and that seemed to work pretty nicely.

 !--  WildCard search number  -- fieldType name=keyword
class=
 solr.TextField positionIncrementGap=100 analyzer type=index 
 tokenizer class=solr.KeywordTokenizerFactory/ filter class=
 solr.LowerCaseFilterFactory/ filter
class=solr.EdgeNGramFilterFactory
 minGramSize=2 maxGramSize=25 side=front/ filter class=
 solr.EdgeNGramFilterFactory minGramSize=2 maxGramSize=25
side=back/
 /analyzer analyzer type=query tokenizer class=
 solr.KeywordTokenizerFactory/ filter
class=solr.LowerCaseFilterFactory
 / /analyzer /fieldType

 I also added a bit of extra weighting for the keyword field so that
exact
 matches recieved a higher score.

 What this solution doesn't do is to exclude values like EE 009, when
 searching for FE 009, but they return far down on the list, which for
the
 customer is ok, because usually these results are somewhat related og
 within the same category.

 *Aleksander Akerø*
 Systemkonsulent
 Mobil: 944 89 054
 E-post: aleksan...@gurusoft.no

 *Gurusoft AS*
 Telefon: 92 44 09 99
 Østre Kullerød
 www.gurusoft.no


 2014-01-30 Jack Krupansky j...@basetechnology.com

 The standard, keyword-oriented query parsers will all treat unquoted,
 unescaped white space as term delimiters and ignore the what space.
There
 is no way to bypass that behavior. So, your regex will never even see
the
 white space - unless you enclose the text and white space in quotes or
use
 a backslash to quote each white space character.

 You can use the field and term query parsers to pass a query string
as
 if it were fully enclosed in quotes, but that only handles a single 
 term

 and does not allow for multiple terms or any query operators. For
example:

 {!field f=myfield}Foo Bar

 See:
 http://wiki.apache.org/solr/QueryParser

 You can also pre-configure the field query parser with the 
 defType=field

 parameter.

 -- Jack Krupansky


 -Original Message- From: Srinivasa7
 Sent: Thursday, January 30, 2014 6:37 AM

 To: solr-user@lucene.apache.org
 Subject: Re: KeywordTokenizerFactory - trouble with exact matches

 Hi,

 I  have similar kind of problem  where I want search for a words with
 spaces
 in that. And I wanted to search by stripping all 

Re: ant eclipse hangs - branch_4x

2014-01-30 Thread Per Steffensen

Hi

I used Ivy 2.2.0. Upgraded to 2.3.0. Didn't help.
No .lck files found in ~/.ivy2/cache, so nothing to delete.
Deleted the entire ~/.ivy2/cache folder. Didn't help.
Debugged a little and found that it was hanging due to the org.apache.hadoop 
dependencies in solr/core/ivy.xml - if I commented out everything that 
had to do with hadoop in that ivy.xml, it didn't hang in ant resolve 
(from solr/core).
Finally the problem was solved when I added 
http://central.maven.org/maven2 to our Artifactory. I do not understand 
why that was necessary, because we already had 
http://repo1.maven.org/maven2/ in our Artifactory.


Well never mind - it works for me now.

Thanks for the help!

Regards, Per Steffensen

On 1/30/14 1:11 PM, Steve Rowe wrote:

Hi Per,

You may be seeing the stale-Ivy-lock problem (see IVY-1388). LUCENE-4636 
upgraded the bootstrapped Ivy to 2.3.0 to reduce the likelihood of this 
problem, so the first thing is to make sure you have that version in 
~/.ant/lib/ - if not, remove the Ivy jar that’s there and run ‘ant 
ivy-bootstrap’ to download and put the 2.3.0 jar in place.

You should run the following and remove any files it finds:

 find ~/.ivy2/cache -name '*.lck'

That should stop ‘ant resolve’ from hanging.

Steve
  
On Jan 30, 2014, at 5:06 AM, Per Steffensen st...@designware.dk wrote:



Hi

Earlier I used to be able to successfully run ant eclipse from branch_4x. With the 
newest code (tip of branch_4x today) I can't. ant eclipse hangs forever at the point 
shown by the console output below. I noticed that this problem has been around for a while - not 
something that happened today. Any idea what might be wrong? A solution? Help to debug?

Regards Per Steffensen

--- console when running ant eclipse -

...

resolve:
 [echo] Building solr-example-DIH...

ivy-availability-check:
 [echo] Building solr-example-DIH...

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml

resolve:

resolve:
 [echo] Building solr-core...

ivy-availability-check:
 [echo] Building solr-core...

ivy-fail:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml

resolve:

HERE IT JUST HANGS FOREVER
-






Re: KeywordTokenizerFactory - trouble with exact matches

2014-01-30 Thread Aleksander Akerø
I've come across something like this as well, can't remember where, but it
was often related to synonym functionality.

The following link shows a 3rd party QueryParser that seems to deal with
synonyms alongside edismax, and may be interesting to look at:
http://wiki.apache.org/solr/QueryParser

It is also mentioned as an issue while using the synonymFilterFactory:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
The Lucene QueryParser tokenizes on white space before giving any text to
the Analyzer, so if a person searches for the words sea biscit the analyzer
will be given the words sea and biscit separately, and will not know
that they match a synonym.

Maybe the extended support for synonym handling is what will give us the
solution one day. For now I have solved my problem and will leave it at
that.

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-30 Jack Krupansky j...@basetechnology.com:

 I vaguely recall that there was a Jira floating around for multi-word
 synonyms that dealt with parsing of spaces as well. And Robert Muir has
 (repeatedly) referred to this query parser feature as a bug. Somehow,
 eventually, I think it will be dealt with, but the difficulty remains for
 now.

 -- Jack Krupansky

 -Original Message- From: Aleksander Akerø
 Sent: Thursday, January 30, 2014 9:31 AM

 To: solr-user@lucene.apache.org
 Subject: Re: KeywordTokenizerFactory - trouble with exact matches

 Yes, I actually noted that about the filter vs. tokenizer. It's easy to get
 confused if you don't have a good understanding of the differences between
 tokenizers and filters.

 As for the query parser problem, there's always a workaround, but it was
 nice to be made aware of. It sort of was a ghost-like problem before.
 Although it would be great to have the option to disable the
 splitting on whitespace even for DisMax, I understand that it's probably not
 the most wanted feature for the next Solr release :)

 *Aleksander Akerø*
 Systemkonsulent
 Mobil: 944 89 054
 E-post: aleksan...@gurusoft.no

 *Gurusoft AS*
 Telefon: 92 44 09 99
 Østre Kullerød
 www.gurusoft.no


 2014-01-30 Erick Erickson erickerick...@gmail.com:

  Note, the comments about lowercasetokenizer were a red herring. You were
 using LowerCaseFilterFactory. note Filter rather than Tokenizer. So it
 would
 just do what you expected, lowercase the entire input. You would have used
 LowerCaseTokenizerFactory in place of KeywordTokenizerFactory, not as a
 Filter.

 As for the rest, I expect Jack is right, it's the query parsing above
 the field input.

 Best
 Erick

 On Thu, Jan 30, 2014 at 6:29 AM, Aleksander Akerø
 aleksan...@gurusoft.no wrote:
  Hi Srinivasa
 
  Yes I've come to understand that the analyzers will never see the
  whitespace, thus no need for patternreplacement, like Jack points out.
  So
  the solution would be to set wich parser to use for the query. Also Jack
  has pointed out that the field queryparser should work in this
 particular
  setting - http://wiki.apache.org/solr/QueryParser
 
  My problem though was that I needed this for only one of the fields in the
  schema, but for all the other fields, e.g. name, description etc., I would
  very much like to make use of the eDisMax functionality. And it seems that
  only one query parser can be defined per query, in other words for all
  fields. Jack, you may correct me if I'm wrong here :)
 
  This particular customer wanted a wildcard search at both ends of the
  phrase, and that somewhat complicated the problem. I therefore chose to
  replace all whitespace for this field in SQL at index time, using the DIH,
  and then to use EdgeNGramFilterFactory on both sides of the keyword as in
  the config below, and that seemed to work pretty nicely.
 
  <!-- WildCard search number -->
  <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="front"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="back"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
 
  I also added a bit of extra weighting for the keyword field so that exact
  matches received a higher score.
 
  What this solution doesn't do is exclude values like EE 009 when
  searching for FE 009, but they come far down the list, which is OK for the
  customer, because usually these results are somewhat related or within the
  same category.
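For the per-field need described in this thread (eDisMax for most fields, but an exact, un-tokenized match on one field in the same request), one workaround, a sketch rather than something tested here, is to route a sub-clause through the field query parser via Solr's `_query_` hook inside an eDisMax query. The `number` field name comes from the debug output earlier in the thread; the `qf` list is an assumption:

```python
from urllib.parse import urlencode

def build_query(user_input, exact_field="number"):
    # The field parser hands the whole value to the field's analyzer in
    # one piece, so KeywordTokenizerFactory sees "FE 009" unsplit, while
    # the rest of the query still goes through eDisMax.
    exact = '_query_:"{!field f=%s v=\'%s\'}"' % (exact_field, user_input)
    return urlencode({
        "defType": "edismax",
        "qf": "name description",
        "q": "(%s) OR %s^10" % (user_input, exact),  # boost exact matches
    })

print(build_query("FE 009"))
```

Quote escaping inside user input is ignored here; real input would need sanitizing before being embedded in the local-params value.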
 
  *Aleksander Akerø*
  Systemkonsulent
  Mobil: 944 89 054
  E-post: aleksan...@gurusoft.no
 
  *Gurusoft AS*
  Telefon: 92 44 09 99
  Østre Kullerød
  

Error when restarting solr servers

2014-01-30 Thread lansing
Hello,
Running SolrCloud with 2 collections, 5 shards and 3 replicas for each
collection, and 5 ZooKeeper instances.
solr-4.6.0
apache-tomcat-7.0.39
zookeeper-3.4.5
jre1.7.0_21

When I try to restart a Solr server in my SolrCloud cluster I am receiving
these errors:

1861449 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.ShardLeaderElectionContext  – Running the leader
process for shard shard1
1861451 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.ShardLeaderElectionContext  – Checking if I should try
and be the leader.
1861451 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.ShardLeaderElectionContext  – My last published State
was down, I won't be the leader.
1861451 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.ShardLeaderElectionContext  – There may be a better
leader candidate than us - going back into recovery
1861452 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.update.DefaultSolrCoreState  – Running recovery - first
canceling any ongoing recovery
1861452 [localhost-startStop-1-EventThread] WARN 
org.apache.solr.cloud.RecoveryStrategy  – Stopping recovery for
zkNodeName=core_node3core=Current1_shard1_replica3
1862223 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Finished recovery process. core=Current1_shard1_replica3
1862223 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Starting recovery process.  core=Current1_shard1_replica3
recoveringAfterStartup=false
1862223 [RecoveryThread] ERROR org.apache.solr.update.UpdateLog  – Exception
reading versions from log
java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(Unknown Source)
at sun.nio.ch.FileChannelImpl.read(Unknown Source)
at
org.apache.solr.update.ChannelFastInputStream.readWrappedStream(TransactionLog.java:778)
at
org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
at
org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:71)
at
org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
at
org.apache.solr.update.TransactionLog$FSReverseReader.init(TransactionLog.java:696)
at
org.apache.solr.update.TransactionLog.getReverseReader(TransactionLog.java:575)
at
org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:942)
at
org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:885)
at
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1042)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:280)
at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)
1862223 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy  –
Error while trying to recover.
core=Current1_shard1_replica3:org.apache.solr.common.SolrException: Cloud
state still says we are leader.
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:354)
at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)

1862224 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy  –
Recovery failed - trying again... (0) core=Current1_shard1_replica3
1862224 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Wait 2.0 seconds before trying to recover again (1)
1862541 [localhost-startStop-1-SendThread(10.0.5.230:2281)] WARN 
org.apache.zookeeper.ClientCnxn  – Session 0x542fd3f2be100e6 for server
10.0.5.230/10.0.5.230:2281, unexpected error, closing socket connection and
attempting reconnect
java.io.IOException: Packet len11106511 is out of range!
at
org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
1862641 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.DistributedQueue  – Watcher fired on path: null state:
Disconnected type None
1862641 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.DistributedQueue  – Watcher fired on path: null state:
Disconnected type None
1862641 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.DistributedQueue  – Watcher fired on path: null state:
Disconnected type None
1862641 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.DistributedQueue  – Watcher fired on path: null state:
Disconnected type None


..


1270268 [http-bio-8201-exec-26] INFO 
org.apache.solr.handler.admin.CoreAdminHandler  – Going to wait for
coreNodeName: core_node10, state: recovering, checkLive: true, onlyIfLeader:
true
1270268 [http-bio-8201-exec-10] INFO 
org.apache.solr.handler.admin.CoreAdminHandler  – Going to wait for
coreNodeName: core_node11, state: recovering, 

Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Shawn Heisey
On 1/29/2014 12:48 PM, Jeff Wartes wrote:
 And that, I think, is my misunderstanding. I had assumed that the link
 between a node and the collections it belongs to would be the (possibly
 chroot'ed) zookeeper reference *itself*, not the node's directory
 structure. Instead, it appears that ZK is simply a repository for the
 collection configuration, where nodes may look up what they need based on
 filesystem core references.

Work is underway towards a new mode where zookeeper is the ultimate
source of truth, and each node will behave accordingly to implement and
maintain that truth.  I can't seem to locate a Jira issue for it,
unfortunately.  It's possible that one doesn't exist yet, or that it has
an obscure title.  Mark Miller is the one who really understands the
full details, as he's a primary author of SolrCloud code.

Currently, what SolrCloud considers to be truth is dictated by both
zookeeper and an amalgamation of which cores each server actually has
present.  The collections API modifies both.  With an older config (all
current and future 4.x versions), the latter is in solr.xml.  If you're
using the new solr.xml format (available 4.4 and later, will be
mandatory in 5.0), it's done with Core Discovery.  Zookeeper has a list
of everything and coordinates the cluster state, but has no real control
over the cores that actually exist on each server.  When the two sources
of truth disagree, nothing happens to fix the situation, manual
intervention is required.

Any errors in my understanding of SolrCloud are my own.  I don't claim
that what I just wrote is error-free, but I am pretty sure that it's
essentially correct.

Thanks,
Shawn



Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Jeff Wartes

Work is underway towards a new mode where zookeeper is the ultimate
source of truth, and each node will behave accordingly to implement and
maintain that truth.  I can't seem to locate a Jira issue for it,
unfortunately.  It's possible that one doesn't exist yet, or that it has
an obscure title.  Mark Miller is the one who really understands the
full details, as he's a primary author of SolrCloud code.

Currently, what SolrCloud considers to be truth is dictated by both
zookeeper and an amalgamation of which cores each server actually has
present.  The collections API modifies both.  With an older config (all
current and future 4.x versions), the latter is in solr.xml.  If you're
using the new solr.xml format (available 4.4 and later, will be
mandatory in 5.0), it's done with Core Discovery.  Zookeeper has a list
of everything and coordinates the cluster state, but has no real control
over the cores that actually exist on each server.  When the two sources
of truth disagree, nothing happens to fix the situation, manual
intervention is required.


Thanks Shawn, this was exactly the confirmation I was looking for. I think
I have a much better understanding now.

The takeaway I have is that SolrCloud's current automation assumes
relatively static clusters, and that if I want anything like dynamic
scaling, I'm going to have to write my own tooling to add nodes safely.

Fortunately, it appears that the necessary CoreAdmin commands don't need
much besides the collection name, so it smells like a simple thing to
query zookeeper's /collections path (or clusterstate.json) and issue GET
requests accordingly when I spin up a new node.

If you (or anyone) does happen to recall a reference to the work you
alluded to, I'd certainly be interested. I googled around myself for a few
minutes, but haven't found anything so far.
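That spin-up step could be scripted; a rough sketch of turning a collection list (e.g. the children of ZooKeeper's /collections path, assumed fetched already) into CoreAdmin CREATE requests. The core naming convention here is made up:

```python
from urllib.parse import urlencode

def core_create_urls(solr_base, collections):
    """Build one CoreAdmin CREATE URL per collection that a freshly
    spun-up node should join. The core naming scheme is hypothetical."""
    urls = []
    for coll in collections:
        params = {
            "action": "CREATE",
            "name": "%s_replica_new" % coll,  # hypothetical core name
            "collection": coll,
        }
        urls.append("%s/admin/cores?%s" % (solr_base, urlencode(params)))
    return urls
```

Issuing a GET to each URL would then let the node register itself with the cluster for those collections.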




Re: Regarding Solr Faceting on the query response.

2014-01-30 Thread Kuchekar
Hi Mikhail,

 I would like my faceting to run only on my result set,
i.e. only on the numFound documents, rather than on the whole index.

In the example, even when I specify the query 'company:Apple' .. it gives
me faceted results for other companies. This means that it is querying
against the whole index, rather than just the result set.

Using facet.mincount=1 will give me facet values with counts of at least 1,
but to retrieve all the distinct values (Apple, Bose, Chevron, ... Oracle ...)
of the facet field (company) it will again query the whole index.

What I would like to do is ... facet only on the resultset.

i.e. my query (q=company:Apple AND technologies:java) should return only
the facet details for 'Apple', since that is the only company present in the
result set. But it gives me the list of other company names as well, which
makes me believe that it is querying the whole index to get the distinct
values for the company field.

"docs": [
  { "id": "ABC123",  "company": [ "APPLE" ] },
  { "id": "ABC1234", "company": [ "APPLE" ] },
  { "id": "ABC1235", "company": [ "APPLE" ] },
  { "id": "ABC1236", "company": [ "APPLE" ] } ] },
"facet_counts": {
  "facet_queries": { "p_company:ucsf\n": 1 },
  "facet_fields": { "company": [ "APPLE", 4 ] },
  "facet_dates": {},
  "facet_ranges": {} }
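For what it's worth, the counts themselves are computed against the documents matching q (the numFound set); the extra company names show up because values with zero hits in that set are still listed until facet.mincount is raised. A sketch of the request parameters, reusing the fields from the example above:

```python
from urllib.parse import urlencode

params = {
    "q": "company:Apple AND technologies:java",
    "rows": "10",
    "facet": "true",
    "facet.field": "company",
    # Drop facet values that match zero documents in this result set;
    # without it every indexed company is listed with a count of 0.
    "facet.mincount": "1",
}
query_string = urlencode(params)
print(query_string)
```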


 Thanks.
Kuchekar, Nilesh


On Thu, Jan 30, 2014 at 2:13 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Hello
 Do you mean setting
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount to 1 or
 you want to facet only returned page (rows) instead of full resultset
 (numFound) ?


 On Thu, Jan 30, 2014 at 6:24 AM, Nilesh Kuchekar
 kuchekar.nil...@gmail.comwrote:

  Yeah it's a typo... I meant company:Apple
 
  Thanks
  Nilesh
 
   On Jan 29, 2014, at 8:59 PM, Alexandre Rafalovitch arafa...@gmail.com
 
  wrote:
  
   On Thu, Jan 30, 2014 at 3:43 AM, Kuchekar kuchekar.nil...@gmail.com
  wrote:
   company=Apple
   Did you mean company:Apple ?
  
   Otherwise, that could be the issue.
  
   Regards,
 Alex.
  
  
   Personal website: http://www.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all
   at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
   book)
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



RES: Regarding Solr Faceting on the query response.

2014-01-30 Thread Felipe Dantas de Souza Paiva
Hi Nilesh,

maybe Faceting is not the right thing for you, because 'faceting is the
arrangement of search results into categories based on indexed terms'
(https://cwiki.apache.org/confluence/display/solr/Faceting).

Perhaps you could use Result Clustering
(https://cwiki.apache.org/confluence/display/solr/Result+Clustering), for the
clustering algorithm is applied to the search result of each single query.

Hope this helps.

Felipe Dantas de Souza Paiva

De: Kuchekar [kuchekar.nil...@gmail.com]
Enviado: quinta-feira, 30 de janeiro de 2014 15:35
Para: solr-user@lucene.apache.org
Assunto: Re: Regarding Solr Faceting on the query response.

Hi Mikhail,

 I would like my faceting to run only on my resultset
returned as in only on numFound, rather than the whole index.

In the example, even when I specify the query 'company:Apple' .. it gives
me faceted results for other companies. This means that it is querying
against the whole index, rather than just the result set.

Using facet.mincount=1 will give me faceted values which are greater than
1, but that will again to retrieve all the distinct values (Apple, Bose,
Chevron, ..Oracle..) of facet field (company) query the whole index.

What I would like to do is ... facet only on the resultset.

i.e. my query (q= company:Apple AND technologies:java ) should return, only
the facet details about 'Apple' since that is only present in the results
set. But it provides me the list of other Company Names ... which makes me
believe that it is querying the whole index to get the distinct value for
the company..

"docs": [
  { "id": "ABC123",  "company": [ "APPLE" ] },
  { "id": "ABC1234", "company": [ "APPLE" ] },
  { "id": "ABC1235", "company": [ "APPLE" ] },
  { "id": "ABC1236", "company": [ "APPLE" ] } ] },
"facet_counts": {
  "facet_queries": { "p_company:ucsf\n": 1 },
  "facet_fields": { "company": [ "APPLE", 4 ] },
  "facet_dates": {},
  "facet_ranges": {} }


 Thanks.
Kuchekar, Nilesh


On Thu, Jan 30, 2014 at 2:13 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Hello
 Do you mean setting
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount to 1 or
 you want to facet only returned page (rows) instead of full resultset
 (numFound) ?


 On Thu, Jan 30, 2014 at 6:24 AM, Nilesh Kuchekar
 kuchekar.nil...@gmail.comwrote:

  Yeah it's a typo... I meant company:Apple
 
  Thanks
  Nilesh
 
   On Jan 29, 2014, at 8:59 PM, Alexandre Rafalovitch arafa...@gmail.com
 
  wrote:
  
   On Thu, Jan 30, 2014 at 3:43 AM, Kuchekar kuchekar.nil...@gmail.com
  wrote:
   company=Apple
   Did you mean company:Apple ?
  
   Otherwise, that could be the issue.
  
   Regards,
 Alex.
  
  
   Personal website: http://www.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all
   at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
   book)
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com






Adding DocValues in an existing field

2014-01-30 Thread yriveiro
Hi,

Can I add the docValues feature to an existing field without wiping the existing data?

The modification on the schema will be something like this:
<field name="surrogate_id" type="tlong" indexed="true" stored="true"
       multiValued="false" />
<field name="surrogate_id" type="tlong" indexed="true" stored="true"
       multiValued="false" docValues="true" />

I want to use the existing data to reindex into the same collection,
creating the docValues in the process. Is that possible?

I'm using solr 4.6.1



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-DocValues-in-an-existing-field-tp4114462.html
Sent from the Solr - User mailing list archive at Nabble.com.


Geospatial clustering + zoom in/out help

2014-01-30 Thread Bojan Šmid
Hi,

I have an index with 300K docs with lat,lon. I need to cluster the docs
based on lat,lon for display in the UI. The user then needs to be able to
click on any cluster and zoom in (up to 11 levels deep).

I'm using Solr 4.6 and I'm wondering how best to implement this efficiently?

A bit more specific questions below.

I need to:

1) cluster data points at different zoom levels

2) click on a specific cluster and zoom in

3) be able to select a region (bounding box or polygon) and show clusters
in the selected area

What's the best way to implement this so that queries are fast?

What I thought I would try, but maybe there are better ways:

* divide the world in NxM large squares and then each of these squares into
4 more squares, and so on - 11 levels deep

* at index time figure out all squares (at all 11 levels) each data point
belongs to and index that info into 11 different fields: e.g.
id=1 name=foo lat=x lon=y zoom1=square1_62  zoom2=square1_62_47
zoom3=square1_62_47_33 

* at search time, use field collapsing on zoomX field to get which docs
belong to which square on particular level

* calculate center point of each square (by calculating mean value of
positions for all points in that square) using StatsComponent (facet on
zoomX field, avg on lat and lon fields) - I would consider those squares as
separate clusters (one square is one cluster) and center points of those
squares as center points of clusters derived from them
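The index-time part of that plan, computing one square ID per zoom level so that deeper IDs extend shallower ones, could look roughly like this. The grid sizes are assumptions, and deeper levels here append a quadrant index 0-3 rather than a full cell number, a variation on the zoom1=square1_62 naming above:

```python
def grid_cells(lat, lon, levels=11, n0=36, m0=18):
    """Return {"zoom1": id1, ..., "zoomN": idN} for one point.

    Level 1 is an n0 x m0 grid over the whole world; every deeper
    level splits the current cell into 2 x 2 quadrants, so each ID
    extends the previous level's ID and clicking a cluster maps to a
    simple filter query on the next zoomN field.
    """
    # Normalize to [0, 1) on each axis, clamping the far edge.
    x = min((lon + 180.0) / 360.0, 1.0 - 1e-9)
    y = min((lat + 90.0) / 180.0, 1.0 - 1e-9)
    cx, cy = int(x * n0), int(y * m0)
    segments = [str(cy * n0 + cx)]
    fx, fy = x * n0 - cx, y * m0 - cy           # position inside the cell
    cells = {"zoom1": "square1_" + segments[0]}
    for level in range(2, levels + 1):
        qx, qy = int(fx >= 0.5), int(fy >= 0.5)
        segments.append(str(qy * 2 + qx))       # quadrant index 0..3
        cells["zoom%d" % level] = "square1_" + "_".join(segments)
        fx, fy = fx * 2 - qx, fy * 2 - qy       # descend into the quadrant
    return cells
```

Because IDs are prefix-nested, zooming in on a cluster is a filter on the next zoom field with the clicked ID as the shared prefix. This ignores the equal-degrees-vs-equal-area caveat raised below.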

I *think* the problem with this approach is that:

* there will be many unique fields for bigger zoom levels, which means
field collapsing / StatsComponent maaay not work fast enough

* clusters will not look very natural because I would have many clusters on
each zoom level and what are real geographical clusters would be
displayed as multiple clusters since their points would in some cases be
dispersed into multiple squares. But that may be OK

* a lot will depend on how the squares are calculated - linearly dividing
360 degrees by N to get equal size squares in degrees would produce
issues with real square sizes and counts of points in each of them


So I'm wondering if there is a better way?

Thanks,


  Bojan


Is there a way to get Solr to delete an uploaded document after its been indexed?

2014-01-30 Thread eShard
Hi,
My crawler uploads all the documents to Solr for indexing to a tomcat/temp
folder.  
Over time this folder grows so large that I run out of disk space.  
So, I wrote a bash script to delete the files and put it in the crontab.
However, if I delete the docs too soon, they don't get indexed; too late and
I run out of disk.
I'm still trying to find the right window...
So, (and this is probably a long shot)  I'm wondering if there's anything in
Solr that can delete these docs from /temp after they've been indexed...

Thank you,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-Solr-to-delete-an-uploaded-document-after-its-been-indexed-tp4114463.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Jeff Wartes
Found it. In case anyone else cares, this appears to be the root issue:
https://issues.apache.org/jira/browse/SOLR-5128

Thanks again.


On 1/30/14, 9:01 AM, Jeff Wartes jwar...@whitepages.com wrote:


Work is underway towards a new mode where zookeeper is the ultimate
source of truth, and each node will behave accordingly to implement and
maintain that truth.  I can't seem to locate a Jira issue for it,
unfortunately.  It's possible that one doesn't exist yet, or that it has
an obscure title.  Mark Miller is the one who really understands the
full details, as he's a primary author of SolrCloud code.

Currently, what SolrCloud considers to be truth is dictated by both
zookeeper and an amalgamation of which cores each server actually has
present.  The collections API modifies both.  With an older config (all
current and future 4.x versions), the latter is in solr.xml.  If you're
using the new solr.xml format (available 4.4 and later, will be
mandatory in 5.0), it's done with Core Discovery.  Zookeeper has a list
of everything and coordinates the cluster state, but has no real control
over the cores that actually exist on each server.  When the two sources
of truth disagree, nothing happens to fix the situation, manual
intervention is required.


Thanks Shawn, this was exactly the confirmation I was looking for. I think
I have a much better understanding now.

The takeaway I have is that SolrCloud's current automation assumes
relatively static clusters, and that if I want anything like dynamic
scaling, I'm going to have to write my own tooling to add nodes safely.

Fortunately, it appears that the necessary CoreAdmin commands don't need
much besides the collection name, so it smells like a simple thing to
query zookeeper's /collections path (or clusterstate.json) and issue GET
requests accordingly when I spin up a new node.

If you (or anyone) does happen to recall a reference to the work you
alluded to, I'd certainly be interested. I googled around myself for a few
minutes, but haven't found anything so far.





JVM heap constraints and garbage collection

2014-01-30 Thread Joseph Hagerty
Greetings esteemed Solr-ites,

I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.

Since my average load during peak hours is becoming quite high, and since
I'm finally starting to notice a little bit of performance degradation and
intermittent errors (e.g. Solr returned response 0 on perfectly valid
reads during load spikes), I think it's time to tune my Slave box before
things get out of control.

In particular, *I am curious how others are tuning their JVM heap
constraints (xms, xmx, etc.) and garbage collection (parallel or
concurrent) to meet the needs of Solr*. I am using the Sun JVM Version 6,
not the fancy third party offerings.

Some more info, FWIW:

- Average document size in my index is probably around 6k
- Using CentOS
- Master-Slave setup. Master gets all the writes, Slave gets all the read
requests. It is the *Slave* that is suffering-- the Master seems fine.
- The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM
- DaemonThreads skyrocket during the aforementioned load spikes

Thanks for reading, and to the devs: thanks for an excellent product.

-- 
- Joe


TemplateTransformer returns null values

2014-01-30 Thread tom
Hi,
I am trying a simple transformer on data input using DIH, Solr 4.6. When I
run the query below during DIH I get null values for new_url. What is wrong?
I even tried with ${document_solr.id}

the name is 

data-config.xml:

<entity name="document_solr"
        transformer="TemplateTransformer,LogTransformer"
        query="select DOC_IDN as id, BILL_IDN as bill_id from document_solr"
        logTemplate="The name is ${document_solr.DOC_IDN}" logLevel="debug">

  <field column="DOC_IDN" name="id" />
  <field column="BILL_IDN" name="bill_id" />

  <field column="new_url" template="${document_solr.DOC_IDN}" />
</entity>



below stack trace:
8185946 [Thread-29] INFO  org.apache.solr.search.SolrIndexSearcher  –
Opening Searcher@5a5f4cb7 realtime
8185960 [Thread-29] INFO  org.apache.solr.handler.dataimport.JdbcDataSource 
– Creating a connection for entity document_solr with URL:
jdbc:oracle:thin:@vluedb01:1521:iedwdev
8186225 [Thread-29] INFO  org.apache.solr.handler.dataimport.JdbcDataSource 
– Time taken forgetConnection():265
8186226 [Thread-29] DEBUG org.apache.solr.handler.dataimport.JdbcDataSource 
– Executing SQL: select DOC_IDN as id, BILL_IDN as bill_id from
document_solr
8186291 [Thread-29] TRACE org.apache.solr.handler.dataimport.JdbcDataSource 
– Time taken for sql :64
8186301 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer 
– The name is
8186303 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer 
– The name is
8186303 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer 
– The name is


`Tom




--
View this message in context: 
http://lucene.472066.n3.nabble.com/TemplateTransformer-returns-null-values-tp4114539.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is there a way to get Solr to delete an uploaded document after its been indexed?

2014-01-30 Thread Alexandre Rafalovitch
Well, it's your crawler that submits them, so the crawler should know
when to delete them.

If you want some sort of trigger from Solr, look at postCommit hook
defined in solrconfig.xml. Though all that gives you is timing, not
which documents to deal with.

You could probably also plug into UpdateRequestProcessor chain, where
you do have access to the document content.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jan 31, 2014 at 3:40 AM, eShard zim...@yahoo.com wrote:
 Hi,
 My crawler uploads all the documents to Solr for indexing to a tomcat/temp
 folder.
 Over time this folder grows so large that I run out of disk space.
 So, I wrote a bash script to delete the files and put it in the crontab.
 However, if I delete the docs too soon, it doesn't get indexed; too late and
 I run out of disk.
 I'm still trying to find the right window...
 So, (and this is probably a long shot)  I'm wondering if there's anything in
 Solr that can delete these docs from /temp after they've been indexed...

 Thank you,




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-Solr-to-delete-an-uploaded-document-after-its-been-indexed-tp4114463.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: TemplateTransformer returns null values

2014-01-30 Thread Alexandre Rafalovitch
I think you have a double mapping there:
*) select DOC_IDN as id
*) <field column="DOC_IDN" name="id" />
Both are mapping DOC_IDN to id, possibly with the second overriding the
first (or shadowing it).

Try dropping the 'as' part in the select and then looking for .id. Or keep
the 'as' part and just have an explicit field definition in the second one:
<field column="id" />

Regards,
  Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jan 31, 2014 at 6:29 AM, tom praveen...@yahoo.com wrote:
 Hi,
 I am trying a simple transformer on data input using DIH, Solr 4.6. when I
 run the below query while DIH I get null values for new_url. what is wrong?
 even tried with ${document_solr.id}

 the name is

 data-config.xml:

 <entity name="document_solr"
         transformer="TemplateTransformer,LogTransformer"
         query="select DOC_IDN as id, BILL_IDN as bill_id from document_solr"
         logTemplate="The name is ${document_solr.DOC_IDN}" logLevel="debug">

   <field column="DOC_IDN" name="id" />
   <field column="BILL_IDN" name="bill_id" />

   <field column="new_url" template="${document_solr.DOC_IDN}" />
 </entity>



 below stack trace:
 8185946 [Thread-29] INFO  org.apache.solr.search.SolrIndexSearcher - Opening Searcher@5a5f4cb7 realtime
 8185960 [Thread-29] INFO  org.apache.solr.handler.dataimport.JdbcDataSource - Creating a connection for entity document_solr with URL: jdbc:oracle:thin:@vluedb01:1521:iedwdev
 8186225 [Thread-29] INFO  org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for getConnection(): 265
 8186226 [Thread-29] DEBUG org.apache.solr.handler.dataimport.JdbcDataSource - Executing SQL: select DOC_IDN as id, BILL_IDN as bill_id from document_solr
 8186291 [Thread-29] TRACE org.apache.solr.handler.dataimport.JdbcDataSource - Time taken for sql: 64
 8186301 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer - The name is
 8186303 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer - The name is
 8186303 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer - The name is


 `Tom




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/TemplateTransformer-returns-null-values-tp4114539.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: TemplateTransformer returns null values

2014-01-30 Thread tom
Thanks Alexandre for quick response,

I tried both ways but still no luck: null values. Is there anything I am
doing fundamentally wrong?

query="select DOC_IDN, BILL_IDN from document_fact"
   <field column="DOC_IDN" name="id" />

and

query="select DOC_IDN as id, BILL_IDN as bill_id from document_fact"
   <field column="id" />




--
View this message in context: 
http://lucene.472066.n3.nabble.com/TemplateTransformer-returns-null-values-tp4114539p4114544.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boosting documents by categorical preferences

2014-01-30 Thread Amit Nithian
Chris,

Sounds good! Thanks for the tips.. I'll be glad to submit my talk to this
as I have a writeup pretty much ready to go.

Cheers
Amit


On Tue, Jan 28, 2014 at 11:24 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : The initial results seem to be kinda promising... of course there are
 many
 : more optimizations I could do like decay user ratings over time to
 indicate
 : that preferences decay over time so a 5 rating a year ago doesn't count
 as
 : much as a 5 rating today.
 :
 : Hope this helps others. I'll open source what I have soon and post back.
 If
 : there is feedback or other thoughts let me know!

 Hey Amit,

 Glad to hear your user based boosting experiments are paying off.  I would
 definitely love to see a more detailed writeup down the road showing off
 how it affects your final user metrics -- or perhaps even give a session
 on your technique at ApacheCon?


 http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp


 -Hoss
 http://www.lucidworks.com/



Re: JVM heap constraints and garbage collection

2014-01-30 Thread Shawn Heisey

On 1/30/2014 3:20 PM, Joseph Hagerty wrote:

I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.


snip


- The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM


One detail that you did not provide was how much of your 7.5GB RAM you 
are allocating to the Java heap for Solr, but I actually don't think I 
need that information, because for your index size, you simply don't 
have enough. If you're sticking with Amazon, you'll want one of the 
instances with at least 30GB of RAM, and you might want to consider more 
memory than that.


An ideal RAM size for Solr is equal to the size of on-disk data plus the 
heap space used by Solr and other programs.  This means that if your 
java heap for Solr is 4GB and there are no other significant programs 
running on the same server, you'd want a minimum of 34GB of RAM for an 
ideal setup with your index.  4GB of that would be for Solr itself, the 
remainder would be for the operating system to fully cache your index in 
the OS disk cache.
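
As a quick sketch, the sizing rule above works out like this (Python; the
numbers are just the figures from this thread):

```python
def recommended_ram_gb(index_size_gb, solr_heap_gb, other_heap_gb=0.0,
                       cache_fraction=1.0):
    """Rule of thumb: RAM = Java heap(s) + enough left over for the OS
    to cache the index (ideally all of it, sometimes as little as half)."""
    return solr_heap_gb + other_heap_gb + index_size_gb * cache_fraction

# 30GB index with a 4GB Solr heap, as discussed above:
print(recommended_ram_gb(30, 4))                      # 34.0 GB ideal minimum
print(recommended_ram_gb(30, 4, cache_fraction=0.5))  # 19.0 GB bare minimum
```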


Depending on your query patterns and how your schema is arranged, you 
*might* be able to get away with as little as half of your index size just 
for the OS disk cache, but it's better to make it big enough for the 
whole index, plus room for growth.


http://wiki.apache.org/solr/SolrPerformanceProblems

Many people are *shocked* when they are told this information, but if 
you think about the relative speeds of getting a chunk of data from a 
hard disk vs. getting the same information from memory, it's not all 
that shocking.


Thanks,
Shawn



Re: TemplateTransformer returns null values

2014-01-30 Thread Alexandre Rafalovitch
Hmm,

Try the variable reference without the scope: ${id}. I can't remember if
the scope is required only for higher-level items. It might also be
worth writing a very basic 'all fields' logger to see what your
in-progress map looks like.
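
One way to get such a logger (an untested sketch; the function name
dumpRow is made up, and I'm assuming the stock DIH ScriptTransformer is
available) is to dump the whole row map from a script:

```xml
<dataConfig>
  <!-- The script element sits directly under dataConfig -->
  <script><![CDATA[
    function dumpRow(row) {
      // Print every key/value currently in the in-progress map for this row
      java.lang.System.out.println("DIH row: " + row);
      return row;
    }
  ]]></script>
  <document>
    <entity name="document_solr" transformer="script:dumpRow"
            query="select DOC_IDN as id, BILL_IDN as bill_id from document_solr">
    </entity>
  </document>
</dataConfig>
```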

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jan 31, 2014 at 7:10 AM, tom praveen...@yahoo.com wrote:
 Thanks Alexandre for quick response,

 I tried both ways but still no luck: null values. Is there anything I am
 doing fundamentally wrong?

 query="select DOC_IDN, BILL_IDN from document_fact"
    <field column="DOC_IDN" name="id" />

 and

 query="select DOC_IDN as id, BILL_IDN as bill_id from document_fact"
    <field column="id" />




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/TemplateTransformer-returns-null-values-tp4114539p4114544.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Regarding Solr Faceting on the query response.

2014-01-30 Thread Alexandre Rafalovitch
Hi Nilesh,

I am not sure the faceting code does what you think it does. However,
there are different options and you can experiment with whichever one
is best for you. They are controlled by the facet.method parameter:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
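
For instance, a request along these lines switches the method (a sketch
only; the core name and field are placeholders):

```text
/solr/collection1/select?q=company:Apple&facet=true&facet.field=company&facet.method=enum
```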

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jan 31, 2014 at 12:51 AM, Felipe Dantas de Souza Paiva
cad_fpa...@uolinc.com wrote:
 Hi Nilesh,

 maybe Faceting is not the right thing for you, because 'faceting is the
 arrangement of search results into categories based on indexed terms'
 (https://cwiki.apache.org/confluence/display/solr/Faceting).

 Perhaps you could use Result Clustering
 (https://cwiki.apache.org/confluence/display/solr/Result+Clustering), since
 the clustering algorithm is applied to the search results of each single query.

 Hope this helps.

 Felipe Dantas de Souza Paiva
 
 De: Kuchekar [kuchekar.nil...@gmail.com]
 Enviado: quinta-feira, 30 de janeiro de 2014 15:35
 Para: solr-user@lucene.apache.org
 Assunto: Re: Regarding Solr Faceting on the query response.

 Hi Mikhail,

  I would like faceting to run only on the returned result set, i.e. only
 on the numFound documents, rather than on the whole index.

 In the example, even when I specify the query 'company:Apple' .. it gives
 me faceted results for other companies. This means that it is querying
 against the whole index, rather than just the result set.

 Using facet.mincount=1 will give me facet values with counts of at least 1,
 but that will again query the whole index to retrieve all the distinct
 values (Apple, Bose, Chevron, ... Oracle ...) of the facet field (company).

 What I would like to do is ... facet only on the resultset.

 i.e. my query (q=company:Apple AND technologies:java) should return only
 the facet details for 'Apple', since that is the only company present in the
 result set. But it gives me the list of other company names as well, which
 makes me believe that it is querying the whole index to get the distinct
 values for the company field.

 "docs": [
   { "id": "ABC123",  "company": [ "APPLE" ] },
   { "id": "ABC1234", "company": [ "APPLE" ] },
   { "id": "ABC1235", "company": [ "APPLE" ] },
   { "id": "ABC1236", "company": [ "APPLE" ] }
 ] },
 "facet_counts": {
   "facet_queries": { "p_company:ucsf\n": 1 },
   "facet_fields": { "company": [ "APPLE", 4 ] },
   "facet_dates": {},
   "facet_ranges": {}
 }


  Thanks.
 Kuchekar, Nilesh


 On Thu, Jan 30, 2014 at 2:13 AM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

 Hello
 Do you mean setting
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount to 1 or
 you want to facet only returned page (rows) instead of full resultset
 (numFound) ?


 On Thu, Jan 30, 2014 at 6:24 AM, Nilesh Kuchekar
 kuchekar.nil...@gmail.comwrote:

  Yeah it's a typo... I meant company:Apple
 
  Thanks
  Nilesh
 
   On Jan 29, 2014, at 8:59 PM, Alexandre Rafalovitch arafa...@gmail.com
 
  wrote:
  
   On Thu, Jan 30, 2014 at 3:43 AM, Kuchekar kuchekar.nil...@gmail.com
  wrote:
   company=Apple
   Did you mean company:Apple ?
  
   Otherwise, that could be the issue.
  
   Regards,
 Alex.
  
  
   Personal website: http://www.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all
   at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
   book)
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com


 


 NOTICE: The information contained in this e-mail and any attachments thereto 
 is CONFIDENTIAL and is intended only for use by the recipient named herein 
 and may contain legally privileged and/or secret information.
 If you are not the e-mail's intended recipient, you are hereby notified that 
 any dissemination, distribution or copy of this e-mail, and/or any 
 attachments thereto, is strictly prohibited.