Copy in multivalued field and faceting

2011-12-14 Thread darul
Hello,

Field for this scenario is Title and contains several words.

For a specific query, I would like to get the top ten words by frequency in a
specific field.

My idea was the following:

- Title in my schema is stored/indexed in a specific field
- A copyField copies the Title field content into a multivalued field. If my
multivalued field uses a tokenizer which splits words, does it put each
word into a separate multivalued item?
- If so, by faceting on this multivalued field I will get the top ten words,
correct?

Example:

1) Title: "this is my title"
2) copyField Title into a specific multivalued field F1
3) F1 contains: {this, is, my, title}

Sorry for my English.

Thanks,

Jul

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Copy-in-multivalued-field-and-faceting-tp3584819p3584819.html
Sent from the Solr - User mailing list archive at Nabble.com.
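
A sketch of the idea above in schema.xml terms (the type names are
assumptions, untested):

    <field name="Title" type="string" indexed="true" stored="true"/>
    <field name="F1" type="text_ws" indexed="true" stored="false" multiValued="true"/>
    <copyField source="Title" dest="F1"/>

Note that copyField copies the raw Title value once; it is the tokenizer on
F1's field type that splits it into terms, and faceting counts indexed terms.
So a request like

    q=...&facet=true&facet.field=F1&facet.limit=10&facet.sort=count

would return the ten most frequent words in F1 for the documents matching the
query.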


Sorting and searching on a field

2011-12-14 Thread Swapna Vuppala
Hi,

I have a field in Solr that I want to be sortable. But at the same time, I want
to be able to search on that field without using wildcards. Is that possible?

For example, if I have a field Subject with the value "This is my first
subject", searching in Solr for subject:first should give me this result. And
the field Subject should be sortable.
I have read about the option of copying this to a different field, using one
for searching (tokenized) and one for sorting. But I am looking to be able
to do both things on the same field.

Can someone please point to a way to achieve this ?

Thanks and Regards,
Swapna.

Electronic mail messages entering and leaving Arup business
systems are scanned for acceptability of content and viruses.
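
For reference, the copy-field pattern mentioned in the question usually looks
like this (a sketch; field and type names are assumptions): one tokenized
field for searching, one untokenized copy for sorting, since reliable sorting
needs exactly one indexed token per document:

    <field name="subject" type="text_general" indexed="true" stored="true"/>
    <field name="subject_sort" type="string" indexed="true" stored="false"/>
    <copyField source="subject" dest="subject_sort"/>

Queries would then use q=subject:first&sort=subject_sort asc. Doing both on
literally the same tokenized field is not reliable, because sorting on a
field with more than one token per document is undefined.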


Possible to adjust FieldNorm?

2011-12-14 Thread cnyee
Hi, 

Is it possible to adjust FieldNorm? I have a scenario where the search is
not producing the desired result because of fieldNorm:

Search terms: coaching leadership
Record 1: name="Ask the Coach", desc=...
Record 2: name="Coaching as a Leadership Development Tool Part 1", desc=...

Record 1 was scored higher than record 2, despite record 2 having two matches.
The scoring is given below:

Record 1:
  1.2878088 = (MATCH) weight(name_en:coach in 6430), product of:
0.20103075 = queryWeight(name_en:coach), product of:
  6.406029 = idf(docFreq=160, maxDocs=35862)
  0.03138149 = queryNorm
6.406029 = (MATCH) fieldWeight(name_en:coach in 6430), product of:
  1.0 = tf(termFreq(name_en:coach)=1)
  6.406029 = idf(docFreq=160, maxDocs=35862)
  1.0 = fieldNorm(field=name_en, doc=6430)

Record 2:
  0.56341636 = (MATCH) weight(name_en:coach in 4744), product of:
0.20103075 = queryWeight(name_en:coach), product of:
  6.406029 = idf(docFreq=160, maxDocs=35862)
  0.03138149 = queryNorm
2.8026378 = (MATCH) fieldWeight(name_en:coach in 4744), product of:
  1.0 = tf(termFreq(crs_name_en:coach)=1)
  6.406029 = idf(docFreq=160, maxDocs=35862)
  0.4375 = fieldNorm(field=name_en, doc=4744)

Many thanks in advance.

Chut



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Possible-to-adjust-FieldNorm-tp3584998p3584998.html
Sent from the Solr - User mailing list archive at Nabble.com.
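
For reference: the 0.4375 fieldNorm on record 2 is length normalization
(longer fields get smaller norms), which is what outweighs the extra match
here. If field length should not matter for this field, one option is to
disable norms and reindex; a sketch, where the type name is an assumption:

    <field name="name_en" type="text_en" indexed="true" stored="true" omitNorms="true"/>

Note that omitting norms also disables index-time boosts on that field, since
boosts are stored in the norm.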


Re: Copy in multivalued field and faceting

2011-12-14 Thread yunfei wu
Sounds like it would work by carefully choosing the tokenizer, and then using
the facet.sort and facet.limit parameters to do the faceting.

Will see any expert's comments on this one.

Yunfei


On Wed, Dec 14, 2011 at 12:26 AM, darul daru...@gmail.com wrote:

 Hello,

 Field for this scenario is Title and contains several words.

 For a specific query, I would like get the top ten words by frequency in a
 specific field.

 My idea was the following:

 - Title in my schema is stored/indexed in a specific field
 - A copyField copy Title field content into a multivalued field. If my
 multivalue field use a specific tokenizer which split words, does it fill
 each word in each multivalued items ?
 - If so, using faceting on this multivalue field, I will get top ten words,
 correct ?

 Example:

 1) Title : this is my title
 2) CopyField Title to specific multivalue field F1
 3) F1 contains : {this, is, my, title}

 My english

 Thanks,

 Jul

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Copy-in-multivalued-field-and-faceting-tp3584819p3584819.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Large RDBMS dataset

2011-12-14 Thread Finotti Simone
Hello,
I have a very large dataset (> 1M records) on the RDBMS which I want my Solr
application to pull data from.

Problem is that the document fields which I have to index aren't in the same 
table, but I have to join records with two other tables. Well, in fact they are 
views, but I don't think that this makes any difference.

That's the data import handler that I've actually written:

<?xml version="1.0"?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="net.sourceforge.jtds.jdbc.Driver"
      url="jdbc:jtds:sqlserver://YSQLDEV01BLQ/YooxProcessCluster1"
      instance="SVCSQLDEV" />
  <document name="Products">
    <entity name="fd" query="SELECT * FROM clust_w_fast_dump ORDER BY endeca_id;">
      <entity name="fd2" query="SELECT macrocolor_id, color_descr, gsize_descr,
          size_descr FROM clust_w_fast_dump2_ByMarkets WHERE
          endeca_id='${fd.Endeca_ID}' ORDER BY endeca_id;"/>
      <entity name="cpd" query="SELECT DepartmentCode, Ranking,
          DepartmentPriceRangeCode FROM clust_w_CatalogProductsDepartments_ByMarket
          WHERE endeca_id='${fd.Endeca_ID}' ORDER BY endeca_id;"/>
      <entity name="env" query="SELECT Environment FROM clust_w_Environment
          WHERE endeca_id='${fd.Endeca_ID}' ORDER BY endeca_id;"/>
    </entity>
  </document>
</dataConfig>

It works, but it takes 1'38" to parse 100 records: that is about 1 rec/s! It means
that digesting the whole dataset would take 1 Ms (≈ 12 days).

The problem is that for each record in "fd", Solr makes three distinct SELECTs
on the other three tables. Of course, this is absolutely inefficient.

Is there a way to have Solr loading every record in the four tables and join 
them when they are already loaded in memory?

TIA
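
One option worth noting: DIH's CachedSqlEntityProcessor loads a child entity's
table once and joins it in memory, instead of issuing one SELECT per parent
row. A hedged sketch of one child entity rewritten that way (Solr 3.x "where"
syntax; the endeca_id column added to the SELECT is an assumption, needed as
the cache key):

    <entity name="fd2" processor="CachedSqlEntityProcessor"
            query="SELECT endeca_id, macrocolor_id, color_descr, gsize_descr, size_descr
                   FROM clust_w_fast_dump2_ByMarkets"
            where="endeca_id=fd.Endeca_ID"/>

The whole child table is held in memory, so this trades RAM for the N+1
queries.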


Solr Search Across Multiple Cores not working when quering on specific field

2011-12-14 Thread ravicv
I have two Solr cores. 
Core0 and core1

Both cores have the same schema and configuration.
After indexing, data is retrieved from each core individually:

http://localhost:8983/solr/core0/select?q=fieldName:%22United%22
http://localhost:8983/solr/core1/select?q=fieldName:%22United%22

*Searching on both cores*

This url is working:
http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=iPo*

but when I search on a specific field, it is not working:
http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=mnemonic_value:"United"

Why is distributed search not working when I search on a particular field?

Please help

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-Across-Multiple-Cores-not-working-when-quering-on-specific-field-tp3585013p3585013.html
Sent from the Solr - User mailing list archive at Nabble.com.


Getting Error while running Query

2011-12-14 Thread Sanket Shah
Hi All,

   I am sorry if I have sent this email to the wrong list. If so, kindly
let me know!

   I am using Alfresco 4.0, which uses SOLR for Lucene. I am able to
see the SOLR page and also able to fire queries, but they do not return
any results and sometimes they give errors. I am using the SOLR UI
(https://localhost:8443/solr/alfresco/admin/ ).

 

 For example: 

When I search for "@cm\:name:sanket" it shows me an XML result,
which is as under:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.fl"/>
      <str name="wt">standard</str>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="fl">*,score</str>
      <str name="debugQuery">on</str>
      <str name="start">0</str>
      <str name="q">@cm\:cm:sanket</str>
      <str name="qt">standard</str>
      <str name="fq"/>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0" maxScore="0.0"/>
  <lst name="highlighting"/>
  <lst name="debug">
    <str name="rawquerystring">@cm\:cm:sanket</str>
    <str name="querystring">@cm\:cm:sanket</str>
    <str name="parsedquery">@cm:cm:sanket</str>
    <str name="parsedquery_toString">@cm:cm:sanket</str>
    <lst name="explain"/>
    <str name="QParser">LuceneQParser</str>
    <arr name="filter_queries">
    ...

Also, sometimes I get an exception like:

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot
parse '@cm:name:sanket': Encountered ":" at line 1, column 8.
Was expecting one of: <EOF>, "AND" ..., "OR" ..., "NOT" ... bla..bla..bla..

 

Please help me out.

 

Thanking You!

Sanket Shah

 



Re: Getting Error while running Query

2011-12-14 Thread Gora Mohanty
On Wed, Dec 14, 2011 at 5:00 PM, Sanket Shah sanket.s...@cignex.com wrote:
 Hi All,

   I am sorry If I have sent this email at wrong list. If it is then
 kindly let me know!

   I am using Alfresco 4.0 which is having SOLR for Lucene. I am able to
 see the SOLR page and also able to fire queris But they do not return
 any results and sometimes giving errors. I am using SOLR UI
 (https://localhost:8443/solr/alfresco/admin/ ).
[...]
 result name=response numFound=0 start=0 maxScore=0.0/
[...]

The numFound=0 indicates that no documents matched
the search string. Maybe the indexing was not done properly:
could you try searching for *:* from the Solr admin interface?
This should return all documents indexed into Solr.

 Also sometime I get exception like

 HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot
 parse '@cm:name:sanket': Encountered  : :  at line 1, column 8.
 Was expecting one of: EOF AND ... OR ... NOT bla..bla ..bla..

The colon, :, is a special character for Solr, and needs to be escaped, as
you did in your first example. Please see
http://wiki.apache.org/solr/SolrQuerySyntax#NOTE:_URL_Escaping_Special_Characters

Regards,
Gora
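
To illustrate the escaping concretely, using the value from the first
example: the colon is escaped with a backslash for the query parser, and the
whole query string is then URL-encoded for the HTTP request:

    query as typed:  @cm\:name:sanket
    URL-encoded:     q=%40cm%5C%3Aname%3Asanket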


RE: Getting Error while running Query

2011-12-14 Thread Sanket Shah
Thanks Gora for your reply.
   How can I come to know whether Alfresco or Share is running on SOLR? I mean,
when I log in, click some folders, or create or upload new files, how can I
know that it is being done by SOLR and not the old way from before Alfresco
4.0?

I have put the following things in my glob.prop file.
### Solr indexing ###
index.subsystem.name=solr
dir.keystore=${dir.root}/keystore
solr.port.ssl=8443
## newly added.
solr.host=localhost
solr.port=8080
# default keystores location
dir.keystore=classpath:alfresco/keystore

encryption.ssl.keystore.location=${dir.keystore}/ssl.keystore
encryption.ssl.keystore.provider=
encryption.ssl.keystore.type=JCEKS
encryption.ssl.keystore.keyMetaData.location=${dir.keystore}/ssl-keystore-passwords.properties

encryption.ssl.truststore.location=${dir.keystore}/ssl.truststore
encryption.ssl.truststore.provider=
encryption.ssl.truststore.type=JCEKS
encryption.ssl.truststore.keyMetaData.location=${dir.keystore}/ssl-truststore-passwords.properties

thanking you!





-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Wednesday, December 14, 2011 5:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Getting Error while running Query

On Wed, Dec 14, 2011 at 5:00 PM, Sanket Shah sanket.s...@cignex.com wrote:
 Hi All,

   I am sorry If I have sent this email at wrong list. If it is then
 kindly let me know!

   I am using Alfresco 4.0 which is having SOLR for Lucene. I am able to
 see the SOLR page and also able to fire queris But they do not return
 any results and sometimes giving errors. I am using SOLR UI
 (https://localhost:8443/solr/alfresco/admin/ ).
[...]
 result name=response numFound=0 start=0 maxScore=0.0/
[...]

The 'numfound=0' indicates that no documents matched
the search string. Maybe the indexing is not done properly:
Could you try searching for *:* from the Solr admin. interface?
This should return all documents indexed into Solr.

 Also sometime I get exception like

 HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot
 parse '@cm:name:sanket': Encountered  : :  at line 1, column 8.
 Was expecting one of: EOF AND ... OR ... NOT bla..bla ..bla..

The colon, :, is a special character for Solr, and needs to be escaped, as
you did in your first example. Please see
http://wiki.apache.org/solr/SolrQuerySyntax#NOTE:_URL_Escaping_Special_Characters

Regards,
Gora


Use Solr to process/analyze docs without indexing

2011-12-14 Thread tesnick
Hello,

I would like to use Solr to analyze/process documents using stemming analyzers,
stopword filters, etc. and then return the results instead of indexing them.
Is there already some out-of-the-box API service to do this? Would it be
easy to implement?

I'm thinking of using a RequestHandler to receive the documents, process
them with the analyzers specified in schema.xml, and return the results
without going through the index...

Is this possible? Has someone done it?

Thanks!!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-Solr-to-process-analyze-docs-without-indexing-tp3585263p3585263.html
Sent from the Solr - User mailing list archive at Nabble.com.


Shutdown hook issue

2011-12-14 Thread Adolfo Castro Menna
Hi All,

I'm experiencing some issues with solr. From time to time solr goes down.
After checking the logs, I see that it's due to the shutdown hook being
triggered.
I still don't know why it happens but it seems to be related to solr being
idle. Does anyone have any insights?

I'm using Ubuntu 10.04.2 LTS and solr 3.1.0 running on Jetty (default
configuration). Solr runs in background, so it doesn't seem to be related
to a SIGINT unless ubuntu is sending it for some odd reason.

Thanks,
Adolfo.


Re: Use Solr to process/analyze docs without indexing

2011-12-14 Thread Ahmet Arslan

 I would use Solr to analyze / process documents using
 stemming analyzers,
 stopwordsfilters, etc. and then return the results instead
 of indexing.
 There is already some api service out-of-box to do this? It
 would be easy to
 implement? 
 
 I'm thinking of using a RequestHandler to receive the
 documents, process
 them with analyzers specified in the schema.xml and return
 the results
 without going through the index... 
 
 is this possible? someone has done?

Maybe this?  http://wiki.apache.org/solr/AnalysisRequestHandler
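
For reference, the Solr example solrconfig.xml also ships a field-level
variant of this handler; a sketch of a call to it (the field type name is an
assumption):

    <requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" />

    http://localhost:8983/solr/analysis/field?analysis.fieldtype=text&analysis.fieldvalue=The+Quick+Brown+Fox

The response shows the token stream after each stage of the analyzer chain,
without touching the index.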


Faceting with null dates

2011-12-14 Thread kenneth hansen

hello,

I have the following faceting parameters, which give me some unwanted
non-null dates in the result set. Is there a way to query the index so that
it does not give me the non-null dates in the results? I.e. I would like a
result set which contains only null validToDates, but as I am faceting on
the non-null values of validToDate, I would still like to get the non-null
values in the faceting result.

The response example below gives me 10 results, with 7 non-null
validToDates. What I would like to get is 3 results and 7 non-null
validToDate facets. And as I write this, I start to wonder if this is
possible at all, as the facets are dependent on the result set, and whether
this might be better handled in the application layer by just extracting
10-7=3...

Any help would be appreciated!

br,
ken
<str name="facet">true</str>
<str name="f.validToDate.facet.range.start">NOW/DAYS-4MONTHS</str>
<str name="facet.mincount">1</str>
<str name="q">(*:*)</str>
<arr name="facet.range"><str>validToDate</str></arr>
<str name="facet.range.end">NOW/DAY+1DAY</str>
<str name="facet.range.gap">+1MONTH</str>

<result name="response" numFound="10" start="0">
<lst name="facet_counts">
  <lst name="facet_ranges">
    <lst name="validToDate">
      <lst name="counts">
        <int name="2011-11-14T00:00:00Z">7</int>
        ...
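
One thing worth trying (hedged: whether range facets honor tag exclusion
depends on the Solr version): tag the filter that restricts the results to
null validToDates, and exclude that tag in the facet, e.g.

    q=*:*
    &fq={!tag=vtd}-validToDate:[* TO *]
    &facet=true
    &facet.range={!ex=vtd}validToDate
    (plus the same facet.range.start/end/gap as above)

That would return only the documents with no validToDate as results, while
the range facet still counts the non-null ones.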

Re: Too many connections in CLOSE_WAIT state on master solr server

2011-12-14 Thread Erick Erickson
I'm guessing (and it's just a guess) that what's happening is that
the container is queueing up your requests while waiting
for the other connections to close, so Mikhail's suggestion
seems like a good idea.

Best
Erick

On Wed, Dec 14, 2011 at 12:28 AM, samarth s
samarth.s.seksa...@gmail.com wrote:
 The updates to the master are user driven, and are needed to be
 visible quickly. Hence, the high frequency of replication. It may be
 that too many replication requests are being handled at a time, but
 why should that result in half closed connections?

 On Wed, Dec 14, 2011 at 2:47 AM, Erick Erickson erickerick...@gmail.com 
 wrote:
 Replicating 40 cores every 20 seconds is just *asking* for trouble.
 How often do your cores change on the master? How big are
 they? Is there any chance you just have too many cores replicating
 at once?

 Best
 Erick

 On Tue, Dec 13, 2011 at 3:52 PM, Mikhail Khludnev
 mkhlud...@griddynamics.com wrote:
 You can try to reuse your connections (prevent them from closing) by
 specifying -Dhttp.maxConnections=N (see
 http://download.oracle.com/javase/1.4.2/docs/guide/net/properties.html)
 in jvm startup params. At the client JVM! The number should be chosen
 considering the number of connections you'd like to keep alive.

 Let me know if it works for you.

 On Tue, Dec 13, 2011 at 2:57 PM, samarth s 
 samarth.s.seksa...@gmail.comwrote:

 Hi,

 I am using solr replication and am experiencing a lot of connections
 in the state CLOSE_WAIT at the master solr server. These disappear
 after a while, but till then the master solr stops responding.

 There are about 130 open connections on the master server with the
 client as the slave m/c and all are in the state CLOSE_WAIT. Also, the
 client port specified on the master solr server netstat results is not
 visible in the netstat results on the client (slave solr) m/c.

 Following is my environment:
 - 40 cores in the master solr on m/c 1
 - 40 cores in the slave solr on m/c 2
 - The replication poll interval is 20 seconds.
 - Replication part in solrconfig.xml in the slave solr:
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="slave">
     <!-- fully qualified url for the replication handler of master -->
     <str name="masterUrl">$mastercorename/replication</str>
     <!-- Interval in which the slave should poll master. Format is HH:mm:ss.
          If this is absent slave does not poll automatically.
          But a fetchindex can be triggered from the admin or the http API -->
     <str name="pollInterval">00:00:20</str>
     <!-- The following values are used when the slave connects to the master
          to download the index files. Default values implicitly set as 5000ms
          and 10000ms respectively. The user DOES NOT need to specify these
          unless the bandwidth is extremely low or there is extremely high
          latency -->
     <str name="httpConnTimeout">5000</str>
     <str name="httpReadTimeout">10000</str>
   </lst>
 </requestHandler>

 Thanks for any pointers.

 --
 Regards,
 Samarth




 --
 Sincerely yours
 Mikhail Khludnev
 Developer
 Grid Dynamics
 tel. 1-415-738-8644
 Skype: mkhludnev
 http://www.griddynamics.com
  mkhlud...@griddynamics.com



 --
 Regards,
 Samarth


Re: Copy in multivalued field and faceting

2011-12-14 Thread Erick Erickson
I don't quite understand what you're trying to do. MultiValued is
a bit misleading. All it means is that you can add the same
field multiple times to a document, i.e. (XML example)

<doc>
  <add name="field">value1 value2 value3</add>
  <add name="field">value4 value5 value6</add>
</doc>

will succeed if "field" is multiValued and fail if not.

This will work if "field" is NOT multiValued:

<doc>
  <add name="field">value1 value2 value3 value4 value5 value6</add>
</doc>

and, assuming WhitespaceTokenizer, the "field" field will contain
the exact same tokens. The only difference *might* be the
offsets, but don't worry about that quite yet; all it would really
affect is phrase queries.

With that as a preface, I don't see why copyField has anything
to do with your problem, you'd get the same results faceting
on the title field, assuming identical analyzer chains.

Faceting on a text field is iffy, it can be quite expensive. What you'd
get in the end, though, is a list of the top words in your corpus for
that field counted from the documents that satisfied the query. Which
sounds like what you're after.

Best
Erick

On Wed, Dec 14, 2011 at 4:59 AM, yunfei wu yunfei...@gmail.com wrote:
 Sounds like working by carefully choosing tokenizer, and then use
 facet.sort and facet.limit parameters to do faceting.

 Will see any expert's comments on this one.

 Yunfei


 On Wed, Dec 14, 2011 at 12:26 AM, darul daru...@gmail.com wrote:

 Hello,

 Field for this scenario is Title and contains several words.

 For a specific query, I would like get the top ten words by frequency in a
 specific field.

 My idea was the following:

 - Title in my schema is stored/indexed in a specific field
 - A copyField copy Title field content into a multivalued field. If my
 multivalue field use a specific tokenizer which split words, does it fill
 each word in each multivalued items ?
 - If so, using faceting on this multivalue field, I will get top ten words,
 correct ?

 Example:

 1) Title : this is my title
 2) CopyField Title to specific multivalue field F1
 3) F1 contains : {this, is, my, title}

 My english

 Thanks,

 Jul

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Copy-in-multivalued-field-and-faceting-tp3584819p3584819.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: edismax phrase matching with a non-word char inbetween

2011-12-14 Thread Erick Erickson
What I think is happening here is that WordDelimiterFilterFactory is
throwing away your non-alpha-numeric characters. You can see
this in admin/analysis, which I've found *extremely* helpful when
faced with this kind of question.

Best
Erick

On Tue, Dec 13, 2011 at 10:37 AM, Robert Brown r...@intelcompute.com wrote:
 I have a field which is indexed and queried as follows:

 <tokenizer class="solr.WhitespaceTokenizerFactory"/>

 <filter class="solr.SynonymFilterFactory" synonyms="text-synonyms.txt"
     ignoreCase="true" expand="true"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true"
     words="stopwords.txt" enablePositionIncrements="true" />
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
     generateNumberParts="1" catenateWords="0" catenateNumbers="0"
     catenateAll="0" splitOnCaseChange="1"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.SnowballPorterFilterFactory" language="English"
     protected="protwords.txt"/>



 When searching for "street work" (with quotes), I'm getting matches and
 highlighting on things like...

 ...Oxford <em>Street</em> (<em>Work</em> Experience)...

 Why is this happening, and what can I do to stop it?

 I've set <int name="qs">0</int> in my config to try to avert this sort of
 behaviour. Am I correct in thinking that this is used to ensure there are no
 words in between the phrase words?



Re: Use Solr to process/analyze docs without indexing

2011-12-14 Thread tesnick
Thanks iorixxx!

I think that's exactly what I was looking for.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Use-Solr-to-process-analyze-docs-without-indexing-tp3585263p3585522.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting Error while running Query

2011-12-14 Thread Gora Mohanty
On Wed, Dec 14, 2011 at 5:34 PM, Sanket Shah sanket.s...@cignex.com wrote:
 Thanks Gora for your reply.
   How can I come to know that alfresco or share is running n SOLR? I meant, 
 when I login, clicking some folders or creating or uploading new files. How 
 can I know that it is being done by SOLR and not by the old way before 
 alfresco 4.0.
[...]

I am sorry, but while we do use Alfresco, we have not yet
had occasion to look at integrating Solr with Alfresco. So,
I would be unable to help you here.

I presume that you have tried looking at the resources
thrown up by searching Google, e.g.,
http://wiki.alfresco.com/wiki/Alfresco_And_SOLR

Regards,
Gora


Re: Large RDBMS dataset

2011-12-14 Thread Martin Koch
Instead of handling it from within solr, I'd suggest writing an external
application (e.g. in python using pysolr) that wraps the (fast) SQL query
you like. Then retrieve a batch of documents, and write them to solr. For
extra speed, don't commit until you're done.

/Martin

On Wed, Dec 14, 2011 at 11:18 AM, Finotti Simone tech...@yoox.com wrote:

 Hello,
 I have a very large dataset (> 1M records) on the RDBMS which I want my
 Solr application to pull data from.

 Problem is that the document fields which I have to index aren't in the
 same table, but I have to join records with two other tables. Well, in fact
 they are views, but I don't think that this makes any difference.

 That's the data import handler that I've actually written:

 <?xml version="1.0"?>
 <dataConfig>
   <dataSource type="JdbcDataSource" driver="net.sourceforge.jtds.jdbc.Driver"
       url="jdbc:jtds:sqlserver://YSQLDEV01BLQ/YooxProcessCluster1"
       instance="SVCSQLDEV" />
   <document name="Products">
     <entity name="fd" query="SELECT * FROM clust_w_fast_dump ORDER BY endeca_id;">
       <entity name="fd2" query="SELECT macrocolor_id, color_descr, gsize_descr,
           size_descr FROM clust_w_fast_dump2_ByMarkets WHERE
           endeca_id='${fd.Endeca_ID}' ORDER BY endeca_id;"/>
       <entity name="cpd" query="SELECT DepartmentCode, Ranking,
           DepartmentPriceRangeCode FROM clust_w_CatalogProductsDepartments_ByMarket
           WHERE endeca_id='${fd.Endeca_ID}' ORDER BY endeca_id;"/>
       <entity name="env" query="SELECT Environment FROM clust_w_Environment
           WHERE endeca_id='${fd.Endeca_ID}' ORDER BY endeca_id;"/>
     </entity>
   </document>
 </dataConfig>

 It works, but it takes 1'38" to parse 100 records: that is about 1 rec/s! That
 means that digesting the whole dataset would take 1 Ms (≈ 12 days).

 The problem is that for each record in fd, Solr makes three distinct
 SELECT on the other three tables. Of course, this is absolutely inefficient.

 Is there a way to have Solr loading every record in the four tables and
 join them when they are already loaded in memory?

 TIA



Re: Solr using very high I/O

2011-12-14 Thread Martin Koch
Do you commit often? If so, try committing less often :)

/Martin

On Wed, Dec 7, 2011 at 12:16 PM, Adrian Fita adrian.f...@gmail.com wrote:

 Hi. I experience an issue where Solr is using huge amounts of I/O.
 Basically it uses the whole HDD continuously, leaving nothing for the
 other processes. Solr is called by a script which continuously indexes
 some files.

 The index is around 800MB and I can't understand why it could thrash
 the HDD so much.

 I could use some help on how to optimize Solr so it doesn't use so much
 I/O.

 Thank you.
 --
 Fita Adrian
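
If frequent commits turn out to be the cause, batching them with autoCommit
in solrconfig.xml is one knob (a sketch; the thresholds are arbitrary
examples to tune):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs>
        <maxTime>60000</maxTime> <!-- milliseconds -->
      </autoCommit>
    </updateHandler>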



Using LocalParams in StatsComponent to create a price slider?

2011-12-14 Thread Mark Schoy
Hi,

I'm using the StatsComponent to retrieve the lower and upper bounds of a
price field to create a price slider.
If someone sets the price range to $100-$200, I have to add a filter to
the query. But then the lower and upper bounds are calculated from the
filtered result.

Is it possible to use LocalParams (like for facets) to ignore a specific filter?

Thanks.

Mark
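
A version-independent workaround, for what it's worth, is to split this into
two requests: one without the price filter just for the slider bounds, and
one with it for the actual results (the field name is an assumption):

    # global bounds for the slider; no rows needed:
    /select?q=*:*&rows=0&stats=true&stats.field=price
    # the filtered result page:
    /select?q=*:*&fq=price:[100 TO 200]

Whether stats.field honors a facet-style {!ex=...} exclusion depends on the
Solr version, so the two-request form is the safe fallback.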


Re: Large RDBMS dataset

2011-12-14 Thread Gora Mohanty
On Wed, Dec 14, 2011 at 3:48 PM, Finotti Simone tech...@yoox.com wrote:
 Hello,
 I have a very large dataset (> 1M records) on the RDBMS which I want my Solr
 application to pull data from.
[...]

 It works, but it takes 1'38" to parse 100 records: that is about 1 rec/s! That
 means that digesting the whole dataset would take 1 Ms (≈ 12 days).

Depending on the size of the data that you are pulling from
the database, 1M records is not really that large a number.
We were doing ~75GB of stored data from ~7million records
in about 9h, including quite complicated transfomers. I would
imagine that there is much room for improvement in your case
also. Some notes on this:
* If you have servers to throw at the problem, and a sensible
  way to shard your RDBMS data, use parallel indexing to
  multiple Solr cores, maybe on multiple servers, followed by
  a merge. In our experience, given enough RAM and adequate
  provisioning of database servers, indexing speed scales linearly
  with the total no. of cores.
* Replicate your database, manually if needed. Look at the load
  on a database server during the indexing process, and provision
  enough database servers to match the no. of Solr indexing servers.
* This point is leading into flamewar territory, but consider switching
   databases. From our (admittedly non-rigorous measurements),
   mysql was at least a factor of 2-3 faster than MS-SQL, with the
   same dataset.
* Look at cloud-computing. If finances permit, one should be able
  to shrink indexing times to almost any desired level. E.g., for the
  dataset that we used, I have little doubt that we could have shrunk
  the time down to less than 1h, at an affordable cost on Amazon EC2.
  Unfortunately, we have not yet had the opportunity to try this.

 The problem is that for each record in fd, Solr makes three distinct SELECT 
 on the other three tables. Of course, this is absolutely inefficient.

 Is there a way to have Solr loading every record in the four tables and join 
 them when they are already loaded in memory?

For various reasons, we did not investigate this in depth,
but you could also look at Solr's CachedSqlEntityProcessor.

Regards,
Gora


Re: CRUD on solr Index while replicating between master/slave

2011-12-14 Thread Tarun Jain
Hi,
We have an index which needs constant updates in the master.

One more question..
The scenario is
1) Master starts replicating to slave (takes approx 15 mins)

2) We do some changes to index on master while it is replicating

So the question is: what happens to the changes in the master index while it
is replicating?
Will the slave get them or not?


Tarun Jain
-=-




- Original Message -
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org; Tarun Jain tjai...@yahoo.com
Cc: 
Sent: Tuesday, December 13, 2011 4:18 PM
Subject: Re: CRUD on solr Index while replicating between master/slave

No, you can search on the master when replicating, no
problem.

But why do you want to? The whole point of master/slave
setups is to separate indexing from searching machines.

Best
Erick

On Tue, Dec 13, 2011 at 4:10 PM, Tarun Jain tjai...@yahoo.com wrote:
 Hi,
 Thanks.
 So just to clarify here again while replicating we cannot search on master 
 index ?

 Tarun Jain
 -=-



 - Original Message -
 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Cc:
 Sent: Tuesday, December 13, 2011 3:03 PM
 Subject: Re: CRUD on solr Index while replicating between master/slave

 Hi,

 Master: Update/insert/delete docs    --    Yes
 Slaves: Search                              --   Yes

 Otis
 

 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 From: Tarun Jain tjai...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Tuesday, December 13, 2011 11:15 AM
Subject: CRUD on solr Index while replicating between master/slave

Hi,
When replication is happening between master to slave what operations can we 
do on the master  what operations are possible on the slave?
I know it is not adivisable to do DML on the slave index but I wanted to know 
this anyway. Also I understand that doing DML on a slave will make the slave 
index incompatible with the master.

Master

Search                              --   Yes/No
Update/insert/delete docs    --    Yes/No

Slave
=
Search                              --    Yes/No
Update/insert/delete docs    --    Yes/No

Please share any other caveats that you have discovered regarding the above 
scenario that might be helpful.

Thanks
-=-






Re: Shutdown hook issue

2011-12-14 Thread Otis Gospodnetic
Hi,

Solr won't shut down by itself just because it's idle. :)
You could run it with a debugger attached and a breakpoint set in the shutdown
hook you are talking about, and see what calls it.

Otis


Performance Monitoring SaaS for Solr 
- http://sematext.com/spm/solr-performance-monitoring/index.html





 From: Adolfo Castro Menna adolfo.castrome...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Wednesday, December 14, 2011 8:17 AM
Subject: Shutdown hook issue
 
Hi All,

I'm experiencing some issues with solr. From time to time solr goes down.
After checking the logs, I see that it's due to the shutdown hook being
triggered.
I still don't know why it happens but it seems to be related to solr being
idle. Does anyone have any insights?

I'm using Ubuntu 10.04.2 LTS and solr 3.1.0 running on Jetty (default
configuration). Solr runs in background, so it doesn't seem to be related
to a SIGINT unless ubuntu is sending it for some odd reason.

Thanks,
Adolfo.




Re: CRUD on solr Index while replicating between master/slave

2011-12-14 Thread Otis Gospodnetic
Hi,

The slave will get the changes the next time it polls the master and the master
tells it the index has changed.
Note that the master doesn't replicate to the slave; rather, the slave copies
changes from the master.

Otis 

Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html




 From: Tarun Jain tjai...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org 
Sent: Wednesday, December 14, 2011 10:43 AM
Subject: Re: CRUD on solr Index while replicating between master/slave
 
Hi,
We have an index which needs constant updates in the master.

One more question..
The scenario is
1) Master starts replicating to slave (takes approx 15 mins)

2) We do some changes to index on master while it is replicating

So question is what happens to the changes in master index while it is 
replicating.
Will the slave get it or not? 


Tarun Jain
-=-




- Original Message -
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org; Tarun Jain tjai...@yahoo.com
Cc: 
Sent: Tuesday, December 13, 2011 4:18 PM
Subject: Re: CRUD on solr Index while replicating between master/slave

No, you can search on the master when replicating, no
problem.

But why do you want to? The whole point of master/slave
setups is to separate indexing from searching machines.

Best
Erick

On Tue, Dec 13, 2011 at 4:10 PM, Tarun Jain tjai...@yahoo.com wrote:
 Hi,
 Thanks.
 So just to clarify here again while replicating we cannot search on master 
 index ?

 Tarun Jain
 -=-



 - Original Message -
 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Cc:
 Sent: Tuesday, December 13, 2011 3:03 PM
 Subject: Re: CRUD on solr Index while replicating between master/slave

 Hi,

 Master: Update/insert/delete docs    --    Yes
 Slaves: Search                              --   Yes

 Otis
 

 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 From: Tarun Jain tjai...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Tuesday, December 13, 2011 11:15 AM
Subject: CRUD on solr Index while replicating between master/slave

Hi,
When replication is happening between master to slave what operations can we 
do on the master  what operations are possible on the slave?
I know it is not adivisable to do DML on the slave index but I wanted to 
know this anyway. Also I understand that doing DML on a slave will make the 
slave index incompatible with the master.

Master

Search                              --   Yes/No
Update/insert/delete docs    --    Yes/No

Slave
=
Search                              --    Yes/No
Update/insert/delete docs    --    Yes/No

Please share any other caveats that you have discovered regarding the above 
scenario that might be helpful.

Thanks
-=-








NumericRangeQuery: what am I doing wrong?

2011-12-14 Thread Jay Luker
I can't get NumericRangeQuery or TermQuery to work on my integer id
field. I feel like I must be missing something obvious.

I have a test index that has only two documents, id:9076628 and
id:8003001. The id field is defined like so:

<field name="id" type="tint" indexed="true" stored="true" required="true" />

A MatchAllDocsQuery will return the 2 documents, but any queries I try
on the id field return no results. For instance,

public void testIdRange() throws IOException {
    Query q = NumericRangeQuery.newIntRange("id", 1, 1000, true, true);
    System.out.println("query: " + q);
    assertEquals(2, searcher.search(q, 5).totalHits);
}

public void testIdSearch() throws IOException {
    Query q = new TermQuery(new Term("id", "9076628"));
    System.out.println("query: " + q);
    assertEquals(1, searcher.search(q, 5).totalHits);
}

Both tests fail with totalHits being 0. This is using solr/lucene
trunk, but I tried also with 3.2 and got the same results.

What could I be doing wrong here?

Thanks,
--jay


Re: NumericRangeQuery: what am I doing wrong?

2011-12-14 Thread Dmitry Kan
Maybe you should index your values differently? Here is what Lucene's 2.9
javadoc says:

"To use this, you must first index the numeric values using NumericField
(http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/document/NumericField.html)
(expert: NumericTokenStream,
http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/analysis/NumericTokenStream.html).
If your terms are instead textual, you should use TermRangeQuery
(http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/TermRangeQuery.html).
NumericRangeFilter
(http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/NumericRangeFilter.html)
is the filter equivalent of this query."

Dmitry

On Wed, Dec 14, 2011 at 6:53 PM, Jay Luker lb...@reallywow.com wrote:

 I can't get NumericRangeQuery or TermQuery to work on my integer id
 field. I feel like I must be missing something obvious.

 I have a test index that has only two documents, id:9076628 and
 id:8003001. The id field is defined like so:

  <field name="id" type="tint" indexed="true" stored="true" required="true" />

  A MatchAllDocsQuery will return the 2 documents, but any queries I try
  on the id field return no results. For instance,

  public void testIdRange() throws IOException {
     Query q = NumericRangeQuery.newIntRange("id", 1, 1000, true, true);
     System.out.println("query: " + q);
     assertEquals(2, searcher.search(q, 5).totalHits);
  }

  public void testIdSearch() throws IOException {
     Query q = new TermQuery(new Term("id", "9076628"));
     System.out.println("query: " + q);
     assertEquals(1, searcher.search(q, 5).totalHits);
  }

 Both tests fail with totalHits being 0. This is using solr/lucene
 trunk, but I tried also with 3.2 and got the same results.

 What could I be doing wrong here?

 Thanks,
 --jay



Re: cache monitoring tools?

2011-12-14 Thread Dmitry Kan
Thanks, Justin. With zabbix I can gather jmx-exposed stats from SOLR; how
about munin: what protocol / mechanism does it use to accumulate stats? It
wasn't obvious from their online documentation...

On Mon, Dec 12, 2011 at 4:56 PM, Justin Caratzas
justin.carat...@gmail.comwrote:

 Dmitry,

 The only added stress that munin puts on each box is the 1 request per
 stat per 5 minutes to our admin stats handler.  Given that we get 25
 requests per second, this doesn't make much of a difference.  We don't
 have a sharded index (yet) as our index is only 2-3 GB, but we do have
 slave servers with replicated
 indexes that handle the queries, while our master handles
 updates/commits.

 Justin

 Dmitry Kan dmitry@gmail.com writes:

  Justin, in terms of the overhead, have you noticed if Munin puts much of
 it
  when used in production? In terms of the solr farm: how big is a shard's
  index (given you have sharded architecture).
 
  Dmitry
 
  On Sun, Dec 11, 2011 at 6:39 PM, Justin Caratzas
  justin.carat...@gmail.comwrote:
 
  At my work, we use Munin and Nagio for monitoring and alerts.  Munin is
  great because writing a plugin for it so simple, and with Solr's
  statistics handler, we can track almost any solr stat we want.  It also
  comes with included plugins for load, file system stats, processes,
  etc.
 
  http://munin-monitoring.org/
 
  Justin
 
  Paul Libbrecht p...@hoplahup.net writes:
 
   Allow me to chim in and ask a generic question about monitoring tools
   for people close to developers: are any of the tools mentioned in this
   thread actually able to show graphs of loads, e.g. cache counts or CPU
   load, in parallel to a console log or to an http request log??
  
   I am working on such a tool currently but I have a bad feeling of
  reinventing the wheel.
  
   thanks in advance
  
   Paul
  
  
  
   Le 8 déc. 2011 à 08:53, Dmitry Kan a écrit :
  
   Otis, Tomás: thanks for the great links!
  
   2011/12/7 Tomás Fernández Löbbe tomasflo...@gmail.com
  
   Hi Dimitry, I pointed to the wiki page to enable JMX, then you can
 use
  any
   tool that visualizes JMX stuff like Zabbix. See
  
  
 
 http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
  
   On Wed, Dec 7, 2011 at 11:49 AM, Dmitry Kan dmitry@gmail.com
  wrote:
  
   The culprit seems to be the merger (frontend) SOLR. Talking to one
  shard
   directly takes substantially less time (1-2 sec).
  
   On Wed, Dec 7, 2011 at 4:10 PM, Dmitry Kan dmitry@gmail.com
  wrote:
  
   Tomás: thanks. The page you gave didn't mention cache
 specifically,
  is
   there more documentation on this specifically? I have used
 solrmeter
   tool,
   it draws the cache diagrams, is there a similar tool, but which
 would
   use
   jmx directly and present the cache usage in runtime?
  
   pravesh:
   I have increased the size of filterCache, but the search hasn't
  become
   any
   faster, taking almost 9 sec on avg :(
  
   name: search
   class: org.apache.solr.handler.component.SearchHandler
   version: $Revision: 1052938 $
   description: Search using components:
  
  
  
 
 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.StatsComponent,org.apache.solr.handler.component.DebugComponent,
  
   stats: handlerStart : 1323255147351
   requests : 100
   errors : 3
   timeouts : 0
   totalTime : 885438
   avgTimePerRequest : 8854.38
   avgRequestsPerSecond : 0.008789442
  
   the stats (copying fieldValueCache as well here, to show term
   statistics):
  
   name: fieldValueCache
   class: org.apache.solr.search.FastLRUCache
   version: 1.0
   description: Concurrent LRU Cache(maxSize=1, initialSize=10,
   minSize=9000, acceptableSize=9500, cleanupThread=false)
   stats: lookups : 79
   hits : 77
   hitratio : 0.97
   inserts : 1
   evictions : 0
   size : 1
   warmupTime : 0
   cumulative_lookups : 79
   cumulative_hits : 77
   cumulative_hitratio : 0.97
   cumulative_inserts : 1
   cumulative_evictions : 0
   item_shingleContent_trigram :
  
  
  
 
 {field=shingleContent_trigram,memSize=326924381,tindexSize=4765394,time=215426,phase1=213868,nTerms=14827061,bigTerms=35,termInstances=114359167,uses=78}
   name: filterCache
   class: org.apache.solr.search.FastLRUCache
   version: 1.0
   description: Concurrent LRU Cache(maxSize=153600,
 initialSize=4096,
   minSize=138240, acceptableSize=145920, cleanupThread=false)
   stats: lookups : 1082854
   hits : 940370
   hitratio : 0.86
   inserts : 142486
   evictions : 0
   size : 142486
   warmupTime : 0
   cumulative_lookups : 1082854
   cumulative_hits : 940370
   cumulative_hitratio : 0.86
   cumulative_inserts : 142486
   cumulative_evictions : 0
  
  
   index size: 3,25 GB
  
   Does anyone have some pointers to where to look at and optimize
 for
   query
   time?
  

Optimal Setup

2011-12-14 Thread Dave Stuart
Background: 

We have around 100 web sites of various sizes (in terms of indexable content) 
and I'm trying to come up with the best architectural design from a performance 
perspective. 
- Each of the sites has a need for DEV, TEST and LIVE indices. 
- The content on the sites is divided into 5 groups (but it's likely that there
will be more groups in future) which are similar enough to use the same schema,
solrconfig, synonyms, stopwords etc.
- From a search view they are distinct per website (i.e. only that site's
content should appear).
- The indexing mechanism is the same for all sites (i.e. they all use the web 
api)
- While not unlimited we have a fair bit of flexibility on servers (although 
they are all virtual)


Questions:
- Is it better to
(a) have each site in its own core and webapp,
    e.g. /solr1/(DEV|TEST|LIVE) & /solr2/(DEV|TEST|LIVE) etc.
(b) have all the cores in one webapp,
    e.g. /solr1/(SITE1DEV|SITE1TEST|SITE1LIVE|SITE2DEV|SITE2TEST|SITE2LIVE)
    etc.
(c) have 3 cores per content group and a filter query param in all queries
    that only grabs that site's data,
    e.g. /solr1/(CONTENTGRP1DEV|CONTENTGRP1TEST)...
(d) same as (c) except sharding across multiple servers
(e) have all the DEVs, TESTs and LIVEs on separate boxes with either the (b)
    or (c) setup,
    e.g. (b) box1: /solr1/(SITE1LIVE|SITE2LIVE...) or (c)
    /solr1/(CONTENTGRP1LIVE|CONTENTGRP2LIVE...)

Thanks For the help


Regards,

Dave




Re: Optimal Setup

2011-12-14 Thread Walter Underwood
You need dev, test, and live on separate boxes so that you can do capacity 
tests. When you are sending queries to find out the max rate before overload, 
you need to do that on dev or test, not live.

Also, you'll need to test new versions of Solr, so you need separate Solr 
installations.

 wunder

On Dec 14, 2011, at 9:30 AM, Dave Stuart wrote:

 Background: 
 
 We have around 100 web sites of various sizes (in terms of indexable content) 
 and I'm trying to come up with the best architectural design from a 
 performance perspective. 
 - Each of the sites has a need for DEV, TEST and LIVE indices. 
 - The content on the sites divided into 5 groups (but its likely that there 
 will be more groups in future) which is similar enough to use the same 
 schema, solrconfig, synonyms, stopwords etc.
 - From a search view they are distinct per website (i.e. only that sites 
 content should appear) .
 - The indexing mechanism is the same for all sites (i.e. they all use the web 
 api)
 - While not unlimited we have a fair bit of flexibility on servers (although 
 they are all virtual)
 
 
 Questions:
 - Is it better to
 (a) have each site in it own core and webapp 
   e.g. /solr1/(DEV|TEST|LIVE)  /solr2/(DEV|TEST|LIVE) etc
 (b) have all the cores in one webapp 
   e.g. /solr1/(SITE1DEV|SITE1TEST|SITE1LIVE|SITE2DEV|SITE2TEST|SITE2LIVE) 
 etc
 (c) have 3 cores per content group and have a filter query param in all 
 queries that only grabs that sites data 
   e.g./solr1/(CONTENTGRP1DEV|CONTENTGRP1TEST).
 (d) same as (c) except sharding across multiple servers
 (e) have all the DEV's, TEST's and LIVE's on separate boxes with either the 
 (b) or (c) setup 
    eg. (b) box1: /solr1/(SITE1LIVE|SITE2LIVE...) or (c)
  /solr1/(CONTENTGRP1LIVE|CONTENTGRP2LIVE...)
 
 Thanks For the help
 
 
 Regards,
 
 Dave
 
 







Re: Large RDBMS dataset

2011-12-14 Thread Erick Erickson
You can also consider using SolrJ to do this. I posted a small example a couple
of days ago.

Best
Erick

On Wed, Dec 14, 2011 at 10:39 AM, Gora Mohanty g...@mimirtech.com wrote:
 On Wed, Dec 14, 2011 at 3:48 PM, Finotti Simone tech...@yoox.com wrote:
 Hello,
  I have a very large dataset (> 1M records) on the RDBMS which I want my Solr
 application to pull data from.
 [...]

  It works, but it takes 1'38" to parse 100 records: that is about 1 rec/s! That
  means that digesting the whole dataset would take 1 Ms (≈ 12 days).

 Depending on the size of the data that you are pulling from
 the database, 1M records is not really that large a number.
 We were doing ~75GB of stored data from ~7million records
 in about 9h, including quite complicated transfomers. I would
 imagine that there is much room for improvement in your case
 also. Some notes on this:
 * If you have servers to throw at the problem, and a sensible
  way to shard your RDBMS data, use parallel indexing to
  multiple Solr cores, maybe on multiple servers, followed by
  a merge. In our experience, given enough RAM and adequate
  provisioning of database servers, indexing speed scales linearly
  with the total no. of cores.
 * Replicate your database, manually if needed. Look at the load
  on a database server during the indexing process, and provision
  enough database servers to match the no. of Solr indexing servers.
 * This point is leading into flamewar territory, but consider switching
   databases. From our (admittedly non-rigorous measurements),
   mysql was at least a factor of 2-3 faster than MS-SQL, with the
   same dataset.
 * Look at cloud-computing. If finances permit, one should be able
  to shrink indexing times to almost any desired level. E.g., for the
  dataset that we used, I have little doubt that we could have shrunk
  the time down to less than 1h, at an affordable cost on Amazon EC2.
  Unfortunately, we have not yet had the opportunity to try this.

 The problem is that for each record in fd, Solr makes three distinct 
 SELECT on the other three tables. Of course, this is absolutely inefficient.

 Is there a way to have Solr loading every record in the four tables and join 
 them when they are already loaded in memory?

 For various reasons, we did not investigate this in depth,
 but you could also look at Solr's CachedSqlEntityProcessor.

 Regards,
 Gora
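
Erick's SolrJ suggestion might look roughly like this (a sketch only, not his
posted example; Solr 3.x API; the JDBC details, the join SQL, and the Solr
field names are placeholder assumptions):

    import java.sql.*;
    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class DbIndexer {
      public static void main(String[] args) throws Exception {
        // buffers docs and streams them to Solr from background threads
        StreamingUpdateSolrServer solr =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
        Connection con = DriverManager.getConnection(
            "jdbc:jtds:sqlserver://host/db");
        // one server-side join instead of three SELECTs per parent row
        ResultSet rs = con.createStatement().executeQuery(
            "SELECT f.endeca_id, f2.color_descr, e.Environment " +
            "FROM clust_w_fast_dump f " +
            "JOIN clust_w_fast_dump2_ByMarkets f2 ON f2.endeca_id = f.endeca_id " +
            "JOIN clust_w_Environment e ON e.endeca_id = f.endeca_id");
        while (rs.next()) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", rs.getString("endeca_id"));
          doc.addField("color", rs.getString("color_descr"));
          doc.addField("environment", rs.getString("Environment"));
          solr.add(doc);
        }
        solr.commit(); // single commit at the end, as Martin suggested earlier
        con.close();
      }
    }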


Re: CRUD on solr Index while replicating between master/slave

2011-12-14 Thread Erick Erickson
Whoa! Replicating takes 15 mins? That's a really long time. Are you including
the polling interval here? Or is this just raw replication time?

Because this is really suspicious. Are you optimizing your index all the time
or something? Replication should pull down ONLY the changed segments.
But optimizing changes *all* the segments (really, collapses them into one)
and you'd be copying the full index each replication.

Or are you committing after every few documents? Or?

You need to understand why replication takes so long before going
any further, IMO. It may be perfectly legitimate, but on the surface it sure
doesn't seem right.

Best
Erick

On Wed, Dec 14, 2011 at 10:52 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi,

 The slave will get the changes next time it polls the master and master tells 
 it the index has changed.
 Note that master doesn't replicate to slave, but rather the slave copies 
 changes from the master.

 Otis
 
 Performance Monitoring SaaS for Solr - 
 http://sematext.com/spm/solr-performance-monitoring/index.html




 From: Tarun Jain tjai...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Wednesday, December 14, 2011 10:43 AM
Subject: Re: CRUD on solr Index while replicating between master/slave

Hi,
We have an index which needs constant updates in the master.

One more question..
The scenario is
1) Master starts replicating to slave (takes approx 15 mins)

2) We do some changes to index on master while it is replicating

So question is what happens to the changes in master index while it is 
replicating.
Will the slave get it or not?


Tarun Jain
-=-




- Original Message -
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org; Tarun Jain tjai...@yahoo.com
Cc:
Sent: Tuesday, December 13, 2011 4:18 PM
Subject: Re: CRUD on solr Index while replicating between master/slave

No, you can search on the master when replicating, no
problem.

But why do you want to? The whole point of master/slave
setups is to separate indexing from searching machines.

Best
Erick

On Tue, Dec 13, 2011 at 4:10 PM, Tarun Jain tjai...@yahoo.com wrote:
 Hi,
 Thanks.
 So just to clarify here again while replicating we cannot search on master 
 index ?

 Tarun Jain
 -=-



 - Original Message -
 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Cc:
 Sent: Tuesday, December 13, 2011 3:03 PM
 Subject: Re: CRUD on solr Index while replicating between master/slave

 Hi,

 Master: Update/insert/delete docs    --    Yes
 Slaves: Search                              --   Yes

 Otis
 

 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 From: Tarun Jain tjai...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Tuesday, December 13, 2011 11:15 AM
Subject: CRUD on solr Index while replicating between master/slave

Hi,
When replication is happening between master to slave what operations can 
we do on the master  what operations are possible on the slave?
I know it is not adivisable to do DML on the slave index but I wanted to 
know this anyway. Also I understand that doing DML on a slave will make the 
slave index incompatible with the master.

Master

Search                              --   Yes/No
Update/insert/delete docs    --    Yes/No

Slave
=
Search                              --    Yes/No
Update/insert/delete docs    --    Yes/No

Please share any other caveats that you have discovered regarding the above 
scenario that might be helpful.

Thanks
-=-









Re: NumericRangeQuery: what am I doing wrong?

2011-12-14 Thread Erick Erickson
Hmmm, seems like it should work, but there are two things you might try:
1> just execute the query in Solr: id:[1 TO 100]. Does that work?
2> I'm really grasping at straws here, but it's *possible* that you
   need to use the same precisionStep as tint (8?). There's a
   constructor that takes precisionStep as a parameter, but the
   default is 4 in the 3.x code.

I guess it's also possible that you're not really connecting to the
server you think you are, but I doubt it, as I expect your unit test
is creating the index for you, in which case you can't do <1>.

Best
Erick

On Wed, Dec 14, 2011 at 12:14 PM, Dmitry Kan dmitry@gmail.com wrote:
 Maybe you should index your values differently? Here is what Lucene's 2.9
 javadoc says:

 To use this, you must first index the numeric values using
 NumericFieldhttp://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/document/NumericField.html(expert:
 NumericTokenStreamhttp://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/analysis/NumericTokenStream.html).
 If your terms are instead textual, you should use
 TermRangeQueryhttp://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/TermRangeQuery.html.
 NumericRangeFilterhttp://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/NumericRangeFilter.htmlis
 the filter equivalent of this query.

 Dmitry

 On Wed, Dec 14, 2011 at 6:53 PM, Jay Luker lb...@reallywow.com wrote:

 I can't get NumericRangeQuery or TermQuery to work on my integer id
 field. I feel like I must be missing something obvious.

 I have a test index that has only two documents, id:9076628 and
 id:8003001. The id field is defined like so:

  <field name="id" type="tint" indexed="true" stored="true" required="true" />

  A MatchAllDocsQuery will return the 2 documents, but any queries I try
  on the id field return no results. For instance,

  public void testIdRange() throws IOException {
     Query q = NumericRangeQuery.newIntRange("id", 1, 1000, true, true);
     System.out.println("query: " + q);
     assertEquals(2, searcher.search(q, 5).totalHits);
  }

  public void testIdSearch() throws IOException {
     Query q = new TermQuery(new Term("id", "9076628"));
     System.out.println("query: " + q);
     assertEquals(1, searcher.search(q, 5).totalHits);
  }

 Both tests fail with totalHits being 0. This is using solr/lucene
 trunk, but I tried also with 3.2 and got the same results.

 What could I be doing wrong here?

 Thanks,
 --jay
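
Two more observations, for what they're worth: the range [1, 1000] doesn't
actually contain either document id (9076628 and 8003001 are both in the
millions), and a plain TermQuery on the string "9076628" can never match a
trie-encoded numeric field. A sketch combining Erick's point with widened
bounds (precisionStep 8 is an assumption based on the stock "tint" type):

    // widen the bounds to cover the ids and pass the field's precisionStep
    Query q = NumericRangeQuery.newIntRange("id", 8, 1, 10000000, true, true);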



Re: CRUD on solr Index while replicating between master/slave

2011-12-14 Thread Tarun Jain
Hi,
We do optimize the whole index because we index our entire content every 4 hrs.
From an application/business point of view, the replication time is acceptable.

Thanks for the information though. We will try to change this behaviour in the
future so that the replication time is reduced.

Tarun Jain
-=-


From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org; Otis Gospodnetic otis_gospodne...@yahoo.com 
Sent: Wednesday, December 14, 2011 1:52 PM
Subject: Re: CRUD on solr Index while replicating between master/slave

Whoa! Replicating takes 15 mins? That's a really long time. Are you including
about the polling interval here? Or is this just raw replication time?

Because this is really suspicious. Are you optimizing your index all the time
or something? Replication should pull down ONLY the changed segments.
But optimizing changes *all* the segments (really, collapses them into one)
and you'd be copying the full index each replication.

Or are you committing after every few documents? Or?

You need to understand why replication takes so long before going
any further IMO. It may be perfectly legitimate, but on the surface it sure
doesn't seem right.

Best
Erick

On Wed, Dec 14, 2011 at 10:52 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi,

 The slave will get the changes next time it polls the master and master tells 
 it the index has changed.
 Note that master doesn't replicate to slave, but rather the slave copies 
 changes from the master.

 Otis
 
 Performance Monitoring SaaS for Solr - 
 http://sematext.com/spm/solr-performance-monitoring/index.html




 From: Tarun Jain tjai...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Wednesday, December 14, 2011 10:43 AM
Subject: Re: CRUD on solr Index while replicating between master/slave

Hi,
We have an index which needs constant updates in the master.

One more question..
The scenario is
1) Master starts replicating to slave (takes approx 15 mins)

2) We do some changes to index on master while it is replicating

So question is what happens to the changes in master index while it is 
replicating.
Will the slave get it or not?


Tarun Jain
-=-




- Original Message -
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org; Tarun Jain tjai...@yahoo.com
Cc:
Sent: Tuesday, December 13, 2011 4:18 PM
Subject: Re: CRUD on solr Index while replicating between master/slave

No, you can search on the master when replicating, no
problem.

But why do you want to? The whole point of master/slave
setups is to separate indexing from searching machines.

Best
Erick

On Tue, Dec 13, 2011 at 4:10 PM, Tarun Jain tjai...@yahoo.com wrote:
 Hi,
 Thanks.
 So just to clarify here again while replicating we cannot search on master 
 index ?

 Tarun Jain
 -=-



 - Original Message -
 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Cc:
 Sent: Tuesday, December 13, 2011 3:03 PM
 Subject: Re: CRUD on solr Index while replicating between master/slave

 Hi,

 Master: Update/insert/delete docs    --    Yes
 Slaves: Search                              --   Yes

 Otis
 

 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 From: Tarun Jain tjai...@yahoo.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Tuesday, December 13, 2011 11:15 AM
Subject: CRUD on solr Index while replicating between master/slave

Hi,
When replication is happening between master to slave what operations can 
we do on the master  what operations are possible on the slave?
I know it is not adivisable to do DML on the slave index but I wanted to 
know this anyway. Also I understand that doing DML on a slave will make the 
slave index incompatible with the master.

Master

Search                              --   Yes/No
Update/insert/delete docs    --    Yes/No

Slave
=
Search                              --    Yes/No
Update/insert/delete docs    --    Yes/No

Please share any other caveats that you have discovered regarding the above 
scenario that might be helpful.

Thanks
-=-









Re: How to get SolrServer

2011-12-14 Thread Tomás Fernández Löbbe
Hi Joey, if what you want is to customize Solr so that you do the indexing
code on the server side, you could implement your own RequestHandler, then
the only thing you need to do is add it to solrconfig.xml, and you
can call it via an HTTP GET.
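
A minimal sketch of such a handler (assuming the Solr 3.x plugin API; the
class name and the "dir" parameter are hypothetical):

import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

public class FileIndexHandler extends RequestHandlerBase {
  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    // e.g. GET /solr/core/fileindex?dir=/data/docs -- walk "dir" and feed
    // the documents to the core's update handler here.
    String dir = req.getParams().get("dir");
    rsp.add("status", "indexing triggered for " + dir);
  }

  @Override public String getDescription() { return "filesystem indexing handler (sketch)"; }
  @Override public String getSource() { return "$Source$"; }
  @Override public String getSourceId() { return "$Id$"; }
  @Override public String getVersion() { return "$Revision$"; }
}

Once registered in solrconfig.xml it is reachable at that path on the core.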

On Tue, Dec 13, 2011 at 4:42 PM, Schmidt Jeff j...@rvswithoutborders.com wrote:

 Joey:

 I'm not sure what you mean by "wrapping" Solr into your own web application.
  There is a way to embed Solr into your application (same JVM), but I've
 never used that.  If you're talking about your servlet running in one JVM
 and Solr in another, then use the SolrJ client library to interact with
 Solr.  I use CommonsHttpSolrServer (
 http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.html)
 and specify the URL that locates the Solr server/core name.

 I use Spring to instantiate the server instance, and then I inject it
 where I need it.

<bean id="solrServerIngContent"
      class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer">
  <constructor-arg value="http://localhost:8091/solr/mycorename"/>
</bean>

 This is equivalent to:
 new CommonsHttpSolrServer("http://localhost:8091/solr/mycorename");

 Check out the API link above and http://wiki.apache.org/solr/Solrj for
 examples of using the SolrJ API.
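
 For instance, a minimal add/commit round trip with SolrJ (a sketch; the
 field names are made up for illustration):

 import org.apache.solr.client.solrj.SolrServer;
 import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
 import org.apache.solr.common.SolrInputDocument;

 public class SolrJExample {
   public static void main(String[] args) throws Exception {
     SolrServer server = new CommonsHttpSolrServer("http://localhost:8091/solr/mycorename");
     SolrInputDocument doc = new SolrInputDocument();
     doc.addField("id", "doc-1");
     doc.addField("title", "hello world");
     server.add(doc);     // send the document
     server.commit();     // make it searchable
   }
 }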

 Cheers,

 Jeff

 On Dec 13, 2011, at 12:12 PM, Joey wrote:

  Hi I am new to Solr and want to do some custom development.
  I have wrapped Solr into my own web application, and want to write a
 servlet
  to index a file system.
 
  The question is how can I get SolrServer inside my Servlet?
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-tp3583304p3583304.html
  Sent from the Solr - User mailing list archive at Nabble.com.




Re: Possible to adjust FieldNorm?

2011-12-14 Thread Tomás Fernández Löbbe
From what I can see, the problem there is not with the field norm, but with
the fact that leadership is not matching the second document for some
reason. Is it possible that you are having some kind of analysis problem?

On Wed, Dec 14, 2011 at 6:50 AM, cnyee yeec...@gmail.com wrote:

 Hi,

 Is it possible to adjust FieldNorm? I have a scenario where the search is
 not producing the desired result because of fieldNorm:

 Search terms: coaching leadership
 Record 1: name=Ask the Coach, desc=...,...
 Record 2: name=Coaching as a Leadership Development Tool Part 1,
 desc=...,...

 Record 1 was scored higher than record 2, despite record 2 has two matches.
 The scoring is given below:

 Record 1:
  1.2878088 = (MATCH) weight(name_en:coach in 6430), product of:
0.20103075 = queryWeight(name_en:coach), product of:
  6.406029 = idf(docFreq=160, maxDocs=35862)
  0.03138149 = queryNorm
6.406029 = (MATCH) fieldWeight(name_en:coach in 6430), product of:
  1.0 = tf(termFreq(name_en:coach)=1)
  6.406029 = idf(docFreq=160, maxDocs=35862)
  1.0 = fieldNorm(field=name_en, doc=6430)

 Record 2:
  0.56341636 = (MATCH) weight(name_en:coach in 4744), product of:
0.20103075 = queryWeight(name_en:coach), product of:
  6.406029 = idf(docFreq=160, maxDocs=35862)
  0.03138149 = queryNorm
2.8026378 = (MATCH) fieldWeight(name_en:coach in 4744), product of:
  1.0 = tf(termFreq(crs_name_en:coach)=1)
  6.406029 = idf(docFreq=160, maxDocs=35862)
  0.4375 = fieldNorm(field=name_en, doc=4744)

 Many thanks in advance.

 Chut



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Possible-to-adjust-FieldNorm-tp3584998p3584998.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Shutdown hook issue

2011-12-14 Thread Adolfo Castro Menna
I think I found the issue. The ubuntu server is running OOM-Killer which
might be sending a SIGINT to the java process, probably because of memory
consumption.

Thanks,
Adolfo.

On Wed, Dec 14, 2011 at 12:44 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Hi,

 Solr won't shut down by itself just because it's idle. :)
 You could run it with debugger attached and breakpoint set in the shutdown
 hook you are talking about and see what calls it.

 Otis
 

 Performance Monitoring SaaS for Solr -
 http://sematext.com/spm/solr-performance-monitoring/index.html




 
  From: Adolfo Castro Menna adolfo.castrome...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 14, 2011 8:17 AM
 Subject: Shutdown hook issue
 
 Hi All,
 
 I'm experiencing some issues with solr. From time to time solr goes down.
 After checking the logs, I see that it's due to the shutdown hook being
 triggered.
 I still don't know why it happens but it seems to be related to solr being
 idle. Does anyone have any insights?
 
 I'm using Ubuntu 10.04.2 LTS and solr 3.1.0 running on Jetty (default
 configuration). Solr runs in background, so it doesn't seem to be related
 to a SIGINT unless ubuntu is sending it for some odd reason.
 
 Thanks,
 Adolfo.
 
 
 



Re: Solr Join with Dismax

2011-12-14 Thread Pascal Dimassimo
Thanks Hoss!

But unfortunately, the dismax parameters (like qf) are not passed over to
the fromIndex. In fact, even if using var dereferencing makes Dismax to be
selected as the fromQueryParser, the query that is passed to the
JoinQuery object contains nothing to indicate that it should use dismax.
The following code is from the method createParser in
JoinQParserPlugin.java:

// With var dereferencing, this makes the fromQueryParser be dismax
QParser fromQueryParser = subQuery(v, "lucene");

// But after the call to getQuery, there is no indication that dismax
should be used
Query fromQuery = fromQueryParser.getQuery();
JoinQuery jq = new JoinQuery(fromField, toField, fromIndex, fromQuery);

So I guess that as it is right now, dismax can't really be used with joins.
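
(For reference, the one-line change Hoss describes below -- replacing the
hardcoded parser type with null so that the "defType" local param is
consulted -- as a sketch against JoinQParserPlugin.createParser:)

// null instead of "lucene": {!join ... defType=dismax} would then pick dismax
QParser fromQueryParser = subQuery(v, null);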

On Fri, Dec 9, 2011 at 3:20 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


 : Is there a specific reason  why it is hard-coded to use the lucene
 : QParser? I was looking at JoinQParserPlugin.java and here it is in
 : createParser:
 :
 : QParser fromQueryParser = subQuery(v, lucene);
 :
 : I could pass another param named fromQueryParser and use it instead of
 : lucene. But again, is there a reason why I should not do that?

 It's definitely a bug, but we don't need a new local param: that hardcoded
 "lucene" should just be replaced with null, so that the "defType"
 local param will be checked (just like it can in the BoostQParser)...

   qf=text name
   q={!join from=manu_id_s to=id defType=dismax}ipod

 Note: even with that hardcoded "lucene" bug, you can still override the
 default by using var dereferencing to point at another param with its own
 localparams specifying the type...

   qf=text name
   q={!join from=manu_id_s to=id v=$qq}
   qq={!dismax}ipod

 -Hoss




-- 
Pascal Dimassimo

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


Re: Shutdown hook issue

2011-12-14 Thread François Schiettecatte
I am not an expert on this but the oom-killer will kill off the process 
consuming the greatest amount of memory if the machine runs out of memory, and 
you should see something to that effect in the system log, /var/log/messages I 
think.

François

On Dec 14, 2011, at 2:54 PM, Adolfo Castro Menna wrote:

 I think I found the issue. The ubuntu server is running OOM-Killer which
 might be sending a SIGINT to the java process, probably because of memory
 consumption.
 
 Thanks,
 Adolfo.
 
 On Wed, Dec 14, 2011 at 12:44 PM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:
 
 Hi,
 
 Solr won't shut down by itself just because it's idle. :)
 You could run it with debugger attached and breakpoint set in the shutdown
 hook you are talking about and see what calls it.
 
 Otis
 
 
 Performance Monitoring SaaS for Solr -
 http://sematext.com/spm/solr-performance-monitoring/index.html
 
 
 
 
 
 From: Adolfo Castro Menna adolfo.castrome...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, December 14, 2011 8:17 AM
 Subject: Shutdown hook issue
 
 Hi All,
 
 I'm experiencing some issues with solr. From time to time solr goes down.
 After checking the logs, I see that it's due to the shutdown hook being
 triggered.
 I still don't know why it happens but it seems to be related to solr being
 idle. Does anyone have any insights?
 
 I'm using Ubuntu 10.04.2 LTS and solr 3.1.0 running on Jetty (default
 configuration). Solr runs in background, so it doesn't seem to be related
 to a SIGINT unless ubuntu is sending it for some odd reason.
 
 Thanks,
 Adolfo.
 
 
 
 



Re: Copy in multivalued field and faceting

2011-12-14 Thread yunfei wu
Hi, Eric,

Just interested in this topic, so might want to ask further question based
on Jul's topic.

I read the document of Facet.sort=count which seems to return the facets
order by the doc hit counts.

So, suppose one doc has title value1 value2 value3, and another doc has
title value2 value4 value5, and use WhitespaceTokenizer (whether
designed as a single field or a multi-valued field), do we get the facet results
as:
value2 - 2 docs
value1 - 1 doc
value3 - 1 doc
value4 - 1 doc
value5 - 1 doc

is it a way to get top words? does it cause high performance cost?

Thanks,
Yunfei



On Wed, Dec 14, 2011 at 5:51 AM, Erick Erickson erickerick...@gmail.com wrote:

 I don't quite understand what you're trying to do. MultiValued is
 a bit misleading. All it means is that you can add the same
 field multiple times to a document, i.e. (XML example)
 <doc>
  <add name="field">value1 value2 value3</add>
  <add name="field">value4 value5 value6</add>
 </doc>

 will succeed if "field" is multiValued and fail if not.

 This will work if field is NOT multiValued:
 <doc>
  <add name="field">value1 value2 value3 value4 value5 value6</add>
 </doc>

 and, assuming WhitespaceTokenizer, the "field" field will contain
 the exact same tokens. The only difference *might* be the
 offsets, but don't worry about that quite yet, all it would really
 affect is phrase queries.

 With that as a preface, I don't see why copyField has anything
 to do with your problem, you'd get the same results faceting
 on the title field, assuming identical analyzer chains.

 Faceting on a text field is iffy, it can be quite expensive. What you'd
 get in the end, though, is a list of the top words in your corpus for
 that field counted from the documents that satisfied the query. Which
 sounds like what you're after.

 Best
 Erick

 On Wed, Dec 14, 2011 at 4:59 AM, yunfei wu yunfei...@gmail.com wrote:
  Sounds like working by carefully choosing tokenizer, and then use
  facet.sort and facet.limit parameters to do faceting.
 
  Will see any expert's comments on this one.
 
  Yunfei
 
 
  On Wed, Dec 14, 2011 at 12:26 AM, darul daru...@gmail.com wrote:
 
  Hello,
 
  Field for this scenario is Title and contains several words.
 
  For a specific query, I would like get the top ten words by frequency
 in a
  specific field.
 
  My idea was the following:
 
  - Title in my schema is stored/indexed in a specific field
  - A copyField copy Title field content into a multivalued field. If my
  multivalue field use a specific tokenizer which split words, does it
 fill
  each word in each multivalued items ?
  - If so, using faceting on this multivalue field, I will get top ten
 words,
  correct ?
 
  Example:
 
  1) Title : this is my title
  2) CopyField Title to specific multivalue field F1
  3) F1 contains : {this, is, my, title}
 
  My english
 
  Thanks,
 
  Jul
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Copy-in-multivalued-field-and-faceting-tp3584819p3584819.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Copy in multivalued field and faceting

2011-12-14 Thread Ahmet Arslan
 I read the document of Facet.sort=count which seems to
 return the facets
 order by the doc hit counts.
 
 So, suppose one doc has title value1 value2 value3, and
 another doc has
 title value2 value 4 value 5, and use WhitespaceTokenizer
 (no matter
 designed in single field or multi-value field), do we get
 the facet results
 as:
 value2 - 2 docs
 value1 - 1 doc
 value3 - 1 doc
 value4 - 1 doc
 value5 - 1 doc
 
 is it a way to get top words? does it cause high
 performance cost?

Consider using http://wiki.apache.org/solr/LukeRequestHandler for top terms. 
Faceting is more meant to 'drill down' in the search result set.
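
A sketch of asking for those top terms via SolrJ (assuming the 3.x
LukeRequest class; the field name and URL are hypothetical):

import java.util.Arrays;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class TopTermsExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    LukeRequest luke = new LukeRequest();
    luke.setFields(Arrays.asList("title")); // the field to report top terms for
    luke.setNumTerms(10);                   // top ten
    LukeResponse rsp = luke.process(server);
    System.out.println(rsp.getFieldInfo("title").getTopTerms());
  }
}

Note that these are corpus-wide top terms, not per-query ones -- which is the
distinction above: faceting drills into a result set, Luke reports on the
whole index.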


Re: Possible to adjust FieldNorm?

2011-12-14 Thread Chris Hostetter

: From what I can see, the problem there is not with the field norm, but with
: the fact that leadership is not matching the second document for some
: reason. Is it possible that you are having some kind of analysis problem?

Agreed ... if those are your full score explanations for those two 
documents then something is not right with how the second document is 
matching your query.

what exactly does your request look like?  what exactly does the 
requestHandler configuration look like? what is the final parsed query 
according to the debug information? what does the fieldtype for name_en 
look like? what does analysis.jsp say about how those name_en field 
values are analyzed at index time? how is the word "leadership" analyzed 
at query time?

In general, if you want to disable norms you can set omitNorms="true" on the 
field -- or you can customize the Similarity to change the lengthNorm 
function.
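
(For the second option, a minimal sketch assuming the Lucene/Solr 3.x
Similarity API -- the class name is made up:)

import org.apache.lucene.search.DefaultSimilarity;

// Flattens length normalization so short fields are not favored over long ones.
public class FlatLengthNormSimilarity extends DefaultSimilarity {
  @Override
  public float lengthNorm(String fieldName, int numTerms) {
    return 1.0f;
  }
}

It would then be registered in schema.xml via a <similarity> element.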

:  Search terms: coaching leadership
:  Record 1: name=Ask the Coach, desc=...,...
:  Record 2: name=Coaching as a Leadership Development Tool Part 1,
:  desc=...,...
: 
:  Record 1 was scored higher than record 2, despite record 2 has two matches.
:  The scoring is given below:
: 
:  Record 1:
:   1.2878088 = (MATCH) weight(name_en:coach in 6430), product of:
: 0.20103075 = queryWeight(name_en:coach), product of:
:   6.406029 = idf(docFreq=160, maxDocs=35862)
:   0.03138149 = queryNorm
: 6.406029 = (MATCH) fieldWeight(name_en:coach in 6430), product of:
:   1.0 = tf(termFreq(name_en:coach)=1)
:   6.406029 = idf(docFreq=160, maxDocs=35862)
:   1.0 = fieldNorm(field=name_en, doc=6430)
: 
:  Record 2:
:   0.56341636 = (MATCH) weight(name_en:coach in 4744), product of:
: 0.20103075 = queryWeight(name_en:coach), product of:
:   6.406029 = idf(docFreq=160, maxDocs=35862)
:   0.03138149 = queryNorm
: 2.8026378 = (MATCH) fieldWeight(name_en:coach in 4744), product of:
:   1.0 = tf(termFreq(crs_name_en:coach)=1)
:   6.406029 = idf(docFreq=160, maxDocs=35862)
:   0.4375 = fieldNorm(field=name_en, doc=4744)

-Hoss


Re: Solr Join with Dismax

2011-12-14 Thread Pascal Dimassimo
Hi,

I have been doing more tracing in the code. And I think that I understand a
bit more. The problem does not seem to be dismax+join, but
dismax+join+fromIndex.

When doing this joined dismax query on the same index:
http://localhost:8080/solr/gutenberg/select?q={!join+from=id+to=id+v=$qq}&qq={!dismax+qf='body%20tag^2'}solr

the query returned by the method fromQueryParser.getQuery looks like this:
+(body:solr | tag:solr^2.0)

But when doing the same query across another core:
http://localhost:8080/solr/test/select/?q={!join+fromIndex=gutenberg+from=id+to=id+v=$qq}&qq={!dismax+qf='body%20tag^2'}solr

the query is:
+(body:solr)

We see that the second field defined in the qf param is not added to the
query. Tracing deeper shows that this happens because the tag field does
not exist in the test core, hence it is not added. This can be seen in
SolrPluginUtils.java in the method getFieldQuery. All the fields not part
of the current index won't be added to the query.

So the conclusion does not seem to be that dismax can't be used with joins,
but that it can't be used with another core that does not have the same
fields as the one where the initial query is made.

I just noticed SOLR-2824. So it is really a bug. I'll take the time to look
at the patch attached to this ticket.

On Wed, Dec 14, 2011 at 2:55 PM, Pascal Dimassimo 
pascal.dimass...@sematext.com wrote:

 Thanks Hoss!

 But unfortunately, the dismax parameters (like qf) are not passed over to
 the fromIndex. In fact, even if using var dereferencing makes Dismax to be
 selected as the fromQueryParser, the query that is passed to the
 JoinQuery object contains nothing to indicate that it should use dismax.
 The following code is from the method createParser in
 JoinQParserPlugin.java:

 // With var dereferencing, this makes the fromQueryParser be dismax
 QParser fromQueryParser = subQuery(v, "lucene");

 // But after the call to getQuery, there is no indication that dismax
 should be used
 Query fromQuery = fromQueryParser.getQuery();
 JoinQuery jq = new JoinQuery(fromField, toField, fromIndex, fromQuery);

 So I guess that as it is right now, dismax can't really be used with joins.

 On Fri, Dec 9, 2011 at 3:20 PM, Chris Hostetter 
 hossman_luc...@fucit.org wrote:


 : Is there a specific reason  why it is hard-coded to use the lucene
 : QParser? I was looking at JoinQParserPlugin.java and here it is in
 : createParser:
 :
 : QParser fromQueryParser = subQuery(v, lucene);
 :
 : I could pass another param named fromQueryParser and use it instead of
 : lucene. But again, is there a reason why I should not do that?

 It's definitely a bug, but we don't need a new local param: that hardcoded
 "lucene" should just be replaced with null, so that the "defType"
 local param will be checked (just like it can in the BoostQParser)...

   qf=text name
   q={!join from=manu_id_s to=id defType=dismax}ipod

 Note: even with that hardcoded lucene bug, you can still override the
 default by using var dereferencing to point at another param with it's own
 localparams specying the type...

   qf=text name
   q={!join from=manu_id_s to=id v=$qq}
   qq={!dismax}ipod

 -Hoss




 --
 Pascal Dimassimo
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/




-- 
Pascal Dimassimo

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


Re: Solr Join with Dismax

2011-12-14 Thread Chris Hostetter

: I have been doing more tracing in the code. And I think that I understand a
: bit more. The problem does not seem to be dismax+join, but
: dismax+join+fromIndex.

Correct.  join+dismax works fine as i already demonstrated... 

:  Note: even with that hardcoded lucene bug, you can still override the
:  default by using var dereferencing to point at another param with its own
:  localparams specifying the type...
: 
:qf=text name
:q={!join from=manu_id_s to=id v=$qq}
:qq={!dismax}ipod

...the problem you are referring to now has nothing to do with dismax, and 
is specifically a bug in how the query is parsed when fromIndex is 
used (which i thought i already mentioned in this thread but i see you 
found it independently)...

https://issues.apache.org/jira/browse/SOLR-2824

Did you file a Jira about defaulting to "lucene" instead of null so we can 
make the "defType" local param syntax work?  (I haven't seen it in my 
email, but it's really an unrelated problem so it should be tracked 
separately.)


-Hoss


Re: NumericRangeQuery: what am I doing wrong?

2011-12-14 Thread Jay Luker
On Wed, Dec 14, 2011 at 2:04 PM, Erick Erickson erickerick...@gmail.com wrote:
 Hmmm, seems like it should work, but there are two things you might try:
 1> just execute the query in Solr: id:[1 TO 100]. Does that work?

Yep, that works fine.

 2> I'm really grasping at straws here, but it's *possible* that you
     need to use the same precisionStep as tint (8?). There's a
     constructor that takes precisionStep as a parameter, but the
     default is 4 in the 3.x code.

Ah-ha, that was it. I did not notice the alternate constructor. The
field was originally indexed with solr's default int type, which has
precisionStep=0 (i.e., don't index at different precision levels).
The equivalent value for the NumericRangeQuery constructor is 32. This
isn't exactly intuitive, but I was able to figure it out with a careful
reading of the javadoc.
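
For anyone hitting the same thing, the working construction looks like this
(a sketch; the bounds are made up):

// precisionStep must match how the field was indexed; 32 effectively
// disables the trie levels, matching a field indexed with precisionStep="0".
Query q = NumericRangeQuery.newIntRange("id", 32, 1, 10000000, true, true);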

Thanks!
--jay


queryResultCache hit count is not being increased when programmatically adding Lucene queries as filters in the SearchComponent

2011-12-14 Thread Igor Muntyan
In my application I need to deal with a very large number of filter queries
that I cannot pass as http parameters - instead I add them as filters on the
ResponseBuilder:

public void process(ResponseBuilder rb)
{
    // Append our custom queries to the request's existing filter list.
    List<Query> filters = rb.getFilters();
    if (filters == null) {
        filters = new ArrayList<Query>();
        rb.setFilters(filters);
    }
    filters.add(userAccessQuery);
    filters.add(auctionEndConditionQuery);
}

In the /admin/stats.jsp I have noticed that if the code above gets executed
then my queryResultCache hit count does not increase.

The following is the debug query:

{
  responseHeader:{
status:0,
QTime:31,
params:{
 
_jstate:-NWHpuWq8R7oPBQGnJsHifjVh6blqvEe6DwUBpybB4ldLWmZEGqsSyvRk_0LX6a-U3fqO6Wd4kc,
  indent:true,
  wt:json,
  version:2,
  debugQuery:true,
  fl:vid,
  q:text:(Solara) OR slrDescEm:(\Solara\) OR
vinLast8:(\Solara\),
  fq:(eColorId:\4\)}},
  response:{numFound:2,start:0,docs:[
  {
vid:18372703},
  {
vid:19071820}]
  },
  debug:{
rawquerystring:text:(Solara) OR slrDescEm:(\Solara\) OR
vinLast8:(\Solara\),
querystring:text:(Solara) OR slrDescEm:(\Solara\) OR
vinLast8:(\Solara\),
parsedquery:text:solara slrDescEm:solara vinLast8:solara,
parsedquery_toString:text:solara slrDescEm:solara vinLast8:solara,
explain:{
  18372703:\n0.285029 = (MATCH) product of:\n  0.855087 = (MATCH) sum
of:\n0.855087 = (MATCH) weight(text:solara in 4146), product of:\n 
0.44694334 = queryWeight(text:solara), product of:\n7.6527553 =
idf(docFreq=23, maxDocs=18598)\n0.058402933 = queryNorm\n 
1.9131888 = (MATCH) fieldWeight(text:solara in 4146), product of:\n   
1.0 = tf(termFreq(text:solara)=1)\n7.6527553 = idf(docFreq=23,
maxDocs=18598)\n0.25 = fieldNorm(field=text, doc=4146)\n  0.3334
= coord(1/3)\n,
  19071820:\n0.285029 = (MATCH) product of:\n  0.855087 = (MATCH) sum
of:\n0.855087 = (MATCH) weight(text:solara in 13815), product of:\n 
0.44694334 = queryWeight(text:solara), product of:\n7.6527553 =
idf(docFreq=23, maxDocs=18598)\n0.058402933 = queryNorm\n 
1.9131888 = (MATCH) fieldWeight(text:solara in 13815), product of:\n   
1.0 = tf(termFreq(text:solara)=1)\n7.6527553 = idf(docFreq=23,
maxDocs=18598)\n0.25 = fieldNorm(field=text, doc=13815)\n 
0.3334 = coord(1/3)\n},
QParser:LuceneQParser,
filter_queries:[(eColorId:\4\)],
parsed_filter_queries:[eColorId:4,
  (+(((+((+(+cgcId:4 +iter:[1 TO 6])) 

/** A VERY, VERY LONG LIST OF CONDITIONS HERE **/


) (+(+cgcId:840 +iter:[1 TO 12])) (+(+cgcId:841 +iter:[1 TO 10]))
(+(+cgcId:843 +iter:[1 TO 12] +blInd:true),
  ET:[1323899277225 TO *]],
timing:{
  time:31.0,
  prepare:{
time:0.0,
com.openlane.search.solr.filter.GenericFilter:{
  time:0.0},
org.apache.solr.handler.component.QueryComponent:{
  time:0.0},
org.apache.solr.handler.component.FacetComponent:{
  time:0.0},
org.apache.solr.handler.component.MoreLikeThisComponent:{
  time:0.0},
org.apache.solr.handler.component.HighlightComponent:{
  time:0.0},
org.apache.solr.handler.component.StatsComponent:{
  time:0.0},
org.apache.solr.handler.component.DebugComponent:{
  time:0.0}},
  process:{
time:31.0,
com.openlane.search.solr.filter.GenericFilter:{
  time:31.0},
org.apache.solr.handler.component.QueryComponent:{
  time:0.0},
org.apache.solr.handler.component.FacetComponent:{
  time:0.0},
org.apache.solr.handler.component.MoreLikeThisComponent:{
  time:0.0},
org.apache.solr.handler.component.HighlightComponent:{
  time:0.0},
org.apache.solr.handler.component.StatsComponent:{
  time:0.0},
org.apache.solr.handler.component.DebugComponent:{
  time:0.0}


Notice the difference between filter_queries and parsed_filter_queries

If I block filters.add(userAccessQuery) then my queryResultCache hit count
is being increased as it should. The following is the response with
debugQuery=true
in this case.

{
  responseHeader:{
status:0,
QTime:16,
params:{
  indent:true,
  wt:json,
  version:2,
  debugQuery:true,
  fl:vid,
  q:text:(Solara) OR slrDescEm:(\Solara\) OR
vinLast8:(\Solara\),
  devMode:bypassUserAccess,
  fq:(eColorId:\4\)}},
  response:{numFound:3,start:0,docs:[
  {
vid:18372703},
  {
vid:19071820},
  {
vid:17192691}]
  },
  debug:{
rawquerystring:text:(Solara) OR slrDescEm:(\Solara\) OR
vinLast8:(\Solara\),
querystring:text:(Solara) OR slrDescEm:(\Solara\) OR
vinLast8:(\Solara\),
parsedquery:text:solara slrDescEm:solara vinLast8:solara,
parsedquery_toString:text:solara 

Re: queryResultCache hit count is not being increased when programmatically adding Lucene queries as filters in the SearchComponent

2011-12-14 Thread Igor Muntyan
Solr version: 3.2.0

--
View this message in context: 
http://lucene.472066.n3.nabble.com/queryResultCache-hit-count-is-not-being-increased-when-programmatically-adding-Lucene-queries-as-filt-tp3586892p3586904.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: NumericRangeQuery: what am I doing wrong?

2011-12-14 Thread Chris Hostetter

I'm a little lost in this thread ... if you are programmatically constructing 
a NumericRangeQuery object to execute in the JVM against a Solr index, 
that suggests you are writing some sort of Solr plugin (or embedding 
solr in some way)

why manually construct the query using options that may or may not be 
correct if/when someone changes the schema, when you could just ask the 
FieldType to construct the appropriate query for you?

FieldType ft = req.getSchema().getFieldType("your field name");
Query q = ft.getRangeQuery(...);

?
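
(A slightly fuller sketch of that, assuming Solr 3.x and a SolrQueryRequest
in scope; the field name and bounds are hypothetical:)

SchemaField sf = req.getSchema().getField("id");
// Let the field type decide how to build the range query for this field:
Query q = sf.getType().getRangeQuery(null, sf, "1", "1000000", true, true);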


-Hoss


XPath with ExtractingRequestHandler

2011-12-14 Thread Michael Kelleher

I want to restrict the HTML that is returned by Tika to basically:


/xhtml:html/xhtml:body//xhtml:div[@class='bibliographicData']/descendant:node()



and it seems that the XPath class being used does not support the '//' 
syntax.


Is there any way to configure Tika to use a different XPath evaluation class?




Re: How to get SolrServer within my own servlet

2011-12-14 Thread Chris Hostetter

: So what I want to do is to modify Solr a bit - add one servlet so I can
: trigger a full index of a folder in the file system.
...

: I guess there are two SolrServer instances(one is EmbeddedSolrServer,
: created by myself and the other is come with Solr itself and they are
: holding different index?

i suspect you are correct, but frankly i'm amazed what you are doing is 
working at all (you should be getting a write lock error from having two 
distinct Solr instances trying to write to the same directory)

I think you need to back up and explain better what your overall goal is 
-- embedding Solr in other apps is:

a) a fairly advanced usage that i would not suggest you pursue until 
you have a better grasp of solr fundamentals
b) not something people usually do if they also want to be able to use 
solr via HTTP.

in general, if your only goal of mucking with the solr.war is to be able 
to index files on the local filesystem (relative where Solr is running) 
there are a lot of other ways to approach that goal (use DIH, or write a 
custom RequestHandler you load as a plugin, etc...)

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341



-Hoss


Re: Arabic suppport,

2011-12-14 Thread Chris Hostetter

: how can I add Arabic support to Solr?

https://wiki.apache.org/solr/LanguageAnalysis
https://wiki.apache.org/solr/LanguageAnalysis#Arabic



-Hoss


Re: Possible to adjust FieldNorm?

2011-12-14 Thread cnyee
Sorry, I did not give the full output in the first post. 
From the looks of it, the fieldNorm is saying that:
1 match out of 3 words in record 1 is more significant than 2 matches out of
8 words in record 2.
That would be true as simple arithmetic, but it is unsatisfactory in human
'meaning'.

Here is the full explanation. Record 2 has some boosting as well.

Record 1:
1.5843434 = (MATCH) sum of:
  1.5372416 = (MATCH) sum of:
1.2878088 = (MATCH) max plus 0.1 times others of:
  1.2878088 = (MATCH) weight(crs_name_en:coach in 6430), product of:
0.20103075 = queryWeight(crs_name_en:coach), product of:
  6.406029 = idf(docFreq=160, maxDocs=35862)
  0.03138149 = queryNorm
6.406029 = (MATCH) fieldWeight(crs_name_en:coach in 6430), product
of:
  1.0 = tf(termFreq(crs_name_en:coach)=1)
  6.406029 = idf(docFreq=160, maxDocs=35862)
  1.0 = fieldNorm(field=crs_name_en, doc=6430)
0.2494328 = (MATCH) max plus 0.1 times others of:
  0.2494328 = (MATCH) weight(crs_desc_en:leadership in 6430), product
of:
0.15826634 = queryWeight(crs_desc_en:leadership), product of:
  5.043302 = idf(docFreq=628, maxDocs=35862)
  0.03138149 = queryNorm
1.5760319 = (MATCH) fieldWeight(crs_desc_en:leadership in 6430),
product of:
  1.0 = tf(termFreq(crs_desc_en:leadership)=1)
  5.043302 = idf(docFreq=628, maxDocs=35862)
  0.3125 = fieldNorm(field=crs_desc_en, doc=6430)
  0.04710189 = (MATCH) product of:
0.09420378 = (MATCH) sum of:
  0.09420378 = (MATCH) product of:
0.3768151 = (MATCH) sum of:
  0.3768151 = (MATCH) weight(published_year:2008 in 6430), product
of:
0.10874291 = queryWeight(published_year:2008), product of:
  3.4651926 = idf(docFreq=3047, maxDocs=35862)
  0.03138149 = queryNorm
3.4651926 = (MATCH) fieldWeight(published_year:2008 in 6430),
product of:
  1.0 = tf(termFreq(published_year:2008)=1)
  3.4651926 = idf(docFreq=3047, maxDocs=35862)
  1.0 = fieldNorm(field=published_year, doc=6430)
0.25 = coord(1/4)
0.5 = coord(1/2)
  0.0 = (MATCH) FunctionQuery(int(crs_stars)), product of:
0.0 = int(crs_stars)=0
2.5 = boost
0.03138149 = queryNorm

Record 2:
1.5590522 = (MATCH) sum of:
  1.0096307 = (MATCH) sum of:
0.6206793 = (MATCH) max plus 0.1 times others of:
  0.56341636 = (MATCH) weight(crs_name_en:coach in 4744), product of:
0.20103075 = queryWeight(crs_name_en:coach), product of:
  6.406029 = idf(docFreq=160, maxDocs=35862)
  0.03138149 = queryNorm
2.8026378 = (MATCH) fieldWeight(crs_name_en:coach in 4744), product
of:
  1.0 = tf(termFreq(crs_name_en:coach)=1)
  6.406029 = idf(docFreq=160, maxDocs=35862)
  0.4375 = fieldNorm(field=crs_name_en, doc=4744)
  0.11664742 = (MATCH) weight(meta_en:coach in 4744), product of:
0.11443973 = queryWeight(meta_en:coach), product of:
  3.646727 = idf(docFreq=2541, maxDocs=35862)
  0.03138149 = queryNorm
1.0192913 = (MATCH) fieldWeight(meta_en:coach in 4744), product of:
  2.236068 = tf(termFreq(meta_en:coach)=5)
  3.646727 = idf(docFreq=2541, maxDocs=35862)
  0.125 = fieldNorm(field=meta_en, doc=4744)
  0.4559821 = (MATCH) weight(crs_desc_en:coach in 4744), product of:
0.19534174 = queryWeight(crs_desc_en:coach), product of:
  6.2247434 = idf(docFreq=192, maxDocs=35862)
  0.03138149 = queryNorm
2.3342788 = (MATCH) fieldWeight(crs_desc_en:coach in 4744), product
of:
  2.0 = tf(termFreq(crs_desc_en:coach)=4)
  6.2247434 = idf(docFreq=192, maxDocs=35862)
  0.1875 = fieldNorm(field=crs_desc_en, doc=4744)
0.3889513 = (MATCH) max plus 0.1 times others of:
  0.36372444 = (MATCH) weight(crs_name_en:leadership in 4744), product
of:
0.16152287 = queryWeight(crs_name_en:leadership), product of:
  5.147074 = idf(docFreq=566, maxDocs=35862)
  0.03138149 = queryNorm
2.251845 = (MATCH) fieldWeight(crs_name_en:leadership in 4744),
product of:
  1.0 = tf(termFreq(crs_name_en:leadership)=1)
  5.147074 = idf(docFreq=566, maxDocs=35862)
  0.4375 = fieldNorm(field=crs_name_en, doc=4744)
  0.04061773 = (MATCH) weight(meta_en:leadership in 4744), product of:
0.076728955 = queryWeight(meta_en:leadership), product of:
  2.4450386 = idf(docFreq=8453, maxDocs=35862)
  0.03138149 = queryNorm
0.5293664 = (MATCH) fieldWeight(meta_en:leadership in 4744), product
of:
  1.7320508 = tf(termFreq(meta_en:leadership)=3)
  2.4450386 = idf(docFreq=8453, maxDocs=35862)
  0.125 = fieldNorm(field=meta_en, doc=4744)
  0.21165074 = (MATCH) weight(crs_desc_en:leadership in 4744), product
of:
0.15826634 = queryWeight(crs_desc_en:leadership), product of:
  5.043302 = 

Re: How to get SolrServer within my own servlet

2011-12-14 Thread Joey
Hi Chris,

There won't be a deadlock, I think, because there is only one place (my own
servlet) that can trigger an index. 

Yes, I am trying to embed the Solr application - I could separate my servlet into
another app and talk to Solr via HTTP, but then there would be two pieces (Solr and
my own app) of software I have to maintain - which is something I don't
like.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-SolrServer-within-my-own-servlet-tp3583304p3587157.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Migrate Lucene 2.9 To SOLR

2011-12-14 Thread Chris Hostetter

: I have an old project that uses Lucene 2.9. Is it possible to use the index
: created by Lucene in Solr? May I just copy the index to the data directory of
: Solr, or is there some mechanism to import a Lucene index?

you can use an index created directly with lucene libraries in Solr, but 
in order for Solr to understand that index and do anything meaningful with 
it you have to configure solr with a schema.xml file that makes sense 
given the custom code used to build that index (ie: what fields did you 
store, what fields did you index, what analyzers did you use, what fields 
did you index with term vectors, etc...)
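
(One way to recover that information from the index itself -- a sketch using
the Lucene 3.x API; the index path is hypothetical:)

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class ListIndexFields {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));
    // Every field name the old Lucene code indexed or stored:
    for (String field : reader.getFieldNames(IndexReader.FieldOption.ALL)) {
      System.out.println(field);
    }
    reader.close();
  }
}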


-Hoss


Re: Too many connections in CLOSE_WAIT state on master solr server

2011-12-14 Thread samarth s
Thanks Erick and Mikhail. I'll try this out.

On Wed, Dec 14, 2011 at 7:11 PM, Erick Erickson erickerick...@gmail.com wrote:
 I'm guessing (and it's just a guess) that what's happening is that
 the container is queueing up your requests while waiting
 for the other connections to close, so Mikhail's suggestion
 seems like a good idea.

 Best
 Erick

 On Wed, Dec 14, 2011 at 12:28 AM, samarth s
 samarth.s.seksa...@gmail.com wrote:
 The updates to the master are user driven, and are needed to be
 visible quickly. Hence, the high frequency of replication. It may be
 that too many replication requests are being handled at a time, but
 why should that result in half-closed connections?

 On Wed, Dec 14, 2011 at 2:47 AM, Erick Erickson erickerick...@gmail.com 
 wrote:
 Replicating 40 cores every 20 seconds is just *asking* for trouble.
 How often do your cores change on the master? How big are
 they? Is there any chance you just have too many cores replicating
 at once?

 Best
 Erick

 On Tue, Dec 13, 2011 at 3:52 PM, Mikhail Khludnev
 mkhlud...@griddynamics.com wrote:
 You can try to reuse your connections (prevent them from closing) by
 specifying -Dhttp.maxConnections=N
 (see http://download.oracle.com/javase/1.4.2/docs/guide/net/properties.html)
 in JVM startup params. At the client JVM! N should be chosen considering
 the number of connections you'd like to keep alive.

 Let me know if it works for you.

 On Tue, Dec 13, 2011 at 2:57 PM, samarth s 
 samarth.s.seksa...@gmail.com wrote:

 Hi,

 I am using solr replication and am experiencing a lot of connections
 in the state CLOSE_WAIT at the master solr server. These disappear
 after a while, but till then the master solr stops responding.

 There are about 130 open connections on the master server with the
 client as the slave m/c and all are in the state CLOSE_WAIT. Also, the
 client port specified on the master solr server netstat results is not
 visible in the netstat results on the client (slave solr) m/c.

 Following is my environment:
 - 40 cores in the master solr on m/c 1
 - 40 cores in the slave solr on m/c 2
 - The replication poll interval is 20 seconds.
 - Replication part in solrconfig.xml in the slave solr:
 <requestHandler name="/replication" class="solr.ReplicationHandler">
           <lst name="slave">

                   <!-- fully qualified url for the replication handler
                        of master -->
                   <str name="masterUrl">$mastercorename/replication</str>

                   <!-- Interval in which the slave should poll master.
                        Format is HH:mm:ss. If this is absent slave does not poll
                        automatically, but a fetchindex can be triggered from
                        the admin or the http API -->
                   <str name="pollInterval">00:00:20</str>
                   <!-- The following values are used when the slave
                        connects to the master to download the index files.
                        Default values implicitly set as 5000ms
                        and 10000ms respectively. The user DOES NOT need to specify
                        these unless the bandwidth is extremely
                        low or if there is an extremely high latency -->
                   <str name="httpConnTimeout">5000</str>
                   <str name="httpReadTimeout">10000</str>
          </lst>
   </requestHandler>

 Thanks for any pointers.

 --
 Regards,
 Samarth




 --
 Sincerely yours
 Mikhail Khludnev
 Developer
 Grid Dynamics
 tel. 1-415-738-8644
 Skype: mkhludnev
 http://www.griddynamics.com
  mkhlud...@griddynamics.com



 --
 Regards,
 Samarth



-- 
Regards,
Samarth


Re: Delta Replication in SOLR

2011-12-14 Thread Walter Underwood
On Dec 14, 2011, at 9:58 PM, mechravi25 wrote:

 We would like to know whether it is possible to replicate only certain
 documents from master to slave. More like a Delta Replication process. 

No, it is not.

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: Solr Search Across Multiple Cores not working when quering on specific field

2011-12-14 Thread pravesh
but when I searched on a specific field then it is not working:
http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=mnemonic_value:United

Why is distributed search not working when I search on a particular
field? 

Since you have a multiple-shard infra, do the cores share the same
configurations (schema.xml/solrconfig.xml, etc.)? What error/output are you
getting for the sharded query?

Regards
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-Across-Multiple-Cores-not-working-when-quering-on-specific-field-tp3585013p3587890.html
Sent from the Solr - User mailing list archive at Nabble.com.