RE: Regarding Copyfield

2013-01-15 Thread Harshvardhan Ojha
What is your "text_general" type definition in schema.xml?

-Original Message-
From: anurag.jain [mailto:anurag.k...@gmail.com] 
Sent: Tuesday, January 15, 2013 12:16 PM
To: solr-user@lucene.apache.org
Subject: Regarding Copyfield

hi

In my copyField setup I am not copying first_name, last_name, etc., but the destination field "text"
is still showing first_name (and similar values) in auto-suggestion mode.

My copyField declarations are:
   <copyField source="percentage" dest="text"/>
   <copyField source="university_name" dest="text"/>
   <copyField source="course_name" dest="text"/>
   ...

and the field definitions are:
<field name="id" type="text_general" indexed="true" stored="true"
required="true" multiValued="false"/>
   <field name="first_name" type="text_general" indexed="false"
stored="true"/>
   <field name="last_name" type="text_general" indexed="false"
stored="true"/>
   <field name="date_of_birth" type="text_general" indexed="false"
stored="true"/>
   <field name="state_name" type="text_general" indexed="false"
stored="true"/>
   <field name="mobile_no" type="text_general" indexed="false"
stored="true"/>
   ...


I also want to create my own field, similar to "text", named "autosuggest", but
autosuggestion does not work for it either.
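One common way to keep personal fields out of suggestions is to copy only the fields you want suggested into a dedicated field that is indexed but not stored, and build suggestions from that field instead of "text". A minimal sketch (field and type names taken from the thread; the "autosuggest" field itself is hypothetical, adjust to your schema):

```xml
<!-- Hypothetical sketch of a dedicated suggestion field:
     indexed="true" so it can be matched/suggested from,
     stored="false" so its raw content is never returned. -->
<field name="autosuggest" type="text_general" indexed="true"
       stored="false" multiValued="true"/>
<copyField source="course_name" dest="autosuggest"/>
<copyField source="university_name" dest="autosuggest"/>
<!-- No copyField for first_name/last_name, so they cannot surface here. -->
```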



Please reply urgently.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regarding-Copyfield-tp4033385.html
Sent from the Solr - User mailing list archive at Nabble.com.


Multicore configuration

2013-01-15 Thread Bruno Dusausoy

Hi,

I'd like to use two separate indexes (Solr 3.6.1).
I've read several wiki pages and looked at the multicore example bundled
with the distribution, but it seems I'm missing something.



I have this hierarchy :
solr-home/
|-- conf/
|   |-- solr.xml
|   |-- solrconfig.xml (if I don't put it, solr complains)
|   |-- schema.xml (idem)
|   |-- ...
|-- cores/
    |-- dossier/
    |   |-- conf/
    |   |   |-- dataconfig.xml
    |   |   |-- schema.xml
    |   |   |-- solrconfig.xml
    |   |-- data/
    |-- procedure/
        |-- conf/
        |   |-- dataconfig.xml
        |   |-- schema.xml
        |   |-- solrconfig.xml
        |-- data/

Here's the content of my solr.xml file :
http://paste.debian.net/224818/

And I launch my servlet container with 
-Dsolr.solr.home=my-directory/solr-home.


I've put nearly nothing in my solr-home/conf/schema.xml so Solr 
complains, but that's not the point.


When I go to the admin of core dossier,
http://localhost:8080/solr/dossier/admin, the container says it doesn't 
exist.
But when I go to http://localhost:8080/solr/admin it finds it, which 
makes me guess that Solr is still in single-core mode.


What am I missing ?

Regards.
--
Bruno Dusausoy
Software Engineer
YP5 Software
--
Pensez environnement : limitez l'impression de ce mail.
Please don't print this e-mail unless you really need to.


Re: Multicore configuration

2013-01-15 Thread Dariusz Borowski
Hi Bruno,

Maybe this helps. I wrote something about it:
http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

Dariusz






Solr Query | Loading documents with large content (Performance)

2013-01-15 Thread Uwe Clement
Hi there,



sometimes we have to load very big documents; one or two multi-valued fields
can contain 10,000 items each. And unfortunately we need this information.



We have to load 50 documents in order to show the result table in the
UI.



The query takes around 50 seconds. I guess 48 seconds of that is just the
transfer of the documents' content over the network.



What can I do here?

- I know we could move this long content out of the document,
but that is not really a solution either.

- Then I was thinking about compressed fields. They come back with Solr 4.1,
right?



How do compressed fields work? As I understand it, stored fields will be
stored in a compressed way. OK, but when are they uncompressed?

- Before being sent back to the client, on the server side?

- Or on the client side? I am using SolrJ.



Any other ideas? Can compressed fields help increase query performance?
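One lever independent of compression is Solr's fl parameter: request only the fields the result table actually needs, so the huge multi-valued fields are never serialized or sent over the wire. A minimal sketch of building such a request URL (host and field names are hypothetical):

```python
from urllib.parse import urlencode

def build_select_url(base, query, fields, rows=50):
    """Build a /select URL that returns only the listed fields (fl),
    so heavy multi-valued fields are never shipped to the client."""
    params = {"q": query, "fl": ",".join(fields), "rows": rows}
    return "%s/select?%s" % (base, urlencode(params))

# Hypothetical core URL and field names:
url = build_select_url("http://localhost:8983/solr", "*:*",
                       ["id", "title", "phone"])
```

The heavy fields can then be fetched lazily, one document at a time, only when the user opens a row.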



Thanks a lot for your ideas and answers!



Regards

Uwe







--

  Uwe Clement

  Software Architect

  Project Manager








eXXcellent solutions gmbh

Beim Alten Fritz 2



D-89075 Ulm



e | uwe.clem...@exxcellent.de

m | +49 [0]151-275 692 27

i | http://www.exxcellent.de







Geschäftsführer: Dr. Martina Burgetsmeier, Wilhelm Zorn, Gerhard Gruber
Sitz der Gesellschaft: Ulm, Registergericht: Ulm HRB 4309





Re: Multicore configuration

2013-01-15 Thread Upayavira
You should put your solr.xml into your 'cores' directory, and set
-Dsolr.solr.home=cores

That should get you going. 'cores' *is* your Solr Home. Otherwise, your
instanceDir entries in your current solr.xml will need correct paths to
../cores/procedure/ etc.

Upayavira
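For reference, a minimal Solr 3.x solr.xml matching the directory layout above might look like the sketch below (core names from the thread; this is not Bruno's actual file):

```xml
<!-- Lives directly in solr.home (e.g. in cores/ if -Dsolr.solr.home=cores),
     never in a conf/ subdirectory. instanceDir is relative to solr.home. -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="dossier" instanceDir="dossier"/>
    <core name="procedure" instanceDir="procedure"/>
  </cores>
</solr>
```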



Re: Multicore configuration

2013-01-15 Thread Bruno Dusausoy

Dariusz Borowski wrote:

Hi Bruno,

Maybe this helps. I wrote something about it:
http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/


Hi Dariusz,

Thanks for the link.
I've found my - terrible - mistake : solr.xml was not in the solr.home dir 
but in the solr.home/conf dir, so it didn't get picked up :-/

It works perfectly now.

Sorry for the noise.

Regards.
--
Bruno Dusausoy
Software Engineer
YP5 Software
--
Pensez environnement : limitez l'impression de ce mail.
Please don't print this e-mail unless you really need to.


Re: Performance issue with group.ngroups=true

2013-01-15 Thread Mickael Magniez
Hi,

I retried on a better machine (2 CPUs, 8 GB RAM, 1.5 GB for Java, half used
according to the admin interface) and still have the same issue.

It seems to grow with the match count: with a search matching 100k documents,
it takes 700 ms vs. 70 ms without ngroups (CPU is at 100% during the request).

For information, my index has 1M documents, for 700 MB of data.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-issue-with-group-ngroups-true-tp4031888p4033422.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Results in same or different fields

2013-01-15 Thread Harshvardhan Ojha
Hi Gastone,

I am not completely sure, but I think a phrase query will solve this problem:
q=title:"white house" will always score higher than the terms "white" and
"house" matched separately.

Regards
Harshvardhan Ojha
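A sketch of this approach with edismax, where qf spreads the terms across the fields and pf additionally boosts documents containing the whole input as a phrase in one field (field names are illustrative). Note that phrase matching relies on term positions, so it will not work on fields indexed with omitTermFreqAndPositions=true:

```python
from urllib.parse import urlencode

def phrase_boost_params(user_input, qf="title description",
                        pf="title description"):
    """edismax: qf matches individual terms across fields; pf re-boosts
    documents where the whole input occurs as one phrase in a field."""
    return urlencode({
        "q": user_input,
        "defType": "edismax",
        "qf": qf,
        "pf": pf,  # phrase-field boost: rewards "white house" as one phrase
    })

params = phrase_boost_params("white house")
```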

-Original Message-
From: Gastone Penzo [mailto:gastone.pe...@gmail.com] 
Sent: Tuesday, January 15, 2013 2:46 PM
To: solr-user@lucene.apache.org
Subject: Results in same or different fields

Hi,
I'm using Solr 4.0 with the edismax search handler.
I'm searching inside 3 fields with the same boost.
I'd like results that match within the same field to score higher than results
spread across different fields.

e.g.
qf=title,description


If "white house" is found in title, it must have a higher score than "white" in 
the title field and "house" in the description field.

How is it possible?

ps. i set

omitTermFreqAndPositions=true


for all fields

thanx


*Gastone Penzo*
*
*


DataImportHandlerException: Unable to execute query with OPTIM

2013-01-15 Thread ashimbose
I have tried to search for my specific problem but have not found a solution.
I have also read the wiki on the DIH and seem to have everything set up right,
but my query still fails. Thank you for your help.

I am running Solr 3.6.1 with Tomcat 6.0 on Windows 7 64-bit against an IBM
Optim archive file.

I have all the jar files sitting in C:\Program Files\Apache Software
Foundation\Tomcat 6.0\lib

My solrconfig.xml is

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

My db-data-config.xml

<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource" name="SAMPLE_OPTIM_DB"
      driver="com.ibm.optim.connect.jdbc.NvDriver"
      url="jdbc:attconnect://198.168.2.89:2551/NAVIGATOR;DefTdpName=SAMPLE_OPTIM_DB"
      batchSize="-1" user="" password="" readOnly="True"/>
  <document name="headwords">
    <entity name="CUSTOMERS" dataSource="SAMPLE_OPTIM_DB"
        query="SELECT * FROM SAMPLE_OPTIM_DB:CUSTOMERS"
        transformer="RegexTransformer">
      <field column="CUSTNAME" name="CUSTNAME"/>
    </entity>
  </document>
</dataConfig>

I am getting the error below:

WARNING: no uniqueKey specified in schema.
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore init
INFO: [core0] Opening new SolrCore at solr\core0\, dataDir=solr/core0\data\
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore init
INFO: JMX monitoring not detected for core: core0
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore initListeners
INFO: [core0] Added SolrEventListener for newSearcher: org.apache.solr.core.QuerySenderListener{queries=[{q=solr,start=0,rows=10}, {q=rocks,start=0,rows=10}, {q=static newSearcher warming query from solrconfig.xml}]}
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore initListeners
INFO: [core0] Added SolrEventListener for firstSearcher: org.apache.solr.core.QuerySenderListener{queries=[]}
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created standard: solr.StandardRequestHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /dataimport: org.apache.solr.handler.dataimport.DataImportHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /search: org.apache.solr.handler.component.SearchHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /update: solr.XmlUpdateRequestHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@53bc93fe main
Jan 15, 2013 4:05:44 PM org.apache.solr.update.CommitTracker init
INFO: commitTracker AutoCommit: disabled
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.QueryComponent@781fb1fb
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.FacetComponent@68de1359
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.MoreLikeThisComponent@4bc86dd8
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.HighlightComponent@53a3a6c6
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.StatsComponent@1d1a3c10
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding  debug component:org.apache.solr.handler.component.DebugComponent@255d4d5d
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting socketTimeout to: 0
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting urlScheme to: http://
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting connTimeout to: 0
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting maxConnectionsPerHost to: 20
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting corePoolSize to: 0
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting maximumPoolSize to: 2147483647
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting maxThreadIdleTime to: 5
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting sizeOfQueue to: -1
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting fairnessPolicy to: false
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.dataimport.DataImportHandler processConfiguration
INFO: Processing 

Re: Index data from multiple tables into Solr

2013-01-15 Thread Naresh
Get the user's input, form the Solr query and send a request to the server
(you can also pass a parameter called wt (xml, json, etc.) to direct Solr to
return output in that format). Parse the results from Solr and display them
to the user on your website.

Depending on what server-side programming language you are using, there
might be libraries available that allow you to integrate your web application
with Solr (for example: sunspot_solr in Ruby).
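A sketch of the flow described above (the URL is hypothetical, and the JSON sample merely mimics the shape of Solr's wt=json response):

```python
import json
from urllib.parse import urlencode

def build_search_url(base, user_input, rows=10):
    """Turn the user's input into a Solr /select URL asking for JSON output."""
    params = {"q": user_input, "wt": "json", "rows": rows}
    return "%s/select?%s" % (base, urlencode(params))

url = build_search_url("http://localhost:8983/solr", "name:java")

# Shape of a (made-up) wt=json response body, and how to pull the docs out:
sample_body = '{"response": {"numFound": 2, "docs": [{"id": "1"}, {"id": "2"}]}}'
docs = json.loads(sample_body)["response"]["docs"]
for doc in docs:
    pass  # render each doc in the website's result template
```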

On Tue, Jan 15, 2013 at 5:24 AM, hassancrowdc hassancrowdc...@gmail.comwrote:

 thanx, I got it.

 How Can i integrate solr with my website? so that i can use it for search?


 On Mon, Jan 14, 2013 at 4:04 PM, Lance Norskog-2 [via Lucene] 
 ml-node+s472066n4033291...@n3.nabble.com wrote:

  Try all of the links under the collection name in the lower left-hand
  columns. There several administration monitoring tools you may find
  useful.
 
  On 01/14/2013 11:45 AM, hassancrowdc wrote:
 
    OK, the stats are changing, so the data is indexed. But how can I query
   this data, or how can I search it? Will the command be
    http://localhost:8983/solr/select?q=(any of my field columns from a
   table)?
    Because whatever I put in my URL shows me an XML file, but
    numFound is always 0?
  
  
    On Sat, Jan 12, 2013 at 1:24 PM, Alexandre Rafalovitch [via Lucene] 
    [hidden email] wrote:
  
   Have you tried the Admin interface yet? The one on :8983 port if you
  are
   running default setup. That has a bunch of different stats you can
 look
  at
   apart from a nice way of doing a query. I am assuming you are on Solr
  4,
   of
   course.
  
   Regards,
   Alex.
  
    On Fri, Jan 11, 2013 at 5:13 PM, hassancrowdc [hidden email] wrote:
  
  
   So, I followed all the steps and solr is working successfully, Can
 you
   please tell me how i can see if my data is indexed or not? do i have
  to
   enter specific url into my browser or anything. I want to make sure
  that
   the data is indexed.
  
  
  
   Personal blog: http://blog.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all
 at
   once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
  book)
  
  
 
 
  
  
  
  
   --
   View this message in context:
 
 http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033268.html
 
   Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033296.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards
Naresh


Re: Performance issue with group.ngroups=true

2013-01-15 Thread Mikhail Khludnev
Mickael,

I just wonder whether you have considered BlockJoin? It performs much better
than query-time approaches
(http://blog.griddynamics.com/2012/08/block-join-query-performs.html), but
faceting hasn't been implemented for it yet.





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: access matched token ids in the FacetComponent?

2013-01-15 Thread Mikhail Khludnev
Dmitry,

I have some relevant experience and am ready to help, but I cannot grasp the
core problem. Could you please expand the description and/or provide a
sample?


On Tue, Jan 15, 2013 at 11:01 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Hello!

 Is there a simple way of accessing the matched token ids in the
 FacetComponent? The use case is to text search on one field and facet on
 another. And in the facet counts we want to see the text hit counts.
 Can it be done via some other component / approach?

 Any input is greatly appreciated.

 Dmitry




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: SOlr 3.5 and sharding

2013-01-15 Thread Erick Erickson
You're confusing shards and slaves here. Shards are splitting a logical
index amongst N machines, where each machine contains a portion of the
index. In that setup, you have to configure the slaves to know about the
other shards, and the incoming query has to be distributed amongst all the
shards to find all the docs.

In your case, since you're really replicating (rather than sharding), you
only have to query _one_ slave, the query doesn't need to be distributed.

So pull all the sharding stuff out of your config files, put a load
balancer in front of your slaves and only send the request to one of them
would be the place I'd start.

Also, don't be at all surprised if the number of hits from the _master_
(which you shouldn't be searching, BTW) is different than the slaves,
there's the polling interval to consider.

Best
Erick


On Mon, Jan 14, 2013 at 9:58 AM, Jean-Sebastien Vachon 
jean-sebastien.vac...@wantedanalytics.com wrote:

 Hi,

 I'm setting up a small Solr setup consisting of 1 master node and 4
 shards. For now, all four shards contain the exact same data. When I
 perform a query on each individual shard for the word `java` I am
 receiving the same number of docs (as expected). However, when I am going
 through the master node using the shards parameter, the number of results
 is slightly off by a few documents. There is nothing special in my setup, so
 I'm looking for hints on why I am getting this problem

 Thanks



Re: SolrCloud :: Adding replica :: Sync-up issue

2013-01-15 Thread Erick Erickson
Trying again, original reply rejected as spam.

This won't be all that helpful, but 4.1 has a lot of improvements as
far as SolrCloud is concerned, and it's in the process of being put
together now.

So I suspect the best use of time would be to work with 4.1 (or a
nightly build between now and then, or a build off the 4.1 branch) and
report if the issue is still there.

As I said, not much help, but...

Best,
Erick


Error loading plugin

2013-01-15 Thread Mickael Magniez
Hi,

I'm trying to write my own search handler, but I have a problem loading it
into Solr.

Error message is : 
Caused by: org.apache.solr.common.SolrException: Error loading class
'com.company.solr.GroupRequestHandler'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:438)
... 14 more
Caused by: java.lang.ClassNotFoundException:
com.company.solr.GroupRequestHandler

The .jar file is loaded at startup : 
INFO: Adding
'file:/home/solr/solr/apache-solr-4.1-2013-01-10_05-50-28/company/solr/lib/GroupRequestHandler.jar'
to classloader

My jar seems correct: it contains the single file
com/company/solr/GroupRequestHandler.class


Any idea,

Mickael




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-loading-plugin-tp4033454.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: retrieving latest document **only**

2013-01-15 Thread J Mohamed Zahoor

The sum of all the counts in the groups… does not match the total number of
docs found.

./zahoor


On 12-Jan-2013, at 1:27 PM, Upayavira u...@odoko.co.uk wrote:

 Not sure exactly what you mean, can you give an example?
 
 Upayavira
 
 On Sat, Jan 12, 2013, at 06:32 AM, J Mohamed Zahoor wrote:
 Cool… it worked… But the count of all the groups and the count inside
 stats component does not match…
 Is that a bug?
 
 ./zahoor
 
 
 On 11-Jan-2013, at 6:48 PM, Upayavira u...@odoko.co.uk wrote:
 
 could you use field collapsing? Boost by date and only show one value
 per group, and you'll have the most recent document only.
 
 Upayavira
 
 On Fri, Jan 11, 2013, at 01:10 PM, jmozah wrote:
 one crude way is first query and pick the latest date from the result
 then issue a query with q=timestamp[latestDate TO latestDate]
 
 But i dont want to execute two queries...
 
 ./zahoor
 
 On 11-Jan-2013, at 6:37 PM, jmozah jmo...@gmail.com wrote:
 
 
 
 
 What do you want?
 'the most recent ones' or '**only** the latest' ?
 
 Perhaps a range query q=timestamp:[refdate TO NOW] will match your 
 needs.
 
 Uwe
 
 
 
 I need **only** the latest documents...
 in the above query , refdate can vary based on the query.
 
 ./zahoor
 
 
 
 
 



Re: retrieving latest document **only**

2013-01-15 Thread Upayavira
Is your group field multivalued? Could docs appear in more than one
group?

Upayavira

On Tue, Jan 15, 2013, at 01:22 PM, J Mohamed Zahoor wrote:
 
 The sum of all the count in the groups… does not match the total no of
 docs found.
 
 ./zahoor
 
 


RE: SOlr 3.5 and sharding

2013-01-15 Thread Jean-Sebastien Vachon
Hi Erick,

Thanks for your comments, but I am migrating an existing index (single
instance) to a sharded setup, and currently I have no access to the code
involved in the indexing process. That's why I made a simple copy of the index
onto each shard.

In the end, the data will be distributed among all shards.

I was just curious to know why I did not get the expected number of documents
with my four shards.

Can you elaborate on this polling interval thing? I am pretty sure I have
never heard about it...

Regards



Tutorial for Solr query language, dismax and edismax?

2013-01-15 Thread eShard
Does anyone have a great tutorial for learning the Solr query language,
dismax, and edismax?
I've searched endlessly but haven't been able to locate one that is
comprehensive enough and has a lot of examples (that actually work!).
I also tried to use wildcards, logical operators, and a phrase search, and
they either didn't work or didn't behave the way I thought they would.

For example, I tried to search a multivalued field solr.title and a content
field that contains a phone number (and a lot of other data).
So, from the Solr admin query page,
in the q field I tried lots of variations of this: solr.title:*Costa,
Julie* AND content:tel=
And I either got 0 results or ALL the results.
solr.title would only work if I put in solr.title:*Costa*, but not anything
longer than that, even though there are plenty of Costa, J's (John, Julie,
Julia, Jerry, etc.).
I should be able to do a phrase search out of the box, shouldn't I?
I also read on one site that only edismax can use logical operators, but I
couldn't get that to work either.
Can anyone point me in the right direction?
I'm currently using Solr 4.0 Final with ManifoldCF v 1.2 dev

Thank you,





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tutorial-for-Solr-query-language-dismax-and-edismax-tp4033465.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOlr 3.5 and sharding

2013-01-15 Thread Upayavira
He was referring to master/slave setup, where a slave will poll the
master periodically asking for index updates. That frequency is
configured in solrconfig.xml on the slave.
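For reference, the polling frequency is set on the slave roughly like this (a sketch; masterUrl, core name, and interval are placeholders):

```xml
<!-- solrconfig.xml on a slave (Solr 3.x master/slave replication).
     pollInterval (HH:mm:ss) controls how often the slave checks the
     master for a newer index; until the next poll completes, counts on
     master and slave can legitimately differ. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```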

So, you are saying that you have, say, 1M documents in your master index.
You then copy your index to four other boxes. At that point you have 1M
documents on each of those four. Eventually, you'll delete some docs,
so you'd have 250k on each. You're wondering why, before the deletes, you're
not seeing 1M docs on each of your instances.

Or are you wondering why you're not seeing 1M docs when you do a
distributed query across all four of these boxes?

Is that correct? 

Upayavira

On Tue, Jan 15, 2013, at 02:11 PM, Jean-Sebastien Vachon wrote:
 Hi Erick,
 
 Thanks for your comments but I am migrating an existing index (single
 instance) to a sharded setup and currently I have no access to the code
 involved in the indexation process. That`s why I made a simple copy of
 the index on each shards.
 
 In the end, the data will be distributed among all shards.
 
 I was just curious to know why I had not the expected number of documents
 with my four shards.
 
 Can you elaborate on  this polling interval thing? I am pretty sure I
 never eared about this... 
 
 Regards
 
 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com] 
 Sent: January-15-13 8:00 AM
 To: solr-user@lucene.apache.org
 Subject: Re: SOlr 3.5 and sharding
 
 You're confusing shards and slaves here. Shards are splitting a logical
 index amongst N machines, where each machine contains a portion of the
 index. In that setup, you have to configure the slaves to know about the
 other shards, and the incoming query has to be distributed amongst all
 the shards to find all the docs.
 
 In your case, since you're really replicating (rather than sharding), you
 only have to query _one_ slave, the query doesn't need to be distributed.
 
 So pull all the sharding stuff out of your config files, put a load
 balancer in front of your slaves and only send the request to one of them
 would be the place I'd start.
 
 Also, don't be at all surprised if the number of hits from the _master_
 (which you shouldn't be searching, BTW) is different than the slaves,
 there's the polling interval to consider.
 
 Best
 Erick
 
 
 On Mon, Jan 14, 2013 at 9:58 AM, Jean-Sebastien Vachon 
 jean-sebastien.vac...@wantedanalytics.com wrote:
 
  Hi,
 
  I`m setting up a small Solr setup consisting of 1 master node and 4 
  shards. For now, all four shards contain the exact same data. When I 
  perform a query on each individual shards for the word `java` I am 
  receiving the same number of docs (as expected). However, when I am 
  going through the master node using the shards parameters, the number 
  of results is slightly off by a few documents. There is nothing 
  special in my setup so I`m looking for hints on why I am getting this 
  problem
 
  Thanks
 
 
 -
 No virus found in this message.
 Checked by AVG - www.avg.fr
 Version: 2013.0.2890 / Virus database: 2638/6032 - Date:
 14/01/2013


Re: how to optimize same query with different start values

2013-01-15 Thread Upayavira
You are setting yourself up for disaster.

If you ask Solr for documents 1000 to 1010, it needs to sort documents 1
to 1010, and discard the first 1000, which causes horrible performance.

I'm curious to hear if others have strategies to extract content
sequentially from an index. I suspect a new SearchComponent could really
help here.

I suspect it would work better if you don't sort at all, in which case
you'll return the documents in index order. The issue is that a commit,
or a background merge could change index order which would mess up your
export.

Sorry no clearer answers.

Upayavira

On Tue, Jan 15, 2013, at 02:07 PM, elisabeth benoit wrote:
 Hello,
 
 I have a Solr instance (solr 3.6.1) with around 3 000 000 documents. I
 want
 to read (in a java test application) all my documents, but not in one
 shot
 (because it takes too much memory).
 
 So I send the same request, over and over, with
 
 q=*:*
 rows=1000
 sort=id desc  = to be sure I always get same ordering*
 and start parameter increased of 1000 at each iteration
 
 
 checking the solr logs, I realized that the query responding time
 increases
 as the start parameter gets bigger
 
 for instance
 
 with start  500 000, it takes about 500ms
 with start  1 100 000  and  1 200 000, it takes between 5000 and 5200
 ms
 with start  1 250 000 and  1 320 000, it takes between 6100 and 6400 ms
 
 
 Does someone have an idea how to optimize this query?
 
 Thanks,
 Elisabeth
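For completeness, the usual workaround can be sketched in code (an illustration, not something from the thread itself; it assumes a unique, sortable id field): keep start=0 on every request and page by filtering on the last id already seen, so Solr never has to build and discard a huge sorted prefix.

```python
# Sketch: walk an index with a range filter on a unique "id" field instead
# of a growing "start" offset. Each page asks for rows sorted by id
# ascending, filtered to ids strictly greater than the last one fetched.

def page_params(last_id=None, rows=1000):
    """Query params for the next page; last_id is the id of the last
    document returned by the previous page (None for the first page)."""
    if last_id is None:
        fq = "id:[* TO *]"              # first page: no lower bound
    else:
        fq = "id:{%s TO *}" % last_id   # strictly after the last id seen
    return {
        "q": "*:*",
        "fq": fq,
        "sort": "id asc",  # a stable ordering is required for this to work
        "start": 0,        # always 0 -- the filter does the paging
        "rows": rows,
    }

params = page_params(last_id="doc_1000")
assert params["start"] == 0
assert params["fq"] == "id:{doc_1000 TO *}"
```

Each page is then roughly constant cost, regardless of how deep into the index the walk has gone.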


Re: Results in same or different fields

2013-01-15 Thread Uwe Reh

Hi,

maybe it helps to have a closer look at the other params of edismax.

http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29


'mm=2' will be too strong, but the usage of pf, pf2, and pf3 is likely your 
solution.


uwe
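For the qf=title,description case in the quoted question, the phrase-field params look roughly like this (a sketch; the boost values are placeholders, not tested):

```text
q=white house
defType=edismax
qf=title description
pf=title^2 description^2     (boost docs where the whole phrase occurs in one field)
pf2=title description        (same idea for each adjacent word pair)
ps=0                         (phrase slop)
```

With pf, a document holding "white house" together in title should outscore one with the words split across title and description.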


On 15.01.2013 10:15, Gastone Penzo wrote:

Hi,
i'm using solr 4.0 with edismax search handler.
i'm searching inside 3 fields with same boost.
i'd like to have high score for results in the same fields,
instead of results in different fields

es.
qf=title,description


if white house is found in title, it must have higher score
than white in title field and house in description field

how is  it possible?

ps. i set

omitTermFreqAndPositions=true


for all fields

thanx


*Gastone Penzo*
*
*





RE: SOlr 3.5 and sharding

2013-01-15 Thread Jean-Sebastien Vachon
Ok I see what Erick meant now. Thanks.

The original index I`m working on contains about 120k documents. Since I have 
no access to the code that pushes documents into the index, I made four copies 
of the same index.

The master node contains no data at all; it simply uses the data available in 
its four shards. Knowing that I have 1000 documents matching the keyword java 
on each shard, I was expecting to receive 4000 documents out of my sharded 
setup. There are only a few documents that are not accounted for (the result 
count is about 3996, which is pretty close but not accurate).

Right now, the index is static, so there is no need for any replication and the 
polling interval has no effect.
Later this week, I will configure replication and have the indexing modified 
to distribute the documents to each shard using a simple ID modulo 4 rule.

Were my expectations wrong about the number  of documents? 

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: January-15-13 9:21 AM
To: solr-user@lucene.apache.org
Subject: Re: SOlr 3.5 and sharding

He was referring to master/slave setup, where a slave will poll the master 
periodically asking for index updates. That frequency is configured in 
solrconfig.xml on the slave.

So, you are saying that you have, say 1m documents in your master index.
You then copy your index to four other boxes. At that point you have 1m 
documents on each of those four. Eventually, you'll delete some docs, so you'd 
have 250k on each. You're wondering why, before the deletes, you're not seeing 1m 
docs on each of your instances.

Or are you wondering why you're not seeing 1m docs when you do a distributed 
query across all four of these boxes?

Is that correct? 

Upayavira

On Tue, Jan 15, 2013, at 02:11 PM, Jean-Sebastien Vachon wrote:
 Hi Erick,
 
 Thanks for your comments but I am migrating an existing index (single
 instance) to a sharded setup and currently I have no access to the 
 code involved in the indexation process. That`s why I made a simple 
 copy of the index on each shards.
 
 In the end, the data will be distributed among all shards.
 
 I was just curious to know why I had not the expected number of 
 documents with my four shards.
 
  Can you elaborate on this polling interval thing? I am pretty sure 
  I never heard about this...
 
 Regards
 
 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: January-15-13 8:00 AM
 To: solr-user@lucene.apache.org
 Subject: Re: SOlr 3.5 and sharding
 
 You're confusing shards and slaves here. Shards are splitting a 
 logical index amongst N machines, where each machine contains a 
 portion of the index. In that setup, you have to configure the slaves 
 to know about the other shards, and the incoming query has to be 
 distributed amongst all the shards to find all the docs.
 
 In your case, since you're really replicating (rather than sharding), 
 you only have to query _one_ slave, the query doesn't need to be distributed.
 
 So pull all the sharding stuff out of your config files, put a load 
 balancer in front of your slaves and only send the request to one of 
 them would be the place I'd start.
 
 Also, don't be at all surprised if the number of hits from the 
 _master_ (which you shouldn't be searching, BTW) is different than the 
 slaves, there's the polling interval to consider.
 
 Best
 Erick
 
 
 On Mon, Jan 14, 2013 at 9:58 AM, Jean-Sebastien Vachon  
 jean-sebastien.vac...@wantedanalytics.com wrote:
 
  Hi,
 
   I`m setting up a small Solr setup consisting of 1 master node and 4 
   shards. For now, all four shards contain the exact same data. When 
  I perform a query on each individual shards for the word `java` I am 
  receiving the same number of docs (as expected). However, when I am 
  going through the master node using the shards parameters, the 
  number of results is slightly off by a few documents. There is 
  nothing special in my setup so I`m looking for hints on why I am 
  getting this problem
 
  Thanks
 
 


Re: how to optimize same query with different start values

2013-01-15 Thread Mikhail Khludnev
It's a well-known search engine limitation. This post will help you get
into the core problem:
http://www.searchworkings.org/blog/-/blogs/lucene-solr-and-deep-paging . It
seems that a solution has been contributed to Lucene, but not yet to Solr.


On Tue, Jan 15, 2013 at 6:36 PM, Upayavira u...@odoko.co.uk wrote:

 You are setting yourself up for disaster.

 If you ask Solr for documents 1000 to 1010, it needs to sort documents 1
 to 1010, and discard the first 1000, which causes horrible performance.

 I'm curious to hear if others have strategies to extract content
 sequentially from an index. I suspect a new SearchComponent could really
 help here.

 I suspect it would work better if you don't sort at all, in which case
 you'll return the documents in index order. The issue is that a commit,
 or a background merge could change index order which would mess up your
 export.

 Sorry no clearer answers.

 Upayavira

 On Tue, Jan 15, 2013, at 02:07 PM, elisabeth benoit wrote:
  Hello,
 
  I have a Solr instance (solr 3.6.1) with around 3 000 000 documents. I
  want
  to read (in a java test application) all my documents, but not in one
  shot
  (because it takes too much memory).
 
  So I send the same request, over and over, with
 
  q=*:*
  rows=1000
  sort=id desc  = to be sure I always get same ordering*
  and start parameter increased of 1000 at each iteration
 
 
  checking the solr logs, I realized that the query responding time
  increases
  as the start parameter gets bigger
 
  for instance
 
  with start  500 000, it takes about 500ms
  with start  1 100 000  and  1 200 000, it takes between 5000 and 5200
  ms
  with start  1 250 000 and  1 320 000, it takes between 6100 and 6400 ms
 
 
  Does someone have an idea how to optimize this query?
 
  Thanks,
  Elisabeth




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


RE: DataImportHandlerException: Unable to execute query with OPTIM

2013-01-15 Thread Dyer, James
I think your JDBC driver is complaining because it doesn't like what is being 
set for the fetch size on the Statement.  Fetch size is controlled by the 
batchSize parameter on <dataSource />.

Using batchSize=-1, I believe, is a workaround for MySQL but I suspect your 
driver requires it to be 0 (or at least -1).  If you omit batchSize 
entirely, DIH sets it to 500 as a default.  Also, setting it to -1 causes DIH 
to change this to Integer.MIN_VALUE.
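Applied to the <dataSource> in the config quoted below, that suggestion looks like this (a sketch — batchSize="0" is a value to try against the Optim driver, not something verified here):

```xml
<!-- Sketch: let the driver pick its own fetch size (batchSize="0"),
     or omit the attribute entirely to get DIH's default of 500. -->
<dataSource type="JdbcDataSource" name="SAMPLE_OPTIM_DB"
            driver="com.ibm.optim.connect.jdbc.NvDriver"
            url="jdbc:attconnect://198.168.2.89:2551/NAVIGATOR;DefTdpName=SAMPLE_OPTIM_DB"
            batchSize="0" user="" password="" readOnly="True" />
```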

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: ashimbose [mailto:ashimb...@gmail.com]
Sent: Tuesday, January 15, 2013 4:48 AM
To: solr-user@lucene.apache.org
Subject: DataImportHandlerException: Unable to execute query with OPTIM

I have tried to search for my specific problem but have not found solution. I
have also read the wiki on the DIH and seem to have everything set up right
but my Query still fails. Thank you for your help

I am running Solr 3.6.1 with Tomcat 6.0 Windows7 64bit and IBM Optim Archive
File

I have all  jar file sitting in C:\Program Files\Apache Software
Foundation\Tomcat 6.0\lib

My solrconfig.xml is

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

My db-data-config.xml

<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
  <dataSource type="JdbcDataSource" name="SAMPLE_OPTIM_DB"
      driver="com.ibm.optim.connect.jdbc.NvDriver"
      url="jdbc:attconnect://198.168.2.89:2551/NAVIGATOR;DefTdpName=SAMPLE_OPTIM_DB"
      batchSize="-1" user="" password="" readOnly="True" />
  <document name="headwords">
    <entity name="CUSTOMERS" dataSource="SAMPLE_OPTIM_DB"
        query="SELECT * FROM SAMPLE_OPTIM_DB:CUSTOMERS" transformer="RegexTransformer">
      <field column="CUSTNAME" name="CUSTNAME"/>
    </entity>
  </document>
</dataConfig>

I am Having below error

WARNING: no uniqueKey specified in schema.
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore init
INFO: [core0] Opening new SolrCore at solr\core0\, dataDir=solr/core0\data\
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore init
INFO: JMX monitoring not detected for core: core0
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore initListeners
INFO: [core0] Added SolrEventListener for newSearcher: org.apache.solr.core.QuerySenderListener{queries=[{q=solr,start=0,rows=10}, {q=rocks,start=0,rows=10}, {q=static newSearcher warming query from solrconfig.xml}]}
Jan 15, 2013 4:05:44 PM org.apache.solr.core.SolrCore initListeners
INFO: [core0] Added SolrEventListener for firstSearcher: org.apache.solr.core.QuerySenderListener{queries=[]}
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created standard: solr.StandardRequestHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /dataimport: org.apache.solr.handler.dataimport.DataImportHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /search: org.apache.solr.handler.component.SearchHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.core.RequestHandlers initHandlersFromConfig
INFO: created /update: solr.XmlUpdateRequestHandler
Jan 15, 2013 4:05:44 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@53bc93fe main
Jan 15, 2013 4:05:44 PM org.apache.solr.update.CommitTracker init
INFO: commitTracker AutoCommit: disabled
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.QueryComponent@781fb1fb
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.FacetComponent@68de1359
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.MoreLikeThisComponent@4bc86dd8
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.HighlightComponent@53a3a6c6
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding component:org.apache.solr.handler.component.StatsComponent@1d1a3c10
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.SearchHandler inform
INFO: Adding  debug component:org.apache.solr.handler.component.DebugComponent@255d4d5d
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting socketTimeout to: 0
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting urlScheme to: http://
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting connTimeout to: 0
Jan 15, 2013 4:05:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter
INFO: Setting maxConnectionsPerHost to: 20
Jan 15, 2013 4:05:44 PM

RE: Disabling document cache usage

2013-01-15 Thread Markus Jelsma
No, SolrIndexSearcher has no mechanism to do that. The only way is to disable 
the cache altogether or patch it up :)


 
-Original message-
 From:Otis Gospodnetic otis.gospodne...@gmail.com
 Sent: Tue 15-Jan-2013 16:57
 To: solr-user@lucene.apache.org
 Subject: Disabling document cache usage
 
 Hi,
 
 https://issues.apache.org/jira/browse/SOLR-2429 added the ability to
 disable filter and query caches on a request by request basis.
 
 Is there anything one can use to disable usage of (lookups and insertion
 into) document cache?
 
 Thanks,
 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/
 


Re: Disabling document cache usage

2013-01-15 Thread Otis Gospodnetic
Hi,

Thanks Markus.
How are caches disabled these days... in Solr 4.0 that is?  I remember
trying to comment them out in the past, but seeing them still enabled and
used with some custom size and other settings.

Thanks,
Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Tue, Jan 15, 2013 at 11:00 AM, Markus Jelsma
markus.jel...@openindex.io wrote:

 No, SolrIndexSearcher has no mechanism to do that. The only way is to
 disable the cache altogether or patch it up :)



 -Original message-
  From:Otis Gospodnetic otis.gospodne...@gmail.com
  Sent: Tue 15-Jan-2013 16:57
  To: solr-user@lucene.apache.org
  Subject: Disabling document cache usage
 
  Hi,
 
  https://issues.apache.org/jira/browse/SOLR-2429 added the ability to
  disable filter and query caches on a request by request basis.
 
  Is there anything one can use to disable usage of (lookups and insertion
  into) document cache?
 
  Thanks,
  Otis
  --
  Solr  ElasticSearch Support
  http://sematext.com/
 



V 4.0.0.0 insert

2013-01-15 Thread Николай Измаилов

I don't understand how to add data into a document. I created a core named 
test_core in version 4.0.0; I can read data via solr/test_core/select, but 
insert does not work. How do I add data?

Re: Search across a specified number of boundaries

2013-01-15 Thread Mike Ree
Mikhail,

Yeah, I considered that originally, but after analyzing the data I
noticed that was not possible. Some of the content we analyze contains
large tables that, after OCR, get turned into long running sentences
containing 500k+ words per sentence. Overall there are probably around 10k
of those anomalies that stop the ranges from working: we run out of
positions, hitting the max value an integer can hold, and we run the risk
of a future document breaking it.

I found a Jira on what I'm looking for. Going to look into it and see if I
can get it to work for my situation.

https://issues.apache.org/jira/browse/LUCENE-777

Thanks for the help.

Mike

On Mon, Jan 14, 2013 at 11:48 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Mike,

 When Lucene's Analyzer indexes the text, it adds positions into the index
 which are later used by SpanQueries. Have you considered the idea of a position
 increment gap? e.g. the first sentence is indexed with word positions
 0,1,2,3,..., the second sentence with 100,101,102,103,..., the third with
 200,201,202,... Then applying some span constraint allows you to search
 across/inside of the sentences.


 On Sun, Jan 6, 2013 at 6:50 PM, Erick Erickson erickerick...@gmail.com wrote:

 Mike:

 I'm _really_ stretching here, but you might be able to do something
 interesting
  with payloads. Say each word had a payload with the sentence number and
 you _somehow_ made use of that information in a custom scorer. But like I
 said, I really have no good idea how to accomplish that...

 BTW, in future this kind of question is better asked on the user's list
 (either
 Lucene or Solr), this list if intended for discussing development work

 Best
 Erick


 On Fri, Jan 4, 2013 at 1:02 PM, Mike Ree mike.ad...@olytech.net wrote:

 d terms that are in nearby sentences.

 IE:
 TermA NEAR3 TermB would find all TermA's that are within 3 sentences
 of TermB.

 Have found ways to find TermA within same sentence





 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com
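Mikhail's position-gap idea maps onto schema.xml roughly like this (a sketch; the names are made up, and it assumes each sentence is indexed as one value of a multiValued field):

```xml
<!-- Sketch: positionIncrementGap inserts ~100 positions between the values
     of a multiValued field, i.e. between sentences if one sentence is
     indexed per value. -->
<fieldType name="text_sentences" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="body" type="text_sentences" indexed="true" stored="true" multiValued="true"/>
```

A proximity query such as body:"terma termb"~250 would then reach across roughly two sentence gaps, which is the NEAR-N-style constraint discussed above — though, as noted, very long OCR "sentences" can exhaust the position space.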



Re: how to optimize same query with different start values

2013-01-15 Thread Andre Bois-Crettez

It looks like a use case for SolrJ with queryAndStreamResponse?

http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/client/solrj/SolrServer.html#queryAndStreamResponse%28org.apache.solr.common.params.SolrParams,%20org.apache.solr.client.solrj.StreamingResponseCallback%29

André

On 01/15/2013 04:49 PM, Mikhail Khludnev wrote:

It's a well-known search engine limitation. This post will help you get
into the core problem:
http://www.searchworkings.org/blog/-/blogs/lucene-solr-and-deep-paging . It
seems that a solution has been contributed to Lucene, but not yet to Solr.


On Tue, Jan 15, 2013 at 6:36 PM, Upayavira u...@odoko.co.uk wrote:


You are setting yourself up for disaster.

If you ask Solr for documents 1000 to 1010, it needs to sort documents 1
to 1010, and discard the first 1000, which causes horrible performance.

I'm curious to hear if others have strategies to extract content
sequentially from an index. I suspect a new SearchComponent could really
help here.

I suspect it would work better if you don't sort at all, in which case
you'll return the documents in index order. The issue is that a commit,
or a background merge could change index order which would mess up your
export.

Sorry no clearer answers.

Upayavira

On Tue, Jan 15, 2013, at 02:07 PM, elisabeth benoit wrote:

Hello,

I have a Solr instance (solr 3.6.1) with around 3 000 000 documents. I
want
to read (in a java test application) all my documents, but not in one
shot
(because it takes too much memory).

So I send the same request, over and over, with

q=*:*
rows=1000
sort=id desc  =  to be sure I always get same ordering*
and start parameter increased of 1000 at each iteration


checking the solr logs, I realized that the query responding time
increases
as the start parameter gets bigger

for instance

with start  500 000, it takes about 500ms
with start  1 100 000  and  1 200 000, it takes between 5000 and 5200
ms
with start  1 250 000 and  1 320 000, it takes between 6100 and 6400 ms


Does someone have an idea how to optimize this query?

Thanks,
Elisabeth




--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively for 
their addressees. If you are not the intended recipient of this message, 
please delete it and notify the sender.


Re: Index data from multiple tables into Solr

2013-01-15 Thread hassancrowdc
Hi,
once I have indexed data from multiple tables from a MySQL database into
Solr, is there any way that it updates the data (automatically) if any change is
made to the data in MySQL?
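For what it's worth, the standard answer is not spelled out in this thread: DIH supports "delta imports" that re-index only changed rows. A hypothetical sketch (the entity and column names here are made up; it assumes each table carries a last_modified column and an id primary key):

```xml
<!-- Hypothetical sketch: deltaQuery finds ids changed since the last run,
     deltaImportQuery fetches each changed row. -->
<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE id = '${dih.delta.id}'"/>
```

Hitting /dataimport?command=delta-import (e.g. from cron) would then pick up the changes; it is a pull mechanism, not fully automatic on every MySQL write.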


On Tue, Jan 15, 2013 at 6:13 AM, Naresh [via Lucene] 
ml-node+s472066n403343...@n3.nabble.com wrote:

 Get user's input, form the solr query and send a request to the server
 (you
 can also pass a parameter called wt (xml,json etc) to direct solr to
 return
 output in that format). Parse the results from solr and display them to
 user in your website.

 Depending on what kind of server-side programming language you are using,
 there might be some libraries available that will allow to integrate your
 web-application with solr (for example: sunspot_solr in ruby)

 On Tue, Jan 15, 2013 at 5:24 AM, hassancrowdc [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4033438i=0wrote:


  thanx, I got it.
 
  How Can i integrate solr with my website? so that i can use it for
 search?
 
 
  On Mon, Jan 14, 2013 at 4:04 PM, Lance Norskog-2 [via Lucene] 
  [hidden email] http://user/SendEmail.jtp?type=nodenode=4033438i=1
 wrote:
 
   Try all of the links under the collection name in the lower left-hand
   columns. There several administration monitoring tools you may find
   useful.
  
   On 01/14/2013 11:45 AM, hassancrowdc wrote:
  
ok stats are changing, so the data is indexed. But how can I query
this data, or how can I search it? The command would be
http://localhost:8983/solr/select?q=(any of my field columns from the
table)?
Because whatever I am putting in my url, it shows me an xml file but
numFound is always 0.
   
   
On Sat, Jan 12, 2013 at 1:24 PM, Alexandre Rafalovitch [via Lucene]
 
[hidden email] http://user/SendEmail.jtp?type=nodenode=4033291i=0

   wrote:
   
Have you tried the Admin interface yet? The one on :8983 port if
 you
   are
running default setup. That has a bunch of different stats you can
  look
   at
apart from a nice way of doing a query. I am assuming you are on
 Solr
   4,
of
course.
   
Regards,
Alex.
   
On Fri, Jan 11, 2013 at 5:13 PM, hassancrowdc [hidden email]
   http://user/SendEmail.jtp?type=nodenode=4032778i=0wrote:
   
   
So, I followed all the steps and solr is working successfully, Can
  you
please tell me how i can see if my data is indexed or not? do i
 have
   to
enter specific url into my browser or anything. I want to make
 sure
   that
the data is indexed.
   
   
   
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening
 all
  at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
   book)
   
   
  
   
   
   
   
--
View this message in context:
  
 
 http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033268.html
  
Sent from the Solr - User mailing list archive at Nabble.com.
  
  
  
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033296.html

  Sent from the Solr - User mailing list archive at Nabble.com.
 



 --
 Regards
 Naresh


 

Re: Tutorial for Solr query language, dismax and edismax?

2013-01-15 Thread Walter Underwood
You should not need to use wildcards.

Most configurations of Solr will index space-separated words as separate 
tokens. They can be matched separately.

Did you use a string field type (probably the wrong choice)? How are your 
fields tokenized? 

Solr/Lucene query syntax:

http://wiki.apache.org/solr/SolrQuerySyntax
http://lucene.apache.org/core/3_6_0/queryparsersyntax.html

The analysis page in the admin UI is your friend here. You can put in text for 
the index and the query, choose a field type, and see how it is tokenized and 
matched.

wunder
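Applied to the example in the quoted question, a phrase search on tokenized fields is written with double quotes rather than wildcards (a sketch; it assumes solr.title and content use a text_general-style analyzer):

```text
solr.title:"Costa, Julie" AND content:"tel"
```

The punctuation in tel= is typically stripped by the tokenizer, so searching for the bare token tel (or a phrase containing it) is the realistic test.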

On Jan 15, 2013, at 6:14 AM, eShard wrote:

 Does anyone have a great tutorial for learning the solr query language,
 dismax and edismax?
 I've searched endlessly for one but I haven't been able to locate one that
 is comprehensive enough and has a lot of examples (that actually work!).
 I also tried to use wildcards, logical operators, and a phrase search and it
 either didn't work or behave the way I thought it would.
 
 for example, I tried to search a multivalued field solr.title and a content
 field that contains their phone number (and a lot of other data)
 so, from the solr admin query page;
 in the q field i tried lots of variations of this- solr.title:*Costa,
 Julie* AND content:tel=
 And I either got 0 results or ALL the results.
 solr.title would only work if I put in solr.title:*Costa* but not anything
 longer than that. Even though there are plenty of Costa, J's (John, Julie,
 Julia, Jerry etc)
 I should be able to do a phrase search out of the box, shouldn't I?
 I also read on one site that only edismax can use logical operators but I
 couldn't get that to work either.
 Can anyone point me in the right direction?
 I'm currently using Solr 4.0 Final with ManifoldCF v 1.2 dev
 
 Thank you,
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Tutorial-for-Solr-query-language-dismax-and-edismax-tp4033465.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: Suggestion that preserve original phrase case

2013-01-15 Thread Selvam
Thanks Erick, can you tell me how to do the appending
(lowercaseversion:LowerCaseVersion) before indexing. I tried pattern
factory filters, but I could not get it right.


On Sun, Jan 13, 2013 at 8:49 PM, Erick Erickson erickerick...@gmail.com wrote:

 One way I've seen this done is to index pairs like
 lowercaseversion:LowerCaseVersion. You can't push this whole thing through
 your field as defined since it'll all be lowercased, you have to produce
 the left hand side of the above yourself and just use KeywordTokenizer
 without LowercaseFilter.

 Then, your application displays the right-hand-side of the returned token.

 Simple solution, not very elegant, but sometimes the easiest...

 Best
 Erick


 On Fri, Jan 11, 2013 at 1:30 AM, Selvam s.selvams...@gmail.com wrote:

  Hi,
 
  I have been trying to figure out a way for case-insensitive suggestion
  which should return the original phrase as the result. I am using Solr 3.5.
 
  For eg:
 
  If I index 'Hello world' and search for 'hello' it needs to return 'Hello
  world', not 'hello world'. My configurations are as follows:
 
  New field type:
  <fieldType class="solr.TextField" name="text_auto">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
 
  Field values:
    <field name="label" type="text" indexed="true" stored="true"
        termVectors="true" omitNorms="true"/>
    <field name="label_autocomplete" type="text_auto" indexed="true"
        stored="true" multiValued="false"/>
    <copyField source="label" dest="label_autocomplete" />
 
  Spellcheck Component:
    <searchComponent name="suggest" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">text_auto</str>
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="buildOnOptimize">true</str>
        <str name="buildOnCommit">true</str>
        <str name="field">label_autocomplete</str>
      </lst>
    </searchComponent>
 
 
  Kindly share your suggestions to implement this behavior.
 
  --
  Regards,
  Selvam
  KnackForge http://knackforge.com
  Acquia Service Partner
  No. 1, 12th Line, K.K. Road, Venkatapuram,
  Ambattur, Chennai,
  Tamil Nadu, India.
  PIN - 600 053.
 




-- 
Regards,
Selvam
KnackForge http://knackforge.com
Acquia Service Partner
No. 1, 12th Line, K.K. Road, Venkatapuram,
Ambattur, Chennai,
Tamil Nadu, India.
PIN - 600 053.
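One way to produce the pairing Erick describes, done client-side before the documents are sent to Solr (a hypothetical sketch; the helper names are made up, and it assumes the phrases themselves contain no ':'):

```python
# Sketch: build "lowercaseversion:OriginalVersion" values before indexing.
# The left-hand side is what gets matched case-insensitively; the right-hand
# side keeps the original casing for display in the application.

def suggest_value(phrase):
    """Value to index in the suggestion field."""
    return "%s:%s" % (phrase.lower(), phrase)

def display_value(token):
    """What the application shows: the original-case right-hand side."""
    return token.split(":", 1)[1]

token = suggest_value("Hello World")
assert token == "hello world:Hello World"
assert display_value(token) == "Hello World"
```

The field then only needs KeywordTokenizer without LowercaseFilter, and the application splits the returned token to show the original casing.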


RE: Disabling document cache usage

2013-01-15 Thread Markus Jelsma
Hi,

Commenting them out works fine. We don't use documentCaches either, as they eat 
too much memory and return only so little.

Cheers
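In solrconfig.xml that looks like this (a sketch — the size values are the stock example ones):

```xml
<!-- To disable the document cache, comment the element out entirely: -->
<!--
<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
-->
```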

 
 
-Original message-
 From:Otis Gospodnetic otis.gospodne...@gmail.com
 Sent: Tue 15-Jan-2013 17:29
 To: solr-user@lucene.apache.org
 Subject: Re: Disabling document cache usage
 
 Hi,
 
 Thanks Markus.
 How are caches disabled these days... in Solr 4.0 that is?  I remember
 trying to comment them out in the past, but seeing them still enabled and
 used with some custom size and other settings.
 
 Thanks,
 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/
 
 
 
 
 
 On Tue, Jan 15, 2013 at 11:00 AM, Markus Jelsma
 markus.jel...@openindex.io wrote:
 
  No, SolrIndexSearcher has no mechanism to do that. The only way is to
  disable the cache altogether or patch it up :)
 
 
 
  -Original message-
   From:Otis Gospodnetic otis.gospodne...@gmail.com
   Sent: Tue 15-Jan-2013 16:57
   To: solr-user@lucene.apache.org
   Subject: Disabling document cache usage
  
   Hi,
  
   https://issues.apache.org/jira/browse/SOLR-2429 added the ability to
   disable filter and query caches on a request by request basis.
  
   Is there anything one can use to disable usage of (lookups and insertion
   into) document cache?
  
   Thanks,
   Otis
   --
   Solr  ElasticSearch Support
   http://sematext.com/
  
 
 


Re: V 4.0.0.0 insert

2013-01-15 Thread Alexandre Rafalovitch
Have you gone through the tutorial on the wiki first? It should cover basic
use cases. If you have, how do you send the data in?

Regards,
 Alex
On 15 Jan 2013 11:22, Николай Измаилов bob...@mail.ru wrote:


 I don't understand how to add data to a document. I created a core named
 test_core in version 4.0.0; I can read data via solr/test_core/select, but
 insert does not work. How do I add data?


Re: Solr Query | Loading documents with large content (Performance)

2013-01-15 Thread Otis Gospodnetic
Hi,

Have a look under
http://wiki.apache.org/solr/UpdateCSV#Methods_of_uploading_CSV_records about
uploading a *local* file.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Jan 15, 2013 at 3:59 AM, Uwe Clement uwe.clem...@exxcellent.de wrote:

 Hi there,



 Sometimes we have to load very big documents; 1-2 multi-valued fields in them
 can contain 10,000 items each. And unfortunately we need this information.



 We have to load 50 documents in order to show the result table in the
 UI.



 The query takes around 50 seconds. I guess 48 seconds of it is just to
 transfer the content of the documents over the net.



 What can I do here?

 - I know I can move this long information out of the documents, but that
 is not really a solution either.

 - Then I was thinking about compressed fields. They come back with Solr 4.1,
 right?



 How is it with compressed fields? As I understand it, the stored fields will
 be stored in a compressed way. OK, but when will they be uncompressed?

 - Before sending back to the client, on the server side?

 - Or on the client side? I am using SolrJ.



 Any other ideas? Can compressed fields help increase query performance?



 Thanks a lot for your ideas and answers!



 Regards

 Uwe







 --

   Uwe Clement

   Software Architect

   Project Manager








 eXXcellent solutions gmbh

 Beim Alten Fritz 2



 D-89075 Ulm



 e |  mailto:uwe.clem...@exxcellent.de uwe.clem...@exxcellent.de

 m | +49 [0]151-275 692 27

 i |  http://www.exxcellent.de http://www.exxcellent.de



 



 Geschäftsführer: Dr. Martina Burgetsmeier, Wilhelm Zorn, Gerhard Gruber
 Sitz der Gesellschaft: Ulm, Registergericht: Ulm HRB 4309






SolrCloud Performance for High Query Volume

2013-01-15 Thread Niran Fajemisin
Hi all,

I'm currently in the process of doing some performance testing in preparation 
for upgrading from Solr 3.6.1 to Solr 4.0. (We're badly in need of NRT 
functionality.)

Our existing deployment is not a typical deployment for Solr, as we use it to 
search and facet on financial data such as accounts, positions and transactions 
records. To make matters worse, each request could potentially return upwards 
of 50,000 or more records from the index. As I said, it's not an ideal use case 
for Solr but this is the system that is in place and it really can't be changed 
at this point. With this defined use case, our current 3.6.1 deployment is able 
to scale to about 1500 queries per minute, with an average response time in the 
low 100-200ms. Note that this time includes the query time and the transport 
time (time to stream all the documents to the calling services). At the 50,000 
document mark, we're getting about 1.6-2 sec. response time. The client is 
willing to live with this as these type of requests are not very frequent.

Our hardware configuration on the 3.6.1 environment is as follows:
* 1 master server for indexing, with 2 CPUs (each 6 cores, 2.67GHz), 4GB 
of RAM and a 150GB HDD
* 2 slave servers for query only, each with 2 CPUs (each 6 cores, 
2.67GHz), 12GB of RAM and the same HDD space (mechanical drives)
Each of the servers is a virtual server in a VMware environment. 

Now, with roughly the same schema and solrconfig configuration, the 
performance on Solr 4.0 is quite bad. Running just 500 queries per minute, our 
query performance degrades to almost 2-minute response times in some cases. The 
average is about a 40-50 sec. response time. Note that the index at the moment is 
only a fraction of the size of the existing environment (about 1/8th the size). 

The hardware setup for the SolrCloud deployment is as follows:
* 4 Solr server instances each with 4 CPUs (each 6 cores, 2.67GHz), 8GB 
of RAM and 150GB HDD

* 3 ZooKeeper server instances. We are using each Solr server instance 
to run 1 ZK instance, with the 4th server not running a ZK server.
We haven't observed any issues with memory utilization. Additionally, the 
virtual servers are co-located. We're wondering whether upgrading to solid 
state drives would improve performance significantly.

Are there any other pointers or configuration changes that we can make to help 
bring down our query times? Any tips will be greatly appreciated.

Thanks all!

Re: Index data from multiple tables into Solr

2013-01-15 Thread Shawn Heisey

On 1/15/2013 9:20 AM, hassancrowdc wrote:

Hi,
once I have indexed data from multiple tables of a MySQL database into
Solr, is there any way for it to update the data (automatically) if any change
is made to the data in MySQL?


You need to write a program to do this.

Although this list can provide guidance, such programs are highly 
customized to the particulars of your setup.  There is not really any 
general purpose solution here.


There are two typical approaches - have a program that initiates 
delta-imports with the dataimporter, or write a program that both talks 
to your database and uses a Solr client API to send updates to Solr.  I 
used to use the former approach, now I use the latter.  I still use the 
dataimporter for full reindexes, though.


Thanks,
Shawn
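A rough sketch of the second approach (a client reads changed rows and pushes them to Solr's JSON update handler). The row contents, field names, and update URL are hypothetical; a real version would query MySQL for rows modified since the last run:

```python
import json
from urllib import request

def rows_to_update_payload(rows):
    """Turn changed DB rows (dicts keyed by field name) into a Solr JSON update body."""
    return json.dumps(rows)

def push_to_solr(rows, url="http://localhost:8983/solr/update/json?commit=true"):
    """POST the rows to Solr; this part needs a running Solr instance."""
    body = rows_to_update_payload(rows).encode("utf-8")
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    return request.urlopen(req)

# Hypothetical rows pulled by a "changed since last run" query:
changed = [{"id": "42", "manufacturer": "Acme", "last_modified": "2013-01-15"}]
print(rows_to_update_payload(changed))
```

The same payload shape works for full reindexes too; only the SQL side changes.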



Re: Stored hierarchical data in Solr

2013-01-15 Thread Upayavira
You can store structured data in Solr. You can't *query* it, in such a
way as respects its structure.

E.g. if I had <xml>this<b>and</b>that</xml>, I could parse that into
terms:

[this] [and] [that], and do searches upon them. 

But you couldn't search for documents that match an xpath such as
/xml/b='and'.

Upayavira

On Tue, Jan 15, 2013, at 05:02 PM, Nicholas Ding wrote:
 Hello,
 
 I'm thinking of storing a hierarchical data structure in Solr. I know I have
 to flatten the structure into a form like A_B_C, but is it possible to
 extend Solr to support hierarchical data?
 What if I store JSON text in a field, then load it and process it
 while Solr outputs the response? Is that doable by extending Solr?
 
 Thanks
 Nicholas
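The A_B_C flattening mentioned above can be sketched roughly like this (key and field names are made up for illustration):

```python
def flatten(node, prefix="", sep="_"):
    """Recursively flatten nested dicts into {A_B_C-style path: value} pairs."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat

doc = {"product": {"name": "Widget", "price": {"amount": 9.99, "currency": "USD"}}}
print(flatten(doc))
# {'product_name': 'Widget', 'product_price_amount': 9.99, 'product_price_currency': 'USD'}
```

Each flattened key would then become a Solr field (dynamic fields make this convenient), at the cost of losing the ability to query the structure as a tree.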


RE: Index data from multiple tables into Solr

2013-01-15 Thread Swati Swoboda
He is talking about this list, the list we are using to communicate. You are 
sending your messages to a mailing list -- thousands are on it.

Example of programs that will run the delta-import/full-import commands: Cron
You are basically calling a URL with specific parameters to pull data from your 
DB
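For example, a hypothetical cron entry that triggers a DIH delta-import every 10 minutes (the URL, port, and handler path would depend on your setup):

```
*/10 * * * * curl -s "http://localhost:8983/solr/dataimport?command=delta-import&clean=false&commit=true"
```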

Example of program that will use the Solr API: these are all application 
specific (based on what fields are in your schema, etc.). 

Swati

-Original Message-
From: hassancrowdc [mailto:hassancrowdc...@gmail.com] 
Sent: Tuesday, January 15, 2013 2:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Index data from multiple tables into Solr

Which list are you referring to?

And can you please give an example of such a program (it doesn't matter if it
is for your setup)?


On Tue, Jan 15, 2013 at 12:06 PM, Shawn Heisey-4 [via Lucene] 
ml-node+s472066n4033518...@n3.nabble.com wrote:

 On 1/15/2013 9:20 AM, hassancrowdc wrote:
  Hi,
  once i have indexed data from multiple tables from mysql database 
  into solr, is there any way that it update data(automatically) if 
  any change
 is
  made to the data in mysql?

 You need to write a program to do this.

 Although this list can provide guidance, such programs are highly 
 customized to the particulars for your setup.  There is not really any 
 general purpose solution here.

 There are two typical approaches - have a program that initiates 
 delta-imports with the dataimporter, or write a program that both 
 talks to your database and uses a Solr client API to send updates to 
 Solr.  I used to use the former approach, now I use the latter.  I 
 still use the dataimporter for full reindexes, though.

 Thanks,
 Shawn








--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-data-from-multiple-tables-into-Solr-tp4032266p4033545.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Index data from multiple tables into Solr

2013-01-15 Thread Swati Swoboda
https://wiki.apache.org/solr/Solrj client. You'd have to configure it / use it 
based on your application needs.

-Original Message-
From: hassancrowdc [mailto:hassancrowdc...@gmail.com] 
Sent: Tuesday, January 15, 2013 2:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Index data from multiple tables into Solr

OK.
So if I have manufacturer and id fields in the schema file, what would a
program that uses the Solr API look like?






Re: Top Terms Using Luke

2013-01-15 Thread Shawn Heisey

On 1/15/2013 11:54 AM, Lighton Phiri wrote:

I would like to get a sense of the top terms for fields in my index
and just enable the LukeRequestHandler [1] in my solrconfig.xml file.
However, Luke seems to include stopwords as well.

I've tried searching previous threads but nothing I've come across [2,
3, 4] has helped.

How can I tell Luke not to include stopwords?  Alternatively, what's
the easiest way of getting top terms without stopwords?


If you don't want stopwords in the top terms report, you have to remove 
them from your index.  IMHO, this is not a good idea because you will 
lose search precision, but using StopFilterFactory in a fieldType 
analysis chain is very common.


If you were to leave stopwords in your index but tell the tools to not 
display them, then the top terms list would be lying to you, and it 
would not be very useful as a troubleshooting tool.  Troubleshooting is 
one of Luke's primary purposes.


To get an idea for which non-stopwords are dominant in your index, just 
ask for more top terms, instead of just the top ten or top twenty.  If 
you are using a program to parse the information, have your program 
remove the terms that you don't want to include, then trim the list to 
the proper size.
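That post-processing step can be sketched like this (the stopword set and term counts below are made up; the pairs would come from parsing Luke's top-terms output):

```python
# Drop stopwords client-side from a top-terms report, then trim to size.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in"}

def top_terms_without_stopwords(term_counts, n):
    """term_counts: list of (term, frequency) pairs, e.g. from Luke's top-terms."""
    kept = [(t, c) for t, c in term_counts if t not in STOPWORDS]
    kept.sort(key=lambda tc: tc[1], reverse=True)
    return kept[:n]

report = [("the", 9000), ("solr", 1200), ("of", 800), ("index", 700), ("query", 650)]
print(top_terms_without_stopwords(report, 3))
# [('solr', 1200), ('index', 700), ('query', 650)]
```

Ask Luke for more terms than you need (e.g. numTerms well above your target) so the list is still full after filtering.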


Thanks,
Shawn



Re: Index data from multiple tables into Solr

2013-01-15 Thread Shawn Heisey

On 1/15/2013 12:00 PM, hassancrowdc wrote:

Which list are you referring to?


The solr-user mailing list that we are both using here.


And can you please give an example of such a program (it doesn't matter if it
is for your setup)?


I can't do that.  It is confidential and proprietary code.  Although I 
wrote it, I do not have any rights to share it because it was written on 
the job.


Thanks,
Shawn



Solr exception when parsing XML

2013-01-15 Thread Zhang, Lisheng
Hi,
 
I got SolrException when submitting XML for indexing (using solr 3.6.1)
 

Jan 15, 2013 10:22:42 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, code 31))
 at [row,col {unknown-source}]: [2,1169]
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:81)
 
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character 
((CTRL-CHAR, code 31))
...
 at [row,col {unknown-source}]: [2,1169]
at 
com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
at 
com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:660)
at 
com.ctc.wstx.sr.BasicStreamReader.readCDataPrimary(BasicStreamReader.java:4240)
at 
com.ctc.wstx.sr.BasicStreamReader.nextFromTreeCommentOrCData(BasicStreamReader.java:3280)
at 
com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2824)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)

 
I checked the details; the data causing the trouble is 
 
word1chr(31)word2
 
where both word1 and word2 are normal English words and chr(31) is just the 
return value of the PHP function chr(31). Our XML is well constructed and 
the encoding/charset are well defined. 
 
The problem is due to chr(31), if I replace it with another UTF-8 character, 
indexing is OK. 
 
I checked the source code of com.ctc.wstx.sr.BasicStreamReader.java, and it 
seems that by design no CTRL character is allowed inside CDATA text, but I am 
puzzled how we could avoid CTRL characters in text in general (sure, it is not 
a common occurrence, but it can still happen)?
 
Thanks very much for helps, Lisheng


Re: Index data from multiple tables into Solr

2013-01-15 Thread hassancrowdc
Okay, thank you.

After indexing data from the database into Solr, I want to search such that if
I enter any word (that is included in the indexed documents) it returns all
the documents that include that word. But it does not. When I request
http://localhost:8983/solr/select?q=anyword it gives me an error.

Is there anything wrong with my URL, or is this the wrong place to search?


On Tue, Jan 15, 2013 at 2:48 PM, sswoboda [via Lucene] 
ml-node+s472066n4033563...@n3.nabble.com wrote:

 https://wiki.apache.org/solr/Solrj client. You'd have to configure it /
 use it based on your application needs.

 -Original Message-
 From: hassancrowdc [mailto:[hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4033563i=0]

 Sent: Tuesday, January 15, 2013 2:38 PM
 To: [hidden email] http://user/SendEmail.jtp?type=nodenode=4033563i=1
 Subject: Re: Index data from multiple tables into Solr

 ok.
 so if i have manufacturer and id fields in schema file, what will be wat
 will be program that will use that will use solr API?










RE: Index data from multiple tables into Solr

2013-01-15 Thread Swati Swoboda
What error are you getting? Which field are you searching (default field)? Did 
you try specifying a default field? What is your schema like? Which analyzers 
did you use?

Which version of solr are you using? I highly recommend going through the 
tutorial to get a basic understanding of inserting, updating, and searching:

http://lucene.apache.org/solr/tutorial.html

Hours have been spent in setting up these tutorials and they are very 
informative.

-Original Message-
From: hassancrowdc [mailto:hassancrowdc...@gmail.com] 
Sent: Tuesday, January 15, 2013 3:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Index data from multiple tables into Solr

okay, thank you.

After indexing data from database to solr. I want to search such that if i 
write any word (that is included in the documents been indexed) it should 
return all the documents that include that word. But it does not. When i
write http://localhost:8983/solr/select?q=anyword   i gives me error.

is there anything wrong with my http? or is this the wrong place to search?


On Tue, Jan 15, 2013 at 2:48 PM, sswoboda [via Lucene] 
ml-node+s472066n4033563...@n3.nabble.com wrote:

 https://wiki.apache.org/solr/Solrj client. You'd have to configure it 
 / use it based on your application needs.

 -Original Message-
 From: hassancrowdc [mailto:[hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4033563i=0]

 Sent: Tuesday, January 15, 2013 2:38 PM
 To: [hidden email] 
 http://user/SendEmail.jtp?type=nodenode=4033563i=1
 Subject: Re: Index data from multiple tables into Solr

 ok.
 so if i have manufacturer and id fields in schema file, what will be 
 wat will be program that will use that will use solr API?











Re: Index data from multiple tables into Solr

2013-01-15 Thread hassancrowdc
I don't want to search by one field; I want to search as a whole. I am
following that tutorial and got indexing and updating working, but now for
search I would like to search through everything I have indexed, not a
specific field. I can do this with a default field, but I would like to
search across everything I have indexed. Any hint how I can do that?


On Tue, Jan 15, 2013 at 3:49 PM, sswoboda [via Lucene] 
ml-node+s472066n4033617...@n3.nabble.com wrote:

 What error are you getting? Which field are you searching (default field)?
 Did you try specifying a default field? What is your schema like? Which
 analyzers did you use?

 Which version of solr are you using? I highly recommend going through the
 tutorial to get a basic understanding of inserting, updating, and
 searching:

 http://lucene.apache.org/solr/tutorial.html

 Hours have been spent in setting up these tutorials and they are very
 informative.

 -Original Message-
 From: hassancrowdc [mailto:[hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4033617i=0]

 Sent: Tuesday, January 15, 2013 3:38 PM
 To: [hidden email] http://user/SendEmail.jtp?type=nodenode=4033617i=1
 Subject: Re: Index data from multiple tables into Solr

 okay, thank you.

 After indexing data from database to solr. I want to search such that if i
 write any word (that is included in the documents been indexed) it should
 return all the documents that include that word. But it does not. When i
 write http://localhost:8983/solr/select?q=anyword   i gives me error.

 is there anything wrong with my http? or is this the wrong place to
 search?


 On Tue, Jan 15, 2013 at 2:48 PM, sswoboda [via Lucene] 
 [hidden email] http://user/SendEmail.jtp?type=nodenode=4033617i=2
 wrote:

  https://wiki.apache.org/solr/Solrj client. You'd have to configure it
  / use it based on your application needs.
 
  -Original Message-
  From: hassancrowdc [mailto:[hidden
  email]http://user/SendEmail.jtp?type=nodenode=4033563i=0]
 
  Sent: Tuesday, January 15, 2013 2:38 PM
  To: [hidden email]
  http://user/SendEmail.jtp?type=nodenode=4033563i=1
  Subject: Re: Index data from multiple tables into Solr
 
  ok.
  so if i have manufacturer and id fields in schema file, what will be
  wat will be program that will use that will use solr API?
 
 
 
 
 










Re: Solr exception when parsing XML

2013-01-15 Thread Alexandre Rafalovitch
Interesting point. Looks like CDATA is more limiting than I thought:
http://en.wikipedia.org/wiki/CDATA#Issues_with_encoding . Basically, the
recommendation is to avoid CDATA and automatically encode characters such
as yours, as well as less-than/greater-than and ampersand.

Regards,
   Alex.
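One way to do that cleanup step, sketched in Python: strip any character outside the XML 1.0 Char production before building the update message (BMP ranges only here; escaping of <, >, and & is assumed to be handled by the XML writer):

```python
import re

# XML 1.0 allows #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD (plus
# supplementary planes, which this BMP-only sketch would also strip).
ILLEGAL_XML_CHARS = re.compile("[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd]")

def sanitize_for_xml(text):
    """Remove characters that are illegal in XML 1.0, e.g. CTRL-CHAR code 31."""
    return ILLEGAL_XML_CHARS.sub("", text)

raw = "word1" + chr(31) + "word2"
print(sanitize_for_xml(raw))  # word1word2
```

The same filtering could be done in PHP with preg_replace before the document is sent to Solr.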


RE: Index data from multiple tables into Solr

2013-01-15 Thread Swati Swoboda
http://wiki.apache.org/solr/ExtendedDisMax

Specify your query fields in the qf parameter. Take a look at the example at 
the bottom of the page.
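For example (the field names in qf are hypothetical; they must exist in your schema):

```
http://localhost:8983/solr/select?defType=edismax&q=anyword&qf=name+manufacturer+description
```

Alternatively, a catch-all text field populated via copyField directives gives the same effect with a plain q parameter.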



-Original Message-
From: hassancrowdc [mailto:hassancrowdc...@gmail.com] 
Sent: Tuesday, January 15, 2013 3:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Index data from multiple tables into Solr

I dont want to search by one field, i want to search as a whole. I am following 
that tutorial i got indexing, updating but now for search i would like to 
search through everything i have indexed not a specific field. I can do by 
using defaultfield but i would like to search through everything i have 
indexed. any hint how i can do that?


On Tue, Jan 15, 2013 at 3:49 PM, sswoboda [via Lucene] 
ml-node+s472066n4033617...@n3.nabble.com wrote:

 What error are you getting? Which field are you searching (default field)?
 Did you try specifying a default field? What is your schema like? 
 Which analyzers did you use?

 Which version of solr are you using? I highly recommend going through 
 the tutorial to get a basic understanding of inserting, updating, and
 searching:

 http://lucene.apache.org/solr/tutorial.html

 Hours have been spent in setting up these tutorials and they are very 
 informative.

 -Original Message-
 From: hassancrowdc [mailto:[hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4033617i=0]

 Sent: Tuesday, January 15, 2013 3:38 PM
 To: [hidden email] 
 http://user/SendEmail.jtp?type=nodenode=4033617i=1
 Subject: Re: Index data from multiple tables into Solr

 okay, thank you.

 After indexing data from database to solr. I want to search such that 
 if i write any word (that is included in the documents been indexed) 
 it should return all the documents that include that word. But it does not. 
 When i
 write http://localhost:8983/solr/select?q=anyword   i gives me error.

 is there anything wrong with my http? or is this the wrong place to 
 search?


 On Tue, Jan 15, 2013 at 2:48 PM, sswoboda [via Lucene]  [hidden 
 email] http://user/SendEmail.jtp?type=nodenode=4033617i=2
 wrote:

  https://wiki.apache.org/solr/Solrj client. You'd have to configure 
  it / use it based on your application needs.
 
  -Original Message-
  From: hassancrowdc [mailto:[hidden
  email]http://user/SendEmail.jtp?type=nodenode=4033563i=0]
 
  Sent: Tuesday, January 15, 2013 2:38 PM
  To: [hidden email]
  http://user/SendEmail.jtp?type=nodenode=4033563i=1
  Subject: Re: Index data from multiple tables into Solr
 
  ok.
  so if i have manufacturer and id fields in schema file, what will be 
  wat will be program that will use that will use solr API?
 
 
 
 
 










Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-15 Thread Mark Bennett
First off, just reporting this:

I wound up with approximately 58% fewer documents after submitting via
ConcurrentUpdateSolrServer.  I went back and changed the code to use
HttpSolrServer and got 100%.

This was a long-running test, approx 12 hours, with gigabytes of data, so it's
not conveniently shared / reproducible, but I at least wanted to email around,
in part to get it on the record, and second to see if anybody else has
seen this.  I didn't see anything in JIRA.

I realize that Concurrent update is asynchronous and I'm giving up the
ability to monitor things, but since it works using the old server, there's
nothing glaringly wrong at least.

Here's a few more details:
* Approx 2 M docs, submitted 1,000 at a time.
* Solr 4.0.0 on Windows Server 2008
* Solr server JVM configured with 4 Gigs of RAM
* Submitting client JVM (SolrJ) configured with 10 Gigs of RAM
* Didn't see any OOM (Out Of Memory) errors on the asynchronous /
ConcurrentUpdateSolrServer run.  However, I didn't capture the entire log.
Usually with OOM it's just before the run crashes, and the end of the log
on the screen looked fine.
* I also didn't think there were OOM issues on the Solr server side, for the
same reason
* When submitting the same data synchronously (via HttpSolrServer) it
didn't have any problems

Questions:

The async client certainly finished faster, and since the underlying Solr
server presumably didn't do the real work any faster, presumably a backlog
built up somewhere.  Agreed?

I'm guessing this backlog had something to do with the failure.  Or are
there other areas to think about?

Which process would get backlogged, the SolrJ client or the Solr server?
I'd guess the server?

And if async submits are accumulated in the Solr server, is there some
mechanism to queue them onto disk, or does it try to hold them all in RAM?

And *if* the backlog caused an OOM condition, wouldn't that JVM have mostly
crashed (if not completely)?

Any guesses on the most likely failure point, and where to look?
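For context on where a backlog could build up: ConcurrentUpdateSolrServer queues documents in a bounded in-memory buffer (the queue size is a constructor argument) and drains it with background threads; it does not spill to disk. The toy sketch below models that producer/consumer shape in plain Java -- an illustration only, not SolrJ code, and all names are hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of an async updater: a bounded in-memory queue plus a
// background drainer thread. When the queue is full, put() blocks the
// producer -- nothing is spilled to disk.
public class AsyncQueueDemo {
    static final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
    static final AtomicInteger processed = new AtomicInteger();

    public static int run(int docs) throws InterruptedException {
        processed.set(0);
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();
                    if (doc.equals("EOF")) return;
                    processed.incrementAndGet();  // stand-in for an HTTP POST to Solr
                }
            } catch (InterruptedException ignored) { }
        });
        drainer.start();
        for (int i = 0; i < docs; i++) {
            queue.put("doc-" + i);                // blocks once the backlog hits capacity
        }
        queue.put("EOF");
        drainer.join();
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("processed " + run(100) + " docs");
    }
}
```

In this toy the producer blocks when the queue is full, so nothing is lost; whether the real client behaves the same under error conditions is exactly the question being raised.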

Thanks,
Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


Re: Index data from multiple tables into Solr

2013-01-15 Thread Shawn Heisey

On 1/15/2013 1:37 PM, hassancrowdc wrote:

After indexing data from the database into Solr, I want to search such that if I
write any word (that is included in the indexed documents) it should
return all the documents that include that word. But it does not. When I
request http://localhost:8983/solr/select?q=anyword it gives me an error.


You haven't told it which core (or collection if using SolrCloud) you 
want to search.


http://localhost:8983/solr/corename/select?q=anyword



Re: Synonyms and trailing wildcard

2013-01-15 Thread Jack Krupansky
It's certainly true that wildcard suppresses the synonym filter since it is 
not multi-term aware.


Other than implementing your own version of the synonym filter that was 
multi-term aware and interpreted wildcards, you may have to do your own 
preprocessor.


Or, you could do index-time synonyms, so that bill, billy, will,
willy, and william were all indexed at the same location. Then the bil*
wildcard would match william, since bill is also indexed at the same
location.


-- Jack Krupansky

-Original Message- 
From: Roberto Isaac Gonzalez

Sent: Tuesday, January 15, 2013 3:10 PM
To: solr-user@lucene.apache.org
Subject: Synonyms and trailing wildcard

Hi

I'm working on adding nicknames capability to our system. It's basically a
synonym mapping stored in a nicknames.txt file that uses the SynonymFilter
framework.

In one of our search boxes (used for lookups), we automatically append a
trailing wildcard.

There's one use case we're dealing with which is expanding synonyms even if
there's a trailing wildcard.

i.e. Q: Bill*
Expected Results: Bill, Billie, William

Q: Bil*
Expected Results: Bill, so no synonym expansion.

Basically, for synonym expansion, we want to treat the token as if it
didn't contain the trailing wildcard and we also *don't* want to expand the
wildcard before doing the synonym matches.

We tried using the multiterm analysis chain, but by definition that expects
one token *in* and one token *out*
(org.apache.solr.schema.TextField.analyzeMultiTerm()), so it throws an
exception.

I'm looking for options about implementing this scenario and some of the
options I've explored are:

1. Use the multiterm analysis chain and allow Synonym expansion, so one
token in and multiple tokens out.
2. Iterate ourselves and see if the multiterm analysis chain returns more
than one token, if it does, then remove the SynonymFilter from the analysis
chain, something similar to ExtendedDismaxQParser.shouldRemoveStopFilter().
3. ExtendedDismaxQParser.preProcessUserQuery() to OR the non-wildcarded
term.
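A rough sketch of the option-3 preprocessing idea: strip the trailing wildcard, look the bare term up in the synonym map, and only expand on an exact hit. This is plain Java with a hypothetical in-memory map -- not Solr/Lucene API, just the intended behavior:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Expand synonyms only when the term minus its trailing wildcard is an
// exact key in the nickname map; otherwise leave the wildcard query alone.
public class NicknameExpander {
    static final Map<String, List<String>> SYNONYMS = new LinkedHashMap<>();
    static {
        SYNONYMS.put("bill", Arrays.asList("bill", "billie", "william"));
    }

    public static String rewrite(String q) {
        if (!q.endsWith("*")) return q;
        String bare = q.substring(0, q.length() - 1).toLowerCase();
        List<String> syns = SYNONYMS.get(bare);
        if (syns == null) return q;  // "bil*": no exact match, keep the wildcard as-is
        // OR the original wildcard term with the exact synonym expansions
        return q + " OR (" + String.join(" OR ", syns) + ")";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Bill*"));  // expands: Bill* OR (bill OR billie OR william)
        System.out.println(rewrite("Bil*"));   // unchanged: Bil*
    }
}
```

This matches the use case above: "Bill*" expands because "bill" is an exact synonym key, while "Bil*" stays a plain wildcard because the wildcard is never expanded before the synonym lookup.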

What do you guys think?


Best Regards,
Roberto Gonzalez 



Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-15 Thread Shawn Heisey

On 1/15/2013 2:10 PM, Mark Bennett wrote:

First off, just reporting this:

I wound up with approx 58% fewer documents having submitted via
ConcurrentUpdateSolrServer.  I went back and changed the code to use
HttpSolrServer and had 100% of them.

This was a long-running test, approx 12 hours, with gigabytes of data, so not
conveniently shared / reproducible, but I at least wanted to email around,
in part to get it on the record, and second to see if anybody else has
seen this.  I didn't see anything in JIRA.

I realize that Concurrent update is asynchronous and I'm giving up the
ability to monitor things, but since it works using the old server, there's
nothing glaringly wrong at least.


You're not only giving up the ability to monitor things, you're also 
giving up the ability to detect errors.  All exceptions that get thrown 
by the internals of ConcurrentUpdateSolrServer are swallowed; your code 
will never know they happened.  The client log (slf4j with whatever 
binding & config you chose) may have such errors logged, but they are 
completely undetectable by the code.  Make sure you're actually logging 
someplace with your SolrJ app at a minimum level of INFO, then check 
that log.
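The same hazard exists with any fire-and-forget executor in plain Java: an exception thrown inside a submitted task never reaches the caller unless the caller holds onto the Future and calls get(). A minimal standalone illustration (not SolrJ code):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// An exception thrown in a submitted task is captured by the Future,
// not rethrown in the caller -- if you never call get(), you never see it.
public class SwallowedErrorDemo {
    public static String probe() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> f = pool.submit((Runnable) () -> {
            throw new RuntimeException("update failed");  // silently buffered in the Future
        });
        pool.shutdown();
        try {
            f.get();                                      // only here does the error surface
            return "no error";
        } catch (ExecutionException e) {
            return e.getCause().getMessage();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("surfaced: " + probe());
    }
}
```

A client that submits and walks away, as an async bulk indexer does, sees none of this unless the library logs it -- hence the advice above to check the slf4j output.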


It might be a case of errors being silently swallowed, or it might be a bug.

Thanks,
Shawn



from 1.4 to 3.6

2013-01-15 Thread kaveh minooie

HI
 I hope this doesn't turn out to be a very stupid question. I have 
upgraded from Solr 1.4 to 3.6, and now the maxScore field in the response 
I am getting from Solr is missing. Am I doing 
something wrong? How can I get it back?


thanks,
--
Kaveh Minooie

www.plutoz.com


Re: from 1.4 to 3.6

2013-01-15 Thread Shawn Heisey

On 1/15/2013 4:14 PM, kaveh minooie wrote:

HI
  I hope this doesn't turn out to be a very stupid question. I have
upgraded from Solr 1.4 to 3.6, and now the maxScore field in the response
I am getting from Solr is missing. Am I doing
something wrong? How can I get it back?


Just add the special score field to the fl parameter (field list).  If 
you don't have the fl parameter at all, use fl=score,* to get it.  If 
you aren't displaying the score, then it won't give you maxScore.


Thanks,
Shawn



Is *:* the only possible search with * on the left-hand-side?

2013-01-15 Thread Alexandre Rafalovitch
Hello,

Is *:* hardcoded somewhere as a unique special pattern or is there actually
a class of queries with *:'something'?

I tried searching for it, but I suspect this is not a pattern most
tokenizers will actually index as searchable. :-)

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: Is *:* the only possible search with * on the left-hand-side?

2013-01-15 Thread Jack Krupansky

Semi-hard-coded.

In QueryParserBase.java:

protected Query getWildcardQuery(String field, String termStr) throws 
ParseException

{
  if ("*".equals(field)) {
    if ("*".equals(termStr)) return newMatchAllDocsQuery();

Otherwise, if you try *:x, "*" is an undefined field.

-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Tuesday, January 15, 2013 7:06 PM
To: solr-user@lucene.apache.org
Subject: Is *:* the only possible search with * on the left-hand-side?

Hello,

Is *:* hardcoded somewhere as a unique special pattern or is there actually
a class of queries with *:'something'?

I tried searching for it, but I suspect this is not a pattern most
tokenizers will actually index as searchable. :-)

Regards,
  Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book) 



Re: Top Terms Using Luke

2013-01-15 Thread Lighton Phiri
I suppose this will do; I just figured there'd be a built-in way of
excluding stopwords. Thank you.


On 15 January 2013 22:08, Shawn Heisey s...@elyograg.org wrote:
 To get an idea for which non-stopwords are dominant in your index, just ask
 for more top terms, instead of just the top ten or top twenty.  If you are
 using a program to parse the information, have your program remove the terms
 that you don't want to include, then trim the list to the proper size.



Lighton Phiri
http://lightonphiri.org


RE: DataImportHandlerException: Unable to execute query with OPTIM

2013-01-15 Thread ashimbose
Dear James Dyer ,

Thank you very much. It's really working now. I had been struggling for the
past 3 weeks to solve it. You are really awesome. I am really happy now.
Thank you for making me happy.

Regards,
Ashim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandlerException-Unable-to-execute-query-with-OPTIM-tp4033436p4033755.html
Sent from the Solr - User mailing list archive at Nabble.com.