Re: filtering footer information

2012-05-23 Thread Otis Gospodnetic
I wonder if Boilerpipe could be helpful here?  Boilerpipe is now integrated 
into Tika.

Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



>
> From: "Mark , N" 
>To: solr-user@lucene.apache.org 
>Sent: Thursday, May 24, 2012 1:39 AM
>Subject: filtering footer information
> 
>Is it possible to filter out certain repeated footer information from text
>documents while indexing to Solr?
>
>Are there any built-in filters, similar to stop word filters?
>
>
>
>
>-- 
>Thanks,
>
>*Nipen Mark *
>
>
>

Re: Throws Null Pointer Exception Even Query is Correct in solr

2012-05-23 Thread in.abdul
Hi Dmitry ,

There is no out of memory exception in Solr.
Thanks and Regards,
S SYED ABDUL KATHER



On Thu, May 24, 2012 at 1:14 AM, Dmitry Kan [via Lucene] <
ml-node+s472066n3985762...@n3.nabble.com> wrote:

> do you also see out of memory exception in your tomcat logs? If so, try
> setting the JVM's -Xmx to something reasonable.
>
> -- Dmitry
>
> On Wed, May 23, 2012 at 10:09 PM, in.abdul <[hidden email]> wrote:
>
> > Sorry, I missed that point. I am already using Method.POST only, but I
> > still could not execute the query.
> > Thanks and Regards,
> >S SYED ABDUL KATHER
> >
> >
> >
> > On Thu, May 24, 2012 at 12:19 AM, iorixxx [via Lucene] <
> > [hidden email] >
> wrote:
> >
> > > > I have a scenario where I am passing more than 10 ids in a query like
> > > > q=(ROWINDEX:(1 2 3 4  )) using SolrJ. I had increased the max boolean
> > > > clauses to 10500 and also increased the max header size in Tomcat by a
> > > > sufficient amount, but it is still throwing a NullPointerException in
> > > > Tomcat, and while debugging in Eclipse I saw the error "Error Executing
> > > > Query". Please give me a suggestion for this.
> > >
> > >
> > > If you are using the GET method (which is the default), try the POST
> > > method instead.
> > > See how to use it : http://search-lucene.com/m/34M4GTEIaD
> >
> > -
> > THANKS AND REGARDS,
> > SYED ABDUL KATHER
>
>
>
>
> --
> Regards,
>
> Dmitry Kan
>


-
THANKS AND REGARDS,
SYED ABDUL KATHER

filtering footer information

2012-05-23 Thread Mark , N
Is it possible to filter out certain repeated footer information from text
documents while indexing to Solr?

Are there any built-in filters, similar to stop word filters?




-- 
Thanks,

*Nipen Mark *


Re: Big Data Analysis and Management - 2 day Workshop

2012-05-23 Thread Dikchant Sahi
Hi Manish,

The attachment seems to be missing. Would you mind sharing it again?

I am a search engineer based in Bangalore and would be interested in
attending the workshop.

Best Regards,
Dikchant Sahi



Fwd: Big Data Analysis and Management - 2 day Workshop

2012-05-23 Thread Manish Bafna
Dear Friend,
We are organizing a workshop on Big Data. Here are details regarding the
same.
Please forward it to your company HR and also your friends and let me know
if anyone is interested. We have early bird offer if registration is done
before 31st May 2012.


Big Data is one space that is buzzing in the market big time. There are
several applications of the various technologies involved around Big Data.
Often, when we work on project or product development, we streamline our
time and energy towards its successful delivery. To ensure your colleagues
don't miss out on this hot topic and stay abreast of these niche areas, we
thought we would share our expertise with senior developers and architects
through this workshop on *Big Data Analysis and Management*, scheduled in
*Bangalore on June 16th and 17th.*
We will be covering various topics under the following 4 broad headings.
You can check the attached outline for a detailed insight into what we will
cover under each heading. It is definitely going to be an intensive and
relevant hands-on session, along with vivid explanations of the concepts and
theories around them. On a lighter note, there is definitely going to be a
lot of jargon flowing around all participants in this short span of two days.
*Content Extraction (hands-on using Apache Tika)*
*Distributing Content in NoSQL ways (hands-on using Cassandra, Neo4j)*
*Search and Indexing (hands-on using Solr and Tika)*
*Distributed computing and analysis (hands-on using Hadoop MapReduce and Mahout)*
To register for this workshop, kindly send a mail to me with the details of
the participants (including their profiles, if possible) and the payment
details.

I am enclosing the complete course details with this mail. I, along with two
of my peers, will be delivering this workshop. You can find our brief
profiles in the attached content.

Feel free to contact me any time with any queries.

 With best regards,
Manish.


Re: System requirements in my case?

2012-05-23 Thread Jan Høydahl
Well, 12000 is probably too little to do a representative sizing, but you can 
try an optimize() and then calculate what the size will be for 80 mill docs. 
You'll definitely not be able to cache the whole index in memory on one server, 
but if you can live with that kind of performance then it's ok. Btw. clustering 
may get very expensive, since you need to return many more hits than normal, 
and that means disk I/O.

> Another question concerning the execution of Solr: do I just have to run
> java -jar start.jar, or do you think I must run it another way?

First you must decide what application server to use. You may of course use the 
built-in Jetty server which comes with Solr if you wish. You will have to set 
JVM parameters for memory, garbage collection, logging etc. See 
http://wiki.apache.org/solr/SolrPerformanceFactors and 
http://wiki.apache.org/solr/SolrJetty

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 22. mai 2012, at 15:16, Bruno Mannina wrote:

> Hi Jan,
> 
> Thanks for all these details !
> 
> Answers are below.
> 
> Sincerely,
> Bruno
> 
> 
> Le 22/05/2012 13:58, Jan Høydahl a écrit :
>> Hi,
>> 
>> It is impossible to guess the required HW size without more knowledge about 
>> data and usage. 80 mill docs is a fair amount.
>> 
>> Here's how I would approach sizing the setup:
>> 1) Get your schema in shape, removing unnecessary stored/indexed fields
> Ok good idea !
>> 2) Do a test index locally of a part of the dataset, e.g. 10 mill docs, and 
>> perform an optimize
> Concerning tests, I actually only have a sample of 12000 docs, no more :'(
>> 3) Measure the size of the index folder, multiply with 8 to get a clue of 
>> total index size
> With 12 000 docs my index folder size is: 33 MB
> ps: I use "solr.clustering.enabled=true"
> 
>> 4) Do some benchmarking with realistic types of queries to identify 
>> performance bottlenecks on query side
> yep, this point is for later.
> 
>> Depending on your requirements for search performance, you can beef up your 
>> RAM to hold the whole index or depend on slow disks as a bottleneck. If you 
>> find that total size of index is 16Gb, you should leave >16Gb free for OS 
>> disk caching, e.g. allocate 8Gb to Tomcat/Solr and leave the rest for the 
>> OS. If I should guess, you probably find that one server gets overloaded or 
>> too slow with your amount of docs, and that you end up with sharding across 
>> 2-4 servers.
> I will take a look to see if I can easily increase the RAM on the server 
> (currently 24 GB)
> 
> Another question concerning the execution of Solr: do I just have to run
> java -jar start.jar, or do you think I must run it another way?
> 
> 
>> PS: Do you always need to search all data? A trick may be to partition your 
>> data such that say 80% of searches go to a "fresh" index with 10% of the 
>> content, while the remaining searches include everything.
> Yes I need to search to the whole index, even old document must be requested.
> 
> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.facebook.com/Cominvent
>> Solr Training - www.solrtraining.com
>> 
>> On 22. mai 2012, at 11:06, Bruno Mannina wrote:
>> 
>>> My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml
>>> 
>>> 24 GB DDR3
>>> 
>>> Le 22/05/2012 10:26, findbestopensource a écrit :
 Dedicated Server may not be required. If you want to cut down cost, then
 prefer shared server.
 
 How much the RAM?
 
 Regards
 Aditya
 www.findbestopensource.com
 
 
 On Tue, May 22, 2012 at 12:36 PM, Bruno Mannina   wrote:
 
> Dear Solr users,
> 
> My company would like to use Solr to index around 80 000 000 documents
> (XML files of around 5~10 KB each).
> My program (robot) will connect to this solr with boolean requests.
> 
> Number of users: around 1000
> Number of requests by user and by day: 300
> Number of users by day: 30
> 
> I would like to subscribe to a host provider with this configuration:
> - Dedicated Server
> - Ubuntu
> - Intel Xeon i7, 2x 2.66+ GHz, 12 GB RAM, 2 * 1500 GB disks
> - Unlimited bandwidth
> - Fixed IP
> 
> Do you think this configuration is enough?
> 
> Thanks for your info,
> Sincerely
> Bruno
> 
>> 
>> 
> 



Re: Dismax boost + payload boost

2012-05-23 Thread matteosilv
I finally got it working... I was compiling my class against a different
version of Solr (3.6), while my running Solr version was 3.5.



Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-23 Thread Yandong Yao
Hi Mark, Darren

Thanks very much for your help, Will try collection for each customer then.

Regards,
Yandong


2012/5/22 Mark Miller 

> I think the key is this: you want to think of a SolrCore on a single node
> Solr installation as a collection on a multi node SolrCloud installation.
>
> So if you would use multiple SolrCore's with a std Solr setup, you should
> be using multiple collections in SolrCloud. If you were going to try to do
> everything in one SolrCore, that would be like putting everything in one
> collection in SolrCloud. I don't think it generally makes sense to try and
> work at the SolrCore level when working with SolrCloud. This will be made
> more clear once we add a simple collections api.
>
> So I think your choice should be similar to using a single node - do you
> want to put everything in one 'collection' and use a filter to separate
> customers (with all its caveats and limitations) or do you want to use a
> collection per customer. You can always start up more clusters if you reach
> any limits.
>
>
>
> On May 22, 2012, at 10:08 AM, Darren Govoni wrote:
>
> > I'm curious what the SolrCloud experts say, but my suggestion is to try
> not to over-engineer the search architecture on SolrCloud. For example,
> what is the benefit of managing which cores are indexed and searched?
> Having to know those details, in my mind, works against the automation in
> SolrCloud, but maybe there's a good reason you want to do it this way.
> >
> > --- Original Message ---
> > On 5/22/2012  07:35 AM Yandong Yao wrote:Hi Darren,
> > 
> > Thanks very much for your reply.
> > 
> > The reason I want to control core indexing/searching is that I want to
> > use one core to store one customer's data (all customers share the same
> > config): e.g. customer 1 uses coreForCustomer1 and customer 2
> > uses coreForCustomer2.
> > 
> > Is there any better way than using a different core for each customer?
> > 
> > Another way may be to use a different collection for each customer,
> > though I am not sure how many collections SolrCloud could support. Which
> > way is better in terms of flexibility/scalability? (Suppose there are
> > tens of thousands of customers.)
> > 
> > Regards,
> > Yandong
> > 
> > 2012/5/22 Darren Govoni 
> > 
> > > Why do you want to control what gets indexed into a core and then
> > > knowing what core to search? That's the kind of "knowing" that
> SolrCloud
> > > solves. In SolrCloud, it handles the distribution of documents
> across
> > > shards and retrieves them regardless of which node is searched
> from.
> > > That is the point of "cloud", you don't know the details of where
> > > exactly documents are being managed (i.e. they are cloudy). It can
> > > change and re-balance from time to time. SolrCloud performs the
> > > distributed search for you, therefore when you try to search a
> node/core
> > > with no documents, all the results from the "cloud" are retrieved
> > > regardless. This is considered "A Good Thing".
> > >
> > > It requires a change in thinking about indexing and searching
> > >
> > > On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote:
> > > > Hi Guys,
> > > >
> > > > I use following command to start solr cloud according to solr
> cloud wiki.
> > > >
> > > > yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf
> > > > -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar
> start.jar
> > > > yydzero:example2 bjcoe$ java -Djetty.port=7574
> -DzkHost=localhost:9983
> > > -jar
> > > > start.jar
> > > >
> > > > Then I have created several cores using CoreAdmin API (
> > > > http://localhost:8983/solr/admin/cores?action=CREATE&name=
> > > > &collection=collection1), and clusterstate.json show
> following
> > > > topology:
> > > >
> > > >
> > > > collection1:
> > > > -- shard1:
> > > >   -- collection1
> > > >   -- CoreForCustomer1
> > > >   -- CoreForCustomer3
> > > >   -- CoreForCustomer5
> > > > -- shard2:
> > > >   -- collection1
> > > >   -- CoreForCustomer2
> > > >   -- CoreForCustomer4
> > > >
> > > >
> > > > 1) Index:
> > > >
> > > > Using following command to index mem.xml file in exampledocs
> directory.
> > > >
> > > > yydzero:exampledocs bjcoe$ java -Durl=
> > > > http://localhost:8983/solr/coreForCustomer3/update -jar
> post.jar mem.xml
> > > > SimplePostTool: version 1.4
> > > > SimplePostTool: POSTing files to
> > > > http://localhost:8983/solr/coreForCustomer3/update..
> > > > SimplePostTool: POSTing file mem.xml
> > > > SimplePostTool: COMMITting Solr index changes.
> > > >
> > > > And now SolrAdmin UI shows that 'coreForCustomer1',
> 'coreForCustomer3',
> > > > 'coreForCustomer5' has 3 documents (mem.xml has 3 documents) and
> other 2
> > > > core has 0 documents.
> > > >
> > > > *Question 1:* Is this expected behavior? How do I index documents
> > > > into a specific core?
> > > >
> > > > *Question 2:* If SolrCloud doesn't support this yet, how cou

Re: Many Cores with Solr

2012-05-23 Thread Mike Douglass
My interest in this is the desire to create one index per user of a system -
the issue here is privacy - data indexed for one user should not be visible
to other users.

For this purpose Solr will be hidden behind a proxy which steers
authenticated sessions to the appropriate core.

Does this seem like a valid/feasible approach?



field compression in solr 3.6

2012-05-23 Thread pramila_tha...@ontla.ola.org
Hi Everyone,

Solr 3.6 does not seem to be honoring field compression.

While merging the indexes, the index size is very big.

Is there any other way to handle this and keep the compression functionality?

thanks,

--Pramila



Re: Throws Null Pointer Exception Even Query is Correct in solr

2012-05-23 Thread Dmitry Kan
do you also see out of memory exception in your tomcat logs? If so, try
setting the JVM's -Xmx to something reasonable.

-- Dmitry

On Wed, May 23, 2012 at 10:09 PM, in.abdul  wrote:

> Sorry, I missed that point. I am already using Method.POST only, but I
> still could not execute the query.
> Thanks and Regards,
>S SYED ABDUL KATHER
>
>
>
> On Thu, May 24, 2012 at 12:19 AM, iorixxx [via Lucene] <
> ml-node+s472066n3985746...@n3.nabble.com> wrote:
>
> > > I have a scenario where I am passing more than 10 ids in a query like
> > > q=(ROWINDEX:(1 2 3 4  )) using SolrJ. I had increased the max boolean
> > > clauses to 10500 and also increased the max header size in Tomcat by a
> > > sufficient amount, but it is still throwing a NullPointerException in
> > > Tomcat, and while debugging in Eclipse I saw the error "Error Executing
> > > Query". Please give me a suggestion for this.
> >
> >
> > If you are using the GET method (which is the default), try the POST
> > method instead. See how to use it: http://search-lucene.com/m/34M4GTEIaD
> >
>
>
> -
> THANKS AND REGARDS,
> SYED ABDUL KATHER




-- 
Regards,

Dmitry Kan


Re: shard distribution of multiple collections in SolrCloud

2012-05-23 Thread Mark Miller
Yeah, currently you have to create the core on each node... We are working
on a 'collections' API that will make this a simple one-call operation.

We should have this soon.

- Mark

On May 23, 2012, at 2:36 PM, Daniel Brügge wrote:

> Hi,
> 
> i am creating several cores using the following script. I use this for
> testing SolrCloud and to learn about the distribution of multiple
> collections.
> 
> max=500
>> for ((i=2; i<=$max; ++i )) ;
>> do
>>curl "
>> http://solrinstance1:8983/solr/admin/cores?action=CREATE&name=collection$i&collection=collection$i&collection.configName=myconfig
>> "
>> done
> 
> 
> I've setup a SolrCloud with 2 shards which are each replicated by 2 other
> instances I start.
> 
> When I first start the installation I have the default "collection1" in
> place which is sharded over shard1 and shard2 with 2 leader nodes and 2
> nodes which replicate the leaders.
> 
> When I run this script above which calls the Coreadmin on one of the
> shards, all the collections are created on only this shard without a
> replica. So e.g.
> 
> 
> "collection8":{"shard1":{"solrinstance1:8983_solr_collection8":{
>"shard":"shard1",
>"leader":"true",
>"state":"active",
>"core":"collection8",
>"collection":"collection8",
>"node_name":"solrinstance1:8983_solr",
> 
>"base_url":"http://solrinstance1:8983/solr"}}}
> 
> 
> I always thought that via ZooKeeper these collections would be sharded and
> replicated. Or do I need to call the create-core action on each node? But
> then I would need to know about these nodes, right?
> 
> 
> Thanks & regards
> 
> Daniel

- Mark Miller
lucidimagination.com

Re: always getting distinct count of -1 in luke response (solr4 snapshot)

2012-05-23 Thread Mike Hugo
Explicitly running an optimize on the index via the admin screens solved
this problem - the correct counts are now being returned.

On Tue, May 22, 2012 at 4:33 PM, Mike Hugo  wrote:

> We're testing a snapshot of Solr4 and I'm looking at some of the responses
> from the Luke request handler.  Everything looks good so far, with the
> exception of the "distinct" attribute which (in Solr3) shows me the
> distinct number of terms for a given field.
>
> Given the request below, I'm consistently getting a response back with a
> value in the "distinct" field of -1.  Is there something different I need
> to do to get back the actual distinct count?
>
> Thanks!
>
> Mike
>
> http://localhost:8080/solr/core1/admin/luke?wt=json&fl=label&numTerms=1
>
> "fields": {
> "label": {
> "type": "text_general",
> "schema": "IT-M--",
> "index": "(unstored field)",
> "docs": 63887,
> *"distinct": -1,*
> "topTerms": [
>


Re: Throws Null Pointer Exception Even Query is Correct in solr

2012-05-23 Thread in.abdul
Sorry, I missed that point. I am already using Method.POST only, but I still
could not execute the query.
Thanks and Regards,
S SYED ABDUL KATHER



On Thu, May 24, 2012 at 12:19 AM, iorixxx [via Lucene] <
ml-node+s472066n3985746...@n3.nabble.com> wrote:

> > I have a scenario where I am passing more than 10 ids in a query like
> > q=(ROWINDEX:(1 2 3 4  )) using SolrJ. I had increased the max boolean
> > clauses to 10500 and also increased the max header size in Tomcat by a
> > sufficient amount, but it is still throwing a NullPointerException in
> > Tomcat, and while debugging in Eclipse I saw the error "Error Executing
> > Query". Please give me a suggestion for this.
>
>
> If you are using the GET method (which is the default), try the POST method
> instead. See how to use it: http://search-lucene.com/m/34M4GTEIaD
>
>


-
THANKS AND REGARDS,
SYED ABDUL KATHER

Re: Multicore solr

2012-05-23 Thread Amit Jha
Can anyone please help me with this?

Rgds
AJ

On 23-May-2012, at 14:37, Jens Grivolla  wrote:

> So are you even doing text search in Solr at all, or just using it as a
> key-value store?
> 
> If the latter, do you have your schema configured so
> that only the search_id field is indexed (with a keyword tokenizer) and 
> everything else only stored? Also, are you sure that Solr is the best option 
> as a key-value store?
> 
> Jens
> 
> On 05/23/2012 04:34 AM, Amit Jha wrote:
>> Hi,
>> 
>> Thanks for your advice. It is basically a meta-search application.
>> Users can perform a search on N data sources at a time. We broadcast a
>> parallel search to each selected data source and write the data to Solr
>> using a custom-built API (the API and Solr are deployed on separate
>> machines; the API's job is to perform the parallel search and write the
>> data to Solr). The API notifies the application that some results are
>> available, and the application then fires a search query to display them
>> (the query would be q=unique_search_id). Meanwhile the API keeps writing
>> data to Solr, and the user can search Solr again to view all results.
>> 
>> In the current scenario we are using a single Solr server and performing
>> real-time indexing and search. Performing these operations on a single
>> Solr instance makes the process slow as the index size increases.
>> 
>> So we are planning to use multi-core Solr where each user will have its
>> own core. All cores will have the same schema.
>> 
>> Please suggest if this approach has any issues.
>> 
>> Rgds AJ
>> 
>> On 22-May-2012, at 20:14, Sohail Aboobaker
>> wrote:
>> 
>>> It would help if you provide your use case. What are you indexing
>>> for each user and why would you need a separate core for indexing
>>> each user? How do you decide schema for each user? It might be
>>> better to describe your use case and desired results. People on the
>>> list will be able to advice on the best approach.
>>> 
>>> Sohail
>> 
> 
> 


Re: how to reduce the result size to 2-3 lines and expand based on user interest

2012-05-23 Thread srini
hi iorixxx,

Thank you for your reply; I appreciate it. There are a few areas where I
need a little clarity. I am not using any queries directly; everything is
implemented in the config files (schema.xml, data-config.xml,
solrconfig.xml). Could you give some more hints based on the config file
excerpts below?

data-config.xml:
[contents stripped by the mail archive]

schema.xml:
[contents stripped by the mail archive]

solrconfig.xml:
[contents stripped by the mail archive; only the field name DESCRIPTION
survives]

When I use /solr/browse it does display the description. I want this to be
300 characters wide and in the form of a hyperlink, so that the user can
read the entire description by clicking the link.








Re: Throws Null Pointer Exception Even Query is Correct in solr

2012-05-23 Thread Ahmet Arslan
> I have a scenario where I am passing more than 10 ids in a query like
> q=(ROWINDEX:(1 2 3 4  )) using SolrJ. I had increased the max boolean
> clauses to 10500 and also increased the max header size in Tomcat by a
> sufficient amount, but it is still throwing a NullPointerException in
> Tomcat, and while debugging in Eclipse I saw the error "Error Executing
> Query". Please give me a suggestion for this.


If you are using the GET method (which is the default), try the POST method
instead. See how to use it: http://search-lucene.com/m/34M4GTEIaD
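
(For illustration, a minimal SolrJ 3.x sketch of sending the query via POST;
the URL, query, and class name are placeholders, not from the original
thread.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostQueryExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // A huge boolean query easily exceeds the container's URL/header
        // limits when sent via GET; POST carries it in the request body.
        SolrQuery query = new SolrQuery("ROWINDEX:(1 2 3 4)");
        QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}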


shard distribution of multiple collections in SolrCloud

2012-05-23 Thread Daniel Brügge
Hi,

I am creating several cores using the following script. I use this for
testing SolrCloud and to learn about the distribution of multiple
collections.

max=500
> for ((i=2; i<=$max; ++i )) ;
> do
> curl "
> http://solrinstance1:8983/solr/admin/cores?action=CREATE&name=collection$i&collection=collection$i&collection.configName=myconfig
> "
> done


I've set up a SolrCloud with 2 shards, each replicated by 2 other instances
that I start.

When I first start the installation I have the default "collection1" in
place which is sharded over shard1 and shard2 with 2 leader nodes and 2
nodes which replicate the leaders.

When I run the script above, which calls the CoreAdmin API on one of the
shards, all the collections are created on only that shard, without a
replica. E.g.:


"collection8":{"shard1":{"solrinstance1:8983_solr_collection8":{
"shard":"shard1",
"leader":"true",
"state":"active",
"core":"collection8",
"collection":"collection8",
"node_name":"solrinstance1:8983_solr",

"base_url":"http://solrinstance1:8983/solr"}}}


I always thought that via ZooKeeper these collections would be sharded and
replicated. Or do I need to call the create-core action on each node? But
then I would need to know about these nodes, right?


Thanks & regards

Daniel


Throws Null Pointer Exception Even Query is Correct in solr

2012-05-23 Thread syed kather
Team,
I have a scenario where I am passing more than 10 ids in a query like
q=(ROWINDEX:(1 2 3 4  )) using SolrJ. I had increased the max boolean
clauses to 10500 and also increased the max header size in Tomcat by a
sufficient amount, but it is still throwing a NullPointerException in
Tomcat, and while debugging in Eclipse I saw the error "Error Executing
Query". Please give me a suggestion for this.

Note: while the ids are 99800 or fewer, the query returns the result.
Thanks and Regards,
S SYED ABDUL KATHER


Re: configuring solr3.6 for a large intensive index only run

2012-05-23 Thread Lance Norskog
If you want to suppress merging, set the 'mergeFactor' very high.
Perhaps 100. Note that Lucene opens many files (50? 100? 200?) for
each segment. You would have to set the 'ulimit' for file descriptors
to 'unlimited' or 'millions'.

Later, you can call optimize with a 'maxSegments' value. Optimize will
stop at maxSegments instead of merging down to one. Lucene these days
does not need to have one segment, so merging down to 20 or 50 is
fine.
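
(For illustration, a hedged SolrJ sketch of that later partial optimize; the
URL and the maxSegments value of 20 are placeholders.)

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class PartialOptimize {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // optimize(waitFlush, waitSearcher, maxSegments):
        // merge down to at most 20 segments instead of a single one.
        server.optimize(true, true, 20);
    }
}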

On Wed, May 23, 2012 at 11:19 AM, Scott Preddy  wrote:
> I am trying to do a very large insertion (about 68million documents) into a
> solr instance.
>
> Our schema is pretty simple. About 40 fields using these types:
>
>   [fieldType definitions stripped by the mail archive; the surviving
>   attributes are omitNorms="true", positionIncrementGap="100" with separate
>   index/query analyzer chains, and omitNorms="true" positionIncrementGap="0"]
>
> We are running solrj clients from a hadoop cluster, and are struggling with
> the merge process as time progresses.
> As the number of documents grows, merging will eventually hog everything.
>
> What we would really like to do is turn merging off and just do an index
> run with a sparse solrconfig and then
> start things back up with our runtime config which would kick off merging
> when it starts.
>
> Is there a way to do this?
>
> I came close to finding an answer in this post, but did not find out how to
> actually turn off merging.
>
> Post by Mike McCandless:
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html



-- 
Lance Norskog
goks...@gmail.com


configuring solr3.6 for a large intensive index only run

2012-05-23 Thread Scott Preddy
I am trying to do a very large insertion (about 68million documents) into a
solr instance.

Our schema is pretty simple. About 40 fields using these types:

[fieldType definitions stripped by the mail archive; the surviving
attributes are omitNorms="true", positionIncrementGap="100" with separate
index/query analyzer chains, and omitNorms="true" positionIncrementGap="0"]

We are running solrj clients from a hadoop cluster, and are struggling with
the merge process as time progresses.
As the number of documents grows, merging will eventually hog everything.

What we would really like to do is turn merging off and just do an index
run with a sparse solrconfig and then
start things back up with our runtime config which would kick off merging
when it starts.

Is there a way to do this?

I came close to finding an answer in this post, but did not find out how to
actually turn off merging.

Post by Mike McCandless:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html


Tips on creating a custom QueryCache?

2012-05-23 Thread Aaron Daubman
Greetings,

I'm looking for pointers on where to start when creating a
custom QueryCache.
Our usage patterns are possibly a bit unique, so let me explain the desired
use case:

Our Solr index is read-only except for dedicated periods where it is
updated and re-optimized.

On startup, I would like to create a specific QueryCache that would cache
the top ~20,000 (arbitrary but large) queries. This cache should never evict
entries and, after the "warming process" populates it, should never be added
to either.

The warming process would be to run through the (externally determined)
list of anticipated top X (say 20,000) queries and cache these results.
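
(For illustration, the external replay half of that warming could be as
simple as the following SolrJ sketch; the file name and URL are placeholders,
and the never-evict cache itself would still have to live inside Solr.)

import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CacheWarmer {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // One query string per line: the externally determined top-N list.
        BufferedReader in = new BufferedReader(new FileReader("top-queries.txt"));
        String line;
        while ((line = in.readLine()) != null) {
            // Each request lets the server-side cache populate its entry.
            server.query(new SolrQuery(line));
        }
        in.close();
    }
}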

This cache would then be used for the duration of the solr run-time (until
the period, perhaps daily, where the index is updated and re-optimized, at
which point the cache would be re-created)

Where should I begin looking to implement such a cache?

The reason for this somewhat different approach to caching is that we may
get any number of odd queries throughout the day for which performance
isn't important, and we don't want any of these being added to the cache or
evicting other entries from the cache. We need to ensure high performance
for this pre-determined list of queries only (while still handling other
arbitrary queries, if not as quickly)

Thanks,
  Aaron


Re: how to reduce the result size to 2-3 lines and expand based on user interest

2012-05-23 Thread Ahmet Arslan
> I am using DIH to import data from
> Oracle and every thing is working fine.
> the description field usually contains  more lines
> (from 10-300 lines). When
> I present the results through Solr/Browse It displays the
> results. However I
> have a requirement to show only 2-3 lines as description and
> provide some
> kind of link to expand the message if user is interested in
> reading it
> further.

Not sure about lines, but if a number of characters satisfies your needs,
you can use this trick to return the first N (e.g. 300) characters to users.

&hl=true&hl.fl=TEXT&hl.maxAnalyzedChars=0&f.TEXT.hl.alternateField=TEXT&f.TEXT.hl.maxAlternateFieldLength=300

(replace TEXT with description in above URL)

Please note that you will display results coming from the highlighting
section of the response, not from the docs section.
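
(For illustration, the same parameters set from SolrJ; the field name
"description" follows the parenthetical above, and the rest are placeholders.)

import org.apache.solr.client.solrj.SolrQuery;

public class FirstNChars {
    // Build a query that returns the first 300 stored characters of the
    // description field through the highlighter's alternateField fallback.
    public static SolrQuery build(String userQuery) {
        SolrQuery q = new SolrQuery(userQuery);
        q.set("hl", true);
        q.set("hl.fl", "description");
        q.set("hl.maxAnalyzedChars", 0);
        q.set("f.description.hl.alternateField", "description");
        q.set("f.description.hl.maxAlternateFieldLength", 300);
        return q;
    }
}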

Re: getTransformer error

2012-05-23 Thread watson
Has anyone found a solution to the getTransformer error? I am getting the
same error.

Here is my output:


Problem accessing /solr/JOBS/select/. Reason:

getTransformer fails in getContentType

java.lang.RuntimeException: getTransformer fails in getContentType
at
org.apache.solr.response.XSLTResponseWriter.getContentType(XSLTResponseWriter.java:72)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:326)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
at
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:129)
at
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:59)
at
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:122)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:110)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.IOException: Unable to initialize Templates
'example.xslt'
at
org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:117)
at
org.apache.solr.util.xslt.TransformerProvider.getTransformer(TransformerProvider.java:77)
at
org.apache.solr.response.XSLTResponseWriter.getTransformer(XSLTResponseWriter.java:130)
at
org.apache.solr.response.XSLTResponseWriter.getContentType(XSLTResponseWriter.java:69)
... 23 more
Caused by: javax.xml.transform.TransformerConfigurationException: Could not
compile stylesheet
at
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(Unknown
Source)
at
org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)
... 26 more




solr error when querying.

2012-05-23 Thread watson
Here is my query:
http://127.0.0.1:/solr/JOBS/select/??q=Apache&wt=xslt&tr=example.xslt

The response I get is the following.  I have example.xslt in the /conf/xslt
path.   What is wrong here?  Thanks!


HTTP ERROR 500

Problem accessing /solr/JOBS/select/. Reason:

getTransformer fails in getContentType

java.lang.RuntimeException: getTransformer fails in getContentType
at
org.apache.solr.response.XSLTResponseWriter.getContentType(XSLTResponseWriter.java:72)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:326)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:261)
at
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:129)
at
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:59)
at
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:122)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:110)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.IOException: Unable to initialize Templates
'example.xslt'
at
org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:117)
at
org.apache.solr.util.xslt.TransformerProvider.getTransformer(TransformerProvider.java:77)
at
org.apache.solr.response.XSLTResponseWriter.getTransformer(XSLTResponseWriter.java:130)
at
org.apache.solr.response.XSLTResponseWriter.getContentType(XSLTResponseWriter.java:69)
... 23 more
Caused by: javax.xml.transform.TransformerConfigurationException: Could not
compile stylesheet
at
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(Unknown
Source)
at
org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)
... 26 more




TermComponent and Optimize

2012-05-23 Thread Dario Rigolin
We have an issue with TermComponent on Solr 3.6 (and 3.5): using a terms
list on field id (the unique id of documents), we receive a reply saying
that we have multiple documents with the same id!
Doing a search, only one doc is returned, as expected.

After deeper investigation, this issue is "fixed" by doing an index optimize.
After an optimize, all terms lists on the id field report that all ids are
unique. If we update a single document, its id will be listed by
TermComponent as used in two documents.
It seems that TermComponent is looking at all versions of documents in the
index.

Is this the expected behavior for TermComponent? Any suggestion on how to
solve this?
We use TermComponent to do smart autocomplete on field values.

Thank you

---
Comperio srl
Dario Rigolin 

how to reduce the result size to 2-3 lines and expand based on user interest

2012-05-23 Thread srini
I am using DIH to import data from Oracle and everything is working fine.
The description field usually contains many lines (from 10 to 300). When I
present the results through /solr/browse, it displays the results. However,
I have a requirement to show only 2-3 lines of the description and provide
some kind of link to expand the message if the user is interested in reading
further.

I am guessing I need to play around with the templates in the velocity
folder. Can anyone throw some light on this?

Thanks
Srini





Re: Faceted on Similarity ?

2012-05-23 Thread Robby
Hi Lee Carrol,

Will take a look at the pointers. I really appreciate your feedback, thank
you.

Regards,

Robby

On Wed, May 23, 2012 at 5:04 AM, Lee Carroll
wrote:

> Take a look at the clustering component
>
> http://wiki.apache.org/solr/ClusteringComponent
>
> Consider clustering off line and indexing the pre calculated group
> memberships
>
> I might be wrong, but I don't think there is any faceting mileage here.
> Depending upon the use case,
> you might get some use out of the MLT handler:
>
> http://wiki.apache.org/solr/MoreLikeThis
>


RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
> I'd guess that this is because SnowballPorterFilterFactory 
> does not implement MultiTermAwareComponent. Not sure, though.

Yes, I think this prevents the automagic multiterm awareness from doing its
job.
Could a custom analyzer chain with a multiterm analyzer section help? As
described (very, very briefly, too briefly...) here:
http://wiki.apache.org/solr/MultitermQueryAnalysis



Re: Indexing files using multi-cores - could not fix after many retries

2012-05-23 Thread sudarshan
Thanks Gora, it worked.

 



RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread Michael Ryan
I'd guess that this is because SnowballPorterFilterFactory does not implement 
MultiTermAwareComponent. Not sure, though.

-Michael


RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
> Maybe a filter like ISOLatin1AccentFilter that doesn't get 
> applied when 
> using wildcards? How do the terms actually appear in the index?

Bär gets indexed as bar.

I do not use ISOLatin1AccentFilter. My field def is this:

[field definition stripped by the mail archive]



Re: Wildcard-Search Solr 3.5.0

2012-05-23 Thread Jens Grivolla
Maybe a filter like ISOLatin1AccentFilter that doesn't get applied when 
using wildcards? How do the terms actually appear in the index?


Jens

On 05/23/2012 01:19 PM, spr...@gmx.eu wrote:

No one an idea?

Thx.



The text may contain "FooBar".

When I do a wildcard search like this: "Foo*" - no hits.
When I do a wildcard search like this: "foo*" - doc is
found.


Please see http://wiki.apache.org/solr/MultitermQueryAnalysis



Well, it works in 3.6. With one exception: If I use german umlauts it does
not work anymore.

Text: Bär

Bä* ->  no hits
Bär ->  hits

What can I do in this case?

Thank you







Planning of future Solr setup

2012-05-23 Thread Christian von Wendt-Jensen
Hi,

I'm in the middle of planning a new Solr setup. The situation is this:
- We currently have one document type with around 20 fields, indexed, not 
stored, except for a few date fields
- We currently have indexed 400M documents across 20+ shards.
- The number of documents to be indexed is around 1M/day, and this number is 
increasing.
- The index files totals to around 750GB
- Users will mostly search newly indexed documents (news), and therefore the 
shards represents dateranges.
- Each month or so, we add a new shard.


In my planning, my goals are:
- it should be very easy to add a new shard and bring it online. Maybe it could 
even be fully automated.
- it should be very easy to retire a (old) shard in order to reclaim the 
hardware resources for newer documents.
- It should be very easy to scale wide or high by adding more machines or more 
CPU/RAM. The resources should be able to autobalance the shards for optimum 
resources usage.
- Rebalancing should be very fast.
- The setup should support one writer and many readers of the same physical 
index. This avoids replication and moving large files around. This again 
supports fast rebalancing of hardware resources.
- Clients should be notified about shards coming online or going offline.

The goals require a kind of distributed configuration and notification
system. Here I imagine ZooKeeper comes into play.
In order to make rebalancing very fast, the indexes should stay where they
are and not be moved around. Instead, Solr instances on available resources
should be configured to point to the relevant shards. This requires SAN
storage, I imagine.


Questions:
1. What is best practice in regard to using a machine's resources: one
Tomcat instance per shard until memory and CPU are used up? Or rather one
Tomcat with multiple cores, where the Tomcat gets all memory available on
the machine?
2. Would it be a good idea to mix master and slave cores in the same tomcat 
instance or should a machine be dedicated to either master cores or slave cores?
3. What would be the best way to notify the slave cores about recent commits by 
the masters, remembering that replication is disabled?
4. In the one writer, many readers scenario, what happens when the writer 
merges/updates segments? Will the index files be physically deleted/altered? 
And how will the slaves react to that?
5. Would it be advisable to use a SAN for sharing index files between readers 
and writers (one writer)? Any best practices on this area? I imagine one large 
share on the SAN that all "resources" can mount.






Med venlig hilsen / Best Regards

Christian von Wendt-Jensen



RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
 

> -Original Message-
> From: Dmitry Kan [mailto:dmitry@gmail.com] 
> Sent: Mittwoch, 23. Mai 2012 14:02
> To: solr-user@lucene.apache.org
> Subject: Re: Wildcard-Search Solr 3.5.0
> 
> do umlauts arrive properly on the server side, no encoding 
> issues?

Yes, works fine.

It must, since I have hits for Bär or bär.
It's just the combination of umlauts and wildcards.
It must be something with the automagic multiterm feature in Solr 3.6.




Re: Wildcard-Search Solr 3.5.0

2012-05-23 Thread Dmitry Kan
do umlauts arrive properly on the server side, no encoding issues? Check
the query params of the response xml/json/.. set debugQuery to true as well
to see if it produces any useful diagnostic info.

On Wed, May 23, 2012 at 2:58 PM,  wrote:

> No. No hits for bä*.
> It's something with the umlauts but I have no idea what...
>
> > -Original Message-
> > From: Dmitry Kan [mailto:dmitry@gmail.com]
> > Sent: Mittwoch, 23. Mai 2012 13:36
> > To: solr-user@lucene.apache.org
> > Subject: Re: Wildcard-Search Solr 3.5.0
> >
> > what about bä*->hits?
> >
> > -- Dmitry
> >
> > On Wed, May 23, 2012 at 2:19 PM,  wrote:
> >
> > > No one an idea?
> > >
> > > Thx.
> > >
> > >
> > > > > The text may contain "FooBar".
> > > > >
> > > > > When I do a wildcard search like this: "Foo*" - no hits.
> > > > > When I do a wildcard search like this: "foo*" - doc is
> > > > > found.
> > > >
> > > > Please see http://wiki.apache.org/solr/MultitermQueryAnalysis
> > >
> > >
> > > Well, it works in 3.6. With one exception: If I use german
> > umlauts it does
> > > not work anymore.
> > >
> > > Text: Bär
> > >
> > > Bä* -> no hits
> > > Bär -> hits
> > >
> > > What can I do in this case?
> > >
> > > Thank you
> > >
> > >
> >
> >
> > --
> > Regards,
> >
> > Dmitry Kan
> >
>
>


-- 
Regards,

Dmitry Kan


RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
No. No hits for bä*.
It's something with the umlauts but I have no idea what...

> -Original Message-
> From: Dmitry Kan [mailto:dmitry@gmail.com] 
> Sent: Mittwoch, 23. Mai 2012 13:36
> To: solr-user@lucene.apache.org
> Subject: Re: Wildcard-Search Solr 3.5.0
> 
> what about bä*->hits?
> 
> -- Dmitry
> 
> On Wed, May 23, 2012 at 2:19 PM,  wrote:
> 
> > No one an idea?
> >
> > Thx.
> >
> >
> > > > The text may contain "FooBar".
> > > >
> > > > When I do a wildcard search like this: "Foo*" - no hits.
> > > > When I do a wildcard search like this: "foo*" - doc is
> > > > found.
> > >
> > > Please see http://wiki.apache.org/solr/MultitermQueryAnalysis
> >
> >
> > Well, it works in 3.6. With one exception: If I use german 
> umlauts it does
> > not work anymore.
> >
> > Text: Bär
> >
> > Bä* -> no hits
> > Bär -> hits
> >
> > What can I do in this case?
> >
> > Thank you
> >
> >
> 
> 
> -- 
> Regards,
> 
> Dmitry Kan
> 



Re: Wildcard-Search Solr 3.5.0

2012-05-23 Thread Dmitry Kan
what about bä*->hits?

-- Dmitry

On Wed, May 23, 2012 at 2:19 PM,  wrote:

> No one an idea?
>
> Thx.
>
>
> > > The text may contain "FooBar".
> > >
> > > When I do a wildcard search like this: "Foo*" - no hits.
> > > When I do a wildcard search like this: "foo*" - doc is
> > > found.
> >
> > Please see http://wiki.apache.org/solr/MultitermQueryAnalysis
>
>
> Well, it works in 3.6. With one exception: If I use german umlauts it does
> not work anymore.
>
> Text: Bär
>
> Bä* -> no hits
> Bär -> hits
>
> What can I do in this case?
>
> Thank you
>
>


-- 
Regards,

Dmitry Kan


RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread spring
No one an idea?

Thx.


> > The text may contain "FooBar".
> > 
> > When I do a wildcard search like this: "Foo*" - no hits.
> > When I do a wildcard search like this: "foo*" - doc is
> > found.
> 
> Please see http://wiki.apache.org/solr/MultitermQueryAnalysis


Well, it works in 3.6. With one exception: if I use German umlauts it does
not work anymore.

Text: Bär

Bä* -> no hits
Bär -> hits

What can I do in this case?

Thank you



Re: Dismax boost + payload boost

2012-05-23 Thread matteosilv
yes, as in the linked post:

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.search.DefaultSimilarity;

public class PayloadSimilarity extends DefaultSimilarity
{
    // Decode the 4-byte float payload stored at each term position and use
    // it as the payload score; fall back to 1.0 when no payload is present.
    @Override
    public float scorePayload(int docId, String fieldName, int start,
                              int end, byte[] payload, int offset, int length)
    {
        if (length > 0) {
            return PayloadHelper.decodeFloat(payload, offset);
        }
        return 1.0f;
    }
}

and a line in schema.xml pointing to it.



Re: Strategy for maintaining De-normalized indexes

2012-05-23 Thread Aditya
Hi Sohail,

In my previous mail, I mentioned storing categories as separate records.
You should store and index the category name and main-product name as one
record, and index the child-product name and main product as another record.

When you want the count,
1. Retrieve the main product name matching the category
2. Retrieve the list of child products matching the main product

You may need two queries, but it is worth it. You don't need to delete and
re-index a bunch of records.
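
(For illustration, a sketch of those two queries in SolrJ; the field names
category, mainProduct, and childProduct, the category value, and the URL are
all hypothetical.)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class TwoStepCount {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Step 1: main products matching the category.
        SolrQuery byCategory = new SolrQuery("category:electronics");
        for (SolrDocument main : server.query(byCategory).getResults()) {
            String name = (String) main.getFieldValue("mainProduct");
            // Step 2: child products of that main product.
            SolrQuery byMain = new SolrQuery("mainProduct:\"" + name + "\"");
            long children = server.query(byMain).getResults().getNumFound();
            System.out.println(name + ": " + children + " child products");
        }
    }
}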

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 5:12 PM, Sohail Aboobaker wrote:

> We are still in design phase, so we haven't hit any performance issues. We
> do not want to discover performance issues too late during QA :) We would
> rather account for any issues during the design phase.
>
> The fields that we are using from the master table will be refreshed
> rarely, maybe three or four times a year.
>
> Regards,
> Sohail
>


Re: Multicore solr

2012-05-23 Thread Shanu Jha
Jens,

Yes, we are doing text search.

My question to all: is the approach of creating a core for each user a good
idea?

AJ

On Wed, May 23, 2012 at 2:37 PM, Jens Grivolla  wrote:

> So are you even doing text search in Solr at all, or just using it as a
> key-value store?
>
> If the latter, do you have your schema configured so
> that only the search_id field is indexed (with a keyword tokenizer) and
> everything else only stored? Also, are you sure that Solr is the best
> option as a key-value store?
>
> Jens
>
>
> On 05/23/2012 04:34 AM, Amit Jha wrote:
>
>> Hi,
>>
>> Thanks for your advice. It is basically a meta-search application.
>> Users can perform a search on N data sources at a time. We broadcast a
>> parallel search to each selected data source and write the data to Solr
>> using a custom-built API (the API and Solr are deployed on separate
>> machines; the API's job is to perform the parallel search and write the
>> data to Solr). The API notifies the application that some results are
>> available, and the application then fires a search query to display them
>> (the query would be q=unique_search_id). Meanwhile the API keeps writing
>> data to Solr, and the user can search Solr again to view all results.
>>
>> In the current scenario we are using a single Solr server and performing
>> real-time indexing and search. Performing these operations on a single
>> Solr instance makes the process slow as the index size increases.
>>
>> So we are planning to use multi-core Solr where each user will have its
>> own core. All cores will have the same schema.
>>
>> Please suggest if this approach has any issues.
>>
>> Rgds AJ
>>
>> On 22-May-2012, at 20:14, Sohail Aboobaker
>> wrote:
>>
>>  It would help if you provide your use case. What are you indexing
>>> for each user and why would you need a separate core for indexing
>>> each user? How do you decide schema for each user? It might be
>>> better to describe your use case and desired results. People on the
>>> list will be able to advice on the best approach.
>>>
>>> Sohail
>>>
>>
>>
>
>


Re: Multicore solr

2012-05-23 Thread Jens Grivolla

So are you even doing text search in Solr at all, or just using it as a
key-value store?

If the latter, do you have your schema configured so
that only the search_id field is indexed (with a keyword tokenizer) and 
everything else only stored? Also, are you sure that Solr is the best 
option as a key-value store?


Jens

On 05/23/2012 04:34 AM, Amit Jha wrote:

Hi,

Thanks for your advice. It is basically a meta-search application.
Users can perform a search on N data sources at a time. We broadcast a
parallel search to each selected data source and write the data to Solr
using a custom-built API (the API and Solr are deployed on separate
machines; the API's job is to perform the parallel search and write the
data to Solr). The API notifies the application that some results are
available, and the application then fires a search query to display them
(the query would be q=unique_search_id). Meanwhile the API keeps writing
data to Solr, and the user can search Solr again to view all results.

In the current scenario we are using a single Solr server and performing
real-time indexing and search. Performing these operations on a single
Solr instance makes the process slow as the index size increases.

So we are planning to use multi-core Solr where each user will have its
own core. All cores will have the same schema.

Please suggest if this approach has any issues.

Rgds AJ

On 22-May-2012, at 20:14, Sohail Aboobaker
wrote:


It would help if you provide your use case. What are you indexing
for each user and why would you need a separate core for indexing
each user? How do you decide schema for each user? It might be
better to describe your use case and desired results. People on the
list will be able to advice on the best approach.

Sohail







Re: clickable links as results?

2012-05-23 Thread Dmitry Kan
Hello,

Set up a schema with at least 3 fields: id (integer or string, unique),
doc_contents (of type text_en, for example), and link (string).
Index each document into the doc_contents field and use highlighting when
searching (hl=true&hl.fl=doc_contents). For each hit, count how many
highlights you have gotten (that's your occurrence count) and retrieve the
link as well.
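
(For illustration, a minimal SolrJ sketch of that counting; the field names
match the schema sketched above and the URL is a placeholder. Note the
snippet count only approximates the number of matches.)

import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class LinkResults {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("doc_contents:apache");
        q.setHighlight(true);
        q.setParam("hl.fl", "doc_contents");
        q.setHighlightSnippets(100); // allow many snippets per document

        QueryResponse rsp = server.query(q);
        for (SolrDocument doc : rsp.getResults()) {
            String id = (String) doc.getFieldValue("id");
            String link = (String) doc.getFieldValue("link");
            List<String> snippets = rsp.getHighlighting().get(id).get("doc_contents");
            int occurrences = (snippets == null) ? 0 : snippets.size();
            System.out.println(link + " (~" + occurrences + " matching passages)");
        }
    }
}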

-- Dmitry

On Tue, May 22, 2012 at 10:02 PM, 12rad  wrote:

> Hi,
>
> I want to display a clickable link to the document if a search matches,
> along with the number of times the search query matched.
> What should I be looking at?
> I am fairly new to Solr and don't know how I can achieve this.
>
> Thanks for the help!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/clickable-links-as-results-tp3985505.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,

Dmitry Kan


Re: Dismax boost + payload boost

2012-05-23 Thread Ahmet Arslan
> In the query debug i can see that i'm using the modified
> query parser
> however there aren't debug information about the payload
> boosts.
> I've not implemented a request handler, but i'm specifiying
> all the
> parameters (e.g. defType=payload plf=entityIdList...) in the
> request.
> What am i possibly doing wrong?

Do you have a custom similarity that overrides scorePayload too?


Re: Dismax boost + payload boost

2012-05-23 Thread matteosilv
I'm trying to get the DigitalPebble payload query parser working...
*I've a multivalued field with payloads:*

[field definition stripped by the mail archive]

*with type:*

[fieldType definition stripped by the mail archive]
In the query debug output I can see that I'm using the modified query
parser; however, there is no debug information about the payload boosts.
I've not implemented a request handler, but I'm specifying all the
parameters (e.g. defType=payload, plf=entityIdList...) in the request.
What am I possibly doing wrong?



Re: Multicore solr

2012-05-23 Thread Shanu Jha
Awaiting suggestions.

On Wed, May 23, 2012 at 8:04 AM, Amit Jha  wrote:

> Hi,
>
> Thanks for your advice.
> It is basically a meta-search application. Users can perform a search on N
> data sources at a time. We broadcast a parallel search to each selected
> data source and write the data to Solr using a custom-built API (the API
> and Solr are deployed on separate machines; the API's job is to perform
> the parallel search and write the data to Solr). The API notifies the
> application that some results are available, and the application then
> fires a search query to display the results (the query would be
> q=unique_search_id). Meanwhile the API keeps writing data to Solr, and the
> user can search Solr again to view all results.
>
> In the current scenario we are using a single Solr server and performing
> real-time indexing and search. Performing these operations on a single
> Solr instance makes the process slow as the index size increases.
>
> So we are planning to use multi-core Solr where each user will have its
> own core. All cores will have the same schema.
>
> Please suggest if this approach has any issues.
>
> Rgds
> AJ
>
> On 22-May-2012, at 20:14, Sohail Aboobaker  wrote:
>
> > It would help if you provide your use case. What are you indexing for
> each
> > user and why would you need a separate core for indexing each user? How
> do
> > you decide schema for each user? It might be better to describe your use
> > case and desired results. People on the list will be able to advice on
> the
> > best approach.
> >
> > Sohail
>