Re: Doubts in PathHierarchyTokenizer

2012-09-12 Thread mechravi25
Thanks a lot, Koji. It worked.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doubts-in-PathHierarchyTokenizer-tp4007216p4007373.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to know the indexversion when sending a document ?

2012-09-12 Thread Laurent Vaills
I forgot to mention that I am using Solr 3.6 .

Laurent

2012/9/12 Laurent Vaills 

> Hi,
>
> When I index a document in Solr, I do not commit immediately after, I use
> the autoCommit feature.
> But I would like to know in which index's version the document will be
> available. Is that possible to get this information in the HTTP response
> when sending the document to index ?
>
> Also, Is it possible to have the index's version in the Solr
> responseHeader block when I issue a search query  ?
>
> Regards,
> Laurent
>
>


Re: Can solr return matched fields?

2012-09-12 Thread Mikhail Khludnev
Dan,

if you have a "foo bar" search phrase against the fields NAME and BRAND, and you
have 10K docs matched with the first 100 displayed, what do you actually
want to see as "fields the query matched", and for which docs?

Looking forward to additional details.

On Thu, Sep 13, 2012 at 2:40 AM, Jack Krupansky wrote:

> But presumably "matched" fields relates to indexed fields, which might not
> have stored values.
>
> -- Jack Krupansky
>
> -Original Message- From: Casey Callendrello
> Sent: Wednesday, September 12, 2012 6:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can solr return matched fields?
>
>


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: SolrCloud fail over

2012-09-12 Thread andy
Cool,Thanks Mark!

Mark Miller-3 wrote
> 
> Either setup a load balancer, or use the SolrCloud solrj client
> CloudSolrServer - it takes a comma separated list of zk servers rather
> than a solr url.
> 
> On Tue, Sep 11, 2012 at 10:17 PM, andy  wrote:
>> I know fail over is available in solr4.0 right now, if one server
>> crashes,other servers also support query,I set up a solr cloud like this
>> http://lucene.472066.n3.nabble.com/file/n4007117/Selection_028.png
>>
>> I use http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml for
>> query
>> at first, if the node  8983 crashes, I have to access other nodes for
>> query
>> like http://localhost:8900/solr/collection1/select?q=*%3A*&wt=xml
>>
>> but I use the nodes url in the solrj, how to change the request url
>> dynamically?
>> does SolrCloud support something like virtual ip address? for example I
>> use
>> url http://collections1 in the solrj, and forward the request to
>> available
>> url automatically.
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/SolrCloud-fail-over-tp4007117.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 
> -- 
> - Mark
> 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-fail-over-tp4007117p4007360.html
Sent from the Solr - User mailing list archive at Nabble.com.
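Until a load balancer or CloudSolrServer is in place, the retry logic can also live in the application. A minimal plain-Java sketch of just the URL-rotation part (the class and URLs are made up for illustration; real code would mark a host down when an HTTP request to it fails):

```java
import java.util.Arrays;
import java.util.List;

// Client-side failover sketch: try each Solr base URL in order, skipping
// hosts previously marked as down. This only illustrates the rotation
// logic; CloudSolrServer does the real work via ZooKeeper.
class FailoverUrls {
    private final List<String> urls;
    private final boolean[] down;

    FailoverUrls(List<String> urls) {
        this.urls = urls;
        this.down = new boolean[urls.size()];
    }

    // Mark a host as failed so pick() skips it from now on.
    void markDown(String url) {
        int i = urls.indexOf(url);
        if (i >= 0) down[i] = true;
    }

    // Return the first live URL, or null if all hosts are down.
    String pick() {
        for (int i = 0; i < urls.size(); i++) {
            if (!down[i]) return urls.get(i);
        }
        return null;
    }
}

class Failover {
    public static void main(String[] args) {
        FailoverUrls f = new FailoverUrls(Arrays.asList(
                "http://localhost:8983/solr/collection1",
                "http://localhost:8900/solr/collection1"));
        System.out.println(f.pick());                       // first node
        f.markDown("http://localhost:8983/solr/collection1");
        System.out.println(f.pick());                       // falls over to second node
    }
}
```

CloudSolrServer gets the live-node list from ZooKeeper instead of a fixed list, which is why it is the preferred option.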


Re: Is it possible to do an "if" statement in a Solr query?

2012-09-12 Thread Amit Nithian
If the fact that it's "original" vs "generic" is captured in a 0/1 field
"is_original", can you sort by is_original? Similarly, could you put a huge
boost on is_original in the dismax so that documents matching on is_original
score higher than those that aren't original? Or is your goal to not
show generics *at all*?
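If neither sorting nor boosting fits, the check can also be done client-side once the results are back. A rough plain-Java sketch, where the Med shape and its family/original fields are assumptions about how the documents are modeled:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OriginalFilter {
    // Hypothetical document shape; in a real app these would be Solr fields.
    static class Med {
        final String name, family;
        final boolean original;
        Med(String name, String family, boolean original) {
            this.name = name; this.family = family; this.original = original;
        }
    }

    // Within each medicine "family", keep the originals if any exist,
    // otherwise keep the generics.
    static List<Med> filter(List<Med> results) {
        // Group the results by family, preserving result order.
        Map<String, List<Med>> byFamily = new LinkedHashMap<>();
        for (Med m : results) {
            byFamily.computeIfAbsent(m.family, k -> new ArrayList<>()).add(m);
        }
        List<Med> kept = new ArrayList<>();
        for (List<Med> fam : byFamily.values()) {
            boolean hasOriginal = false;
            for (Med m : fam) {
                if (m.original) hasOriginal = true;
            }
            // If an original exists in the family, drop its generics.
            for (Med m : fam) {
                if (!hasOriginal || m.original) kept.add(m);
            }
        }
        return kept;
    }
}
```

The same per-family rule could also be pushed into Solr via Walter's grouping suggestion; this is just the app-side equivalent.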


On Wed, Sep 12, 2012 at 2:47 PM, Walter Underwood  wrote:
> You may be able to do this with grouping. Group on the medicine "family", and 
> only show the Original if there are multiple items in the family.
>
> wunder
>
> On Sep 12, 2012, at 2:09 PM, Gustav wrote:
>
>> Hello everyone, I'm working on an e-commerce website and using Solr as my
>> Search Engine, im really enjoying its funcionality and the search
>> options/performance.
>> But i am stucky in a kinda tricky cenario... That what happens:
>>
>> I Have  a medicine web-store, where i indexed all necessary products in my
>> Index Solr.
>> But when i search for some medicine, following my business rules, i have to
>> verify if the result of my search contains any Original medicine, if there
>> is any, then i wouldn't show the generics of this respective medicine, on
>> the other hand, if there wasnt any original product in the result i would
>> have to return its generics.
>> Im currently returning the original and generics, is there a way to do this
>> kind of "checking" in solr?
>>
>> Thanks! :)
>>
>
>
>
>


Re: How does Solr handle overloads so well?

2012-09-12 Thread Otis Gospodnetic
Hm, I'm not sure how to approach this. Solr is not alone here - there's a
container like Jetty, Solr inside it, and Lucene inside Solr.
Next, that index is really small, so there is no disk IO. The request
rate is also not super high, and if you did this over a fast connection then
there are also no issues with slow response writing, with having lots of
concurrent connections, or with running out of threads ...

...so it's not really that surprising Solr keeps working :)

But...tell us more.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Sep 12, 2012 8:51 PM, "Mike Gagnon"  wrote:

> Hi,
>
> I have been studying how server software responds to requests that cause
> CPU overloads (such as infinite loops).
>
> In my experiments I have observed that Solr performs unusually well when
> subjected to such loads. Every other piece of web software I've
> experimented with drops to zero service under such loads. Do you know how
> Solr achieves such good performance? I am guessing that when Solr is
> overload sheds load to make room for incoming requests, but I could not
> find any documentation that describes Solr's overload strategy.
>
> Experimental setup: I ran Solr 3.1 on a 12-core machine with 12 GB ram,
> using it index and search about 10,000 pages on MediaWiki. I test both
> Solr+Jetty and Solr+Tomcat. I submitted a variety of Solr queries at a rate
> of 300 requests per second. At the same time, I submitted "overload
> requests" at a rate of 60 requests per second. Each overload request caused
> an infinite loop in Solr via
> https://issues.apache.org/jira/browse/SOLR-2631.
>
> With Jetty about 70% of non-overload requests completed --- 95% of requests
> completing within 0.6 seconds.
> With Tomcat about 34% of non-overload requests completed --- 95% of
> requests completing within 0.6 seconds.
>
> I also ran Solr+Jetty with non-overload requests coming in 65 requests per
> second (overload requests remain at 60 requests per second). In this
> workload, the completion rate drops to 15% and the 95th percentile latency
> increases to 25.
>
> Cheers,
> Mike Gagnon
>


How does Solr handle overloads so well?

2012-09-12 Thread Mike Gagnon
Hi,

I have been studying how server software responds to requests that cause
CPU overloads (such as infinite loops).

In my experiments I have observed that Solr performs unusually well when
subjected to such loads. Every other piece of web software I've
experimented with drops to zero service under such loads. Do you know how
Solr achieves such good performance? I am guessing that when Solr is
overloaded it sheds load to make room for incoming requests, but I could not
find any documentation that describes Solr's overload strategy.

Experimental setup: I ran Solr 3.1 on a 12-core machine with 12 GB ram,
using it to index and search about 10,000 pages from MediaWiki. I tested both
Solr+Jetty and Solr+Tomcat. I submitted a variety of Solr queries at a rate
of 300 requests per second. At the same time, I submitted "overload
requests" at a rate of 60 requests per second. Each overload request caused
an infinite loop in Solr via https://issues.apache.org/jira/browse/SOLR-2631.

With Jetty about 70% of non-overload requests completed --- 95% of requests
completing within 0.6 seconds.
With Tomcat about 34% of non-overload requests completed --- 95% of
requests completing within 0.6 seconds.

I also ran Solr+Jetty with non-overload requests coming in 65 requests per
second (overload requests remain at 60 requests per second). In this
workload, the completion rate drops to 15% and the 95th percentile latency
increases to 25 seconds.
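The 95th-percentile latencies quoted above can be computed from raw per-request latencies with the nearest-rank method; a small plain-Java sketch:

```java
import java.util.Arrays;

class Percentile {
    // Nearest-rank percentile: sort the samples, then take the value at
    // rank ceil(p * n), i.e. index ceil(p * n) - 1.
    static double percentile(double[] latencies, double p) {
        double[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        double[] lat = {0.1, 0.2, 0.3, 0.4, 25.0};
        System.out.println(percentile(lat, 0.95)); // 25.0
    }
}
```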

Cheers,
Mike Gagnon


Re: Want a multi-datacenter environment with Solr?

2012-09-12 Thread Otis Gospodnetic
Is that with plain Apache Solr or Datastax?

Otis
--
Performance Monitoring - http://sematext.com/spm
On Sep 12, 2012 7:55 PM, "Stephanie Huynh"  wrote:

> Does anyone want me to send them a white paper on having a
> multi-datacenter environment with Solr?
>
> Best,
> Stephanie
>


Want a multi-datacenter environment with Solr?

2012-09-12 Thread Stephanie Huynh
Does anyone want me to send them a white paper on having a
multi-datacenter environment with Solr?

Best,
Stephanie


Re: Can solr return matched fields?

2012-09-12 Thread Jack Krupansky
But presumably "matched" fields relates to indexed fields, which might not 
have stored values.


-- Jack Krupansky

-Original Message- 
From: Casey Callendrello

Sent: Wednesday, September 12, 2012 6:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Can solr return matched fields?



Re: How to post atomic updates using xml

2012-09-12 Thread jimtronic
Figured it out.

in JSON: 

 {"id"   : "book1",
  "author"   : {"set":"Neal Stephenson"}
 }

in XML:

<add>
  <doc>
    <field name="id">book1</field>
    <field name="author" update="set">Neal Stephenson</field>
  </doc>
</add>

This seems to work.
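For completeness, a helper that emits both forms can be sketched in plain Java. The XML shape (an update="set" attribute on a field element) follows the Solr 4 atomic-update syntax; note the sketch does no XML/JSON escaping of values:

```java
class AtomicUpdate {
    // Build the Solr 4 XML form of an atomic "set" update for one field.
    // Values are inserted verbatim; real code must escape XML characters.
    static String xmlSet(String id, String field, String value) {
        return "<add>\n"
             + "  <doc>\n"
             + "    <field name=\"id\">" + id + "</field>\n"
             + "    <field name=\"" + field + "\" update=\"set\">" + value + "</field>\n"
             + "  </doc>\n"
             + "</add>";
    }

    // Build the equivalent JSON form (a list with one document).
    static String jsonSet(String id, String field, String value) {
        return "[{\"id\":\"" + id + "\",\"" + field + "\":{\"set\":\"" + value + "\"}}]";
    }

    public static void main(String[] args) {
        System.out.println(xmlSet("book1", "author", "Neal Stephenson"));
        System.out.println(jsonSet("book1", "author", "Neal Stephenson"));
    }
}
```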

Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-post-atomic-updates-using-xml-tp4007323p4007325.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to post atomic updates using xml

2012-09-12 Thread jimtronic
There's a good intro to atomic updates here:
http://yonik.com/solr/atomic-updates/ but it does not describe how to
structure the updates using xml.

Anyone have any idea on how these would look?

Thanks! Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-post-atomic-updates-using-xml-tp4007323.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can solr return matched fields?

2012-09-12 Thread Casey Callendrello
What about using the FastVectorHighlighter? It should get you what
you're looking for (fields with matches) without much of a query-time
performance impact.

--Casey


On 9/12/12 3:01 PM, Dan Foley wrote:
> is there a way for solr to tell me what fields the query matched,
> other then turning debug on?
>
> I'd like my application to take different actions based on what fields
> were matched.
>






Can solr return matched fields?

2012-09-12 Thread Dan Foley
Is there a way for Solr to tell me which fields the query matched,
other than turning debug on?

I'd like my application to take different actions based on what fields
were matched.

-- 
Dan Foley
Owner - PHP Web Developer
___
Micamedia.com - PHP Web Development


Re: Is it possible to do an "if" statement in a Solr query?

2012-09-12 Thread Jack Krupansky
You could implement a custom "search component" with that logic, if you 
don't mind the complexity of writing Java code that runs inside the Solr 
environment. Otherwise, just implement that logic in your app. Or
implement an "app server" which sits between Solr and your app.


http://wiki.apache.org/solr/SearchComponent

-- Jack Krupansky

-Original Message- 
From: Gustav

Sent: Wednesday, September 12, 2012 5:09 PM
To: solr-user@lucene.apache.org
Subject: Is it possible to do an "if" statement in a Solr query?

Hello everyone, I'm working on an e-commerce website and using Solr as my
Search Engine, im really enjoying its funcionality and the search
options/performance.
But i am stucky in a kinda tricky cenario... That what happens:

I Have  a medicine web-store, where i indexed all necessary products in my
Index Solr.
But when i search for some medicine, following my business rules, i have to
verify if the result of my search contains any Original medicine, if there
is any, then i wouldn't show the generics of this respective medicine, on
the other hand, if there wasnt any original product in the result i would
have to return its generics.
Im currently returning the original and generics, is there a way to do this
kind of "checking" in solr?

Thanks! :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-do-an-if-statement-in-a-Solr-query-tp4007311.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Is it possible to do an "if" statement in a Solr query?

2012-09-12 Thread Walter Underwood
You may be able to do this with grouping. Group on the medicine "family", and 
only show the Original if there are multiple items in the family.

wunder

On Sep 12, 2012, at 2:09 PM, Gustav wrote:

> Hello everyone, I'm working on an e-commerce website and using Solr as my
> Search Engine, im really enjoying its funcionality and the search
> options/performance. 
> But i am stucky in a kinda tricky cenario... That what happens:
> 
> I Have  a medicine web-store, where i indexed all necessary products in my
> Index Solr. 
> But when i search for some medicine, following my business rules, i have to
> verify if the result of my search contains any Original medicine, if there
> is any, then i wouldn't show the generics of this respective medicine, on
> the other hand, if there wasnt any original product in the result i would
> have to return its generics.
> Im currently returning the original and generics, is there a way to do this
> kind of "checking" in solr?
> 
> Thanks! :)
> 






Is it possible to do an "if" statement in a Solr query?

2012-09-12 Thread Gustav
Hello everyone, I'm working on an e-commerce website and using Solr as my
search engine; I'm really enjoying its functionality and the search
options/performance.
But I am stuck in a kind of tricky scenario. Here is what happens:

I have a medicine web store, where I indexed all the necessary products in my
Solr index.
When I search for some medicine, my business rules require me to verify
whether the search result contains any original medicine. If there is one, I
shouldn't show the generics of that respective medicine; on the other hand, if
there is no original product in the result, I have to return its generics.
I'm currently returning both the originals and the generics. Is there a way to
do this kind of checking in Solr?

Thanks! :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-do-an-if-statement-in-a-Solr-query-tp4007311.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: TikaException: Unsupported AutoCAD drawing version

2012-09-12 Thread Ahmet Arslan
> I am indexing data with Solr Cell,
> using mainly the code from here: 
> http://wiki.apache.org/solr/ContentStreamUpdateRequestExample
> 
> But in my Solr server i got the TikaException followed by a
> solrexception 
> in my solrj programm.
> 
> Is there a way to suppress this and similar exceptions
> directly in the 
> Server?

Taken from : 
http://search-lucene.com/m/ZOs8xGNL6j2/TikaException+ignore&subj=ignoreTikaException+value

  <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      :
      <str name="ignoreTikaException">true</str>
      :
    </lst>
  </requestHandler>


Hey solr-user MODERATOR (was: Re: failure notice from zju.edu.cn)

2012-09-12 Thread Otis Gospodnetic
Same here.  Changed subject to attract more attention.

Otis

On Wed, Sep 12, 2012 at 1:34 PM, Steven A Rowe  wrote:
> I get the same thing, after nearly every email I send directly to the 
> lucene/solr lists (as opposed to auto-sent JIRA posts).
>
> I don't think it delays my messages though.
>
> Steve
>
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Wednesday, September 12, 2012 1:24 PM
> To: solr-user@lucene.apache.org
> Subject: failure notice from zju.edu.cn
>
> Hello All,
>
>
> Sometimes (in a random manner) I get the following when I reply a post :
>
> "Hi. This is the deliver program at zju.edu.cn.
> I'm afraid I wasn't able to deliver your message to the following addresses.
> This is a permanent error; I've given up. Sorry it didn't work out.
>
> new...@zju.edu.cn
> reject mail "
>
> David asked this question before : http://search-lucene.com/m/mlfOKh7WXn/
> But I always use plain text e-mails. Can anybody explain what this 
> mailer-dae...@zju.edu.cn or new...@zju.edu.cn thing is? Are they subscribers 
> of solr-user Mailing List? How can I prevent this? This seems delaying my 
> mails appearing on ML.
>
> Thanks,
> Ahmet


Re: 3.6.1 - Suggester and spellcheker Implementation

2012-09-12 Thread Otis Gospodnetic
Hi Sujatha,

No, suggester and spellchecker are separate beasts.

Otis
-- 
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Wed, Sep 12, 2012 at 3:18 PM, Sujatha Arun  wrote:
> Hi ,
>
> If I am looking to implement Suggester Implementation with 3.6.1 ,I beleive
> this creates it own index , now If I want to also use the spellcheck  also
> ,would it be using the same index as suggester?
>
> Regards
> Sujatha


Re: Beginner questions

2012-09-12 Thread Alexandre Rafalovitch
I would start with version 4, hands down.

I started with Solr 4 alpha and have moved to beta. The final release can't
be too far behind. So far, it has been extremely stable for me.

And unless you are going into production in the next week, it will
probably be final while you are learning.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Sep 12, 2012 at 1:20 PM, Ken Clarke  wrote:
> Hi Folks,
>
> I'm going to setup a SOLR search server for the first time.  Hope you 
> don't mind a few beginner questions.  Perhaps a quick summary of how I intend 
> to use it will help.
>
> The SOLR server will be installed on a single VPS host and bound to a 
> internal IP (192.168.?.?).  Search parameters will be received by a mod_perl 
> script which will handle input validation, SOLR query language generation, 
> submition to SOLR, SOLR response parsing and search request response.
>
> Should I go with Beta 4 or stable 3?
>
> Which servlet container would you suggest is the most efficient for my 
> implementation?
>
> I'm unclear if the JDK is required or I can just install a JRE.  I was 
> guessing that Oracle's Java SE 7u7 would probably be the best implementation, 
> yes/no?
>
> How relevant is the "Apache Solr 3 Enterprise Search Server" book to 
> working with version 4?  I couldn't find a list of differences anywhere.
>
> Apreesh!
>
>>> Ken Clarke
>>> Contract Web Programmer / E-commerce Technologist


Re: Beginner questions

2012-09-12 Thread Ahmet Arslan
>     Should I go with Beta 4 or stable 3?

I would use Solr 4, since this is a first-time installation.

>     Which servlet container would you suggest is
> the most efficient for my implementation?

Folks use both jetty and tomcat.

>     I'm unclear if the JDK is required or I can
> just install a JRE.  I was guessing that Oracle's Java
> SE 7u7 would probably be the best implementation, yes/no?

README.txt says 
"Download the Java SE 6 JDK (Java Development Kit) 
 You will need the JDK installed, and the $JAVA_HOME/bin (Windows: 
%JAVA_HOME%\bin) folder included on your command path. To test this, issue a 
"java -version" command from your shell (command prompt) and verify that the 
Java version is 1.6 or later."
 
>     How relevant is the "Apache Solr 3 Enterprise
> Search Server" book to working with version 4?  I
> couldn't find a list of differences anywhere.

I suggest you read this book (without worrying about Solr 4). It is easy to
understand and it covers lots of things.


3.6.1 - Suggester and spellcheker Implementation

2012-09-12 Thread Sujatha Arun
Hi ,

If I am looking to implement the Suggester with 3.6.1, I believe it creates
its own index. Now, if I also want to use the spellcheck component, would it
use the same index as the suggester?

Regards
Sujatha


Aw: Re: Cannot parse ":", using HTTP-URL as id

2012-09-12 Thread sysrq
My bad, using the term query parser works. Thanks, Ahmet.


> Gesendet: Mittwoch, 12. September 2012 um 19:40 Uhr
> Von: sy...@web.de
> An: solr-user@lucene.apache.org
> Betreff: Aw: Re: Cannot parse ":", using HTTP-URL as id
>
> > term query parser is your friend in this case. With this you don't need to 
> > escape anything.
> >   SolrQuery query = new SolrQuery();
> >   query.setQuery("{!term f=id}bar_http://bar.com/?doc=452");
> 
> But how can I *store* a document with a URL as a field value? E.g.
> "domain_http://www.domain.com/?p=12345"
> The term query parser may be able to *retrieve* field values containing ":",
> but my current problem is that I can't store a value with ":" using *Solrj*, the
> Java library to communicate with Solr.
> 
> > --- On Wed, 9/12/12, sy...@web.de  wrote:
> > 
> > > From: sy...@web.de 
> > > Subject: Cannot parse ":", using HTTP-URL as id
> > > To: solr-user@lucene.apache.org
> > > Date: Wednesday, September 12, 2012, 7:40 PM
> > > Hi,
> > > 
> > > I defined a field "id" in my schema.xml and use it as an
> > > :
> > >    > > stored="true" required="true" />
> > >   id
> > > 
> > > I want to store URLs with a prefix in this field to be sure
> > > that every id is unique among websites. For example:
> > >   domain_http://www.domain.com/?p=12345
> > >   foo_http://foo.com
> > >   bar_http://bar.com/?doc=452
> > > I wrote a Java app, which uses Solrj to communicate with a
> > > running Solr instance. Solr (or Solrj, not sure about this)
> > > complains that it can't parse ":":
> > >   Exception in thread "main"
> > > org.apache.solr.common.SolrException:
> > >  
> > > org.apache.lucene.queryparser.classic.ParseException:
> > >   Cannot parse 'id:domain_http://www.domain.com/?p=12345': Encountered " 
> > > ":" ":
> > > "" at line 1, column 14.
> > > 
> > > How should I handle characters like ":" to solve this
> > > problem?
> > > 
> > > I already tried to escape the ":" like this:
> > >   String id = "domain_http://www.domain.com/?p=12345".replaceAll(":",
> > > ":"));
> > >   ...
> > >   document.addField("id", id);
> > >   ...
> > > But then Solr (or Solrj) complains again:
> > >   Exception in thread "main"
> > > org.apache.solr.common.SolrException:
> > >  
> > > org.apache.lucene.queryparser.classic.ParseException:
> > >   Cannot parse
> > > 'id:domain_http\://www.domain.com/?p=12345': Lexical error
> > > at line 1, column 42.  Encountered:  after :
> > > "/?p=12345"
> > > I use 4 backslashes () for double-escape. The first
> > > escape is for Java itself, the second is for Solr to handle
> > > it (I guess).
> > > 
> > > So what is the correct or usual way to deal with special
> > > characters like ":" in Solr (or Solrj)? I don't know if Solr
> > > or Solrj is the problem, but I guess it is Solrj?
> > >
> > 
> 


Aw: Re: Cannot parse ":", using HTTP-URL as id

2012-09-12 Thread sysrq
> term query parser is your friend in this case. With this you don't need to 
> escape anything.
>   SolrQuery query = new SolrQuery();
>   query.setQuery("{!term f=id}bar_http://bar.com/?doc=452");

But how can I *store* a document with a URL as a field value? E.g.
"domain_http://www.domain.com/?p=12345"
The term query parser may be able to *retrieve* field values containing ":", but
my current problem is that I can't store a value with ":" using *Solrj*, the Java
library to communicate with Solr.

> --- On Wed, 9/12/12, sy...@web.de  wrote:
> 
> > From: sy...@web.de 
> > Subject: Cannot parse ":", using HTTP-URL as id
> > To: solr-user@lucene.apache.org
> > Date: Wednesday, September 12, 2012, 7:40 PM
> > Hi,
> > 
> > I defined a field "id" in my schema.xml and use it as an
> > :
> >    > stored="true" required="true" />
> >   id
> > 
> > I want to store URLs with a prefix in this field to be sure
> > that every id is unique among websites. For example:
> >   domain_http://www.domain.com/?p=12345
> >   foo_http://foo.com
> >   bar_http://bar.com/?doc=452
> > I wrote a Java app, which uses Solrj to communicate with a
> > running Solr instance. Solr (or Solrj, not sure about this)
> > complains that it can't parse ":":
> >   Exception in thread "main"
> > org.apache.solr.common.SolrException:
> >  
> > org.apache.lucene.queryparser.classic.ParseException:
> >   Cannot parse 'id:domain_http://www.domain.com/?p=12345': Encountered " 
> > ":" ":
> > "" at line 1, column 14.
> > 
> > How should I handle characters like ":" to solve this
> > problem?
> > 
> > I already tried to escape the ":" like this:
> >   String id = "domain_http://www.domain.com/?p=12345".replaceAll(":",
> > ":"));
> >   ...
> >   document.addField("id", id);
> >   ...
> > But then Solr (or Solrj) complains again:
> >   Exception in thread "main"
> > org.apache.solr.common.SolrException:
> >  
> > org.apache.lucene.queryparser.classic.ParseException:
> >   Cannot parse
> > 'id:domain_http\://www.domain.com/?p=12345': Lexical error
> > at line 1, column 42.  Encountered:  after :
> > "/?p=12345"
> > I use 4 backslashes () for double-escape. The first
> > escape is for Java itself, the second is for Solr to handle
> > it (I guess).
> > 
> > So what is the correct or usual way to deal with special
> > characters like ":" in Solr (or Solrj)? I don't know if Solr
> > or Solrj is the problem, but I guess it is Solrj?
> >
> 


RE: failure notice from zju.edu.cn

2012-09-12 Thread Steven A Rowe
I get the same thing, after nearly every email I send directly to the 
lucene/solr lists (as opposed to auto-sent JIRA posts).

I don't think it delays my messages though.

Steve

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Wednesday, September 12, 2012 1:24 PM
To: solr-user@lucene.apache.org
Subject: failure notice from zju.edu.cn

Hello All,


Sometimes (in a random manner) I get the following when I reply a post :

"Hi. This is the deliver program at zju.edu.cn.
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

new...@zju.edu.cn
reject mail "

David asked this question before : http://search-lucene.com/m/mlfOKh7WXn/
But I always use plain text e-mails. Can anybody explain what this 
mailer-dae...@zju.edu.cn or new...@zju.edu.cn thing is? Are they subscribers of 
solr-user Mailing List? How can I prevent this? This seems delaying my mails 
appearing on ML.

Thanks,
Ahmet


Re: Count disctint groups in grouping distributed

2012-09-12 Thread Jason Rutherglen
Distinct in a distributed environment would require de-duplication
en-masse, use Hive or MapReduce instead.
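That said, if the set of group keys per shard is small enough to ship to the client, a distinct count can be computed by merging them there. A plain-Java sketch (returning raw group keys from each shard is an assumption, not something stock distributed grouping provides):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DistinctGroups {
    // Merge per-shard lists of group keys into one de-duplicated set;
    // the distinct group count across shards is the size of the union.
    static Set<String> mergeGroups(List<List<String>> perShardKeys) {
        Set<String> distinct = new LinkedHashSet<>();
        for (List<String> shard : perShardKeys) {
            distinct.addAll(shard);
        }
        return distinct;
    }
}
```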

On Wed, Sep 12, 2012 at 11:53 AM, yriveiro  wrote:
> Hi,
>
> Exists the possibility of do a distinct group count in a grouping done using
> a sharding schema?
>
> This issue https://issues.apache.org/jira/browse/SOLR-3436 make a fixe in
> the way to sum all groups returned in a distributed grouping operation, but
> not always we want the sum, in some cases is interesting have the distinct
> groups between shards.
>
>
>
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Count-disctint-groups-in-grouping-distributed-tp4007257.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: PrecedenceQueryParser usage

2012-09-12 Thread Maciej Pestka
Thank you!

It seems to me that I managed to get it to work.
Just for future reference, I attach the source code. The jar should be placed
under the core/lib folder.
Please let me know if you have any comments or if I got something incorrect...

public class PrecedenceQParserPlugin extends QParserPlugin {
    private static final Logger LOG =
            LoggerFactory.getLogger(PrecedenceQParserPlugin.class);

    @Override
    public void init(NamedList list) {
    }

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
            SolrParams params, SolrQueryRequest req) {
        LOG.debug("creating new PrecedenceQParser: {} {} {} {}",
                new Object[] {qstr, localParams, params, req});
        return new PrecedenceQParser(qstr, localParams, params, req);
    }
}

class PrecedenceQParser extends QParser {
    private static final Logger LOG =
            LoggerFactory.getLogger(PrecedenceQParser.class);

    private final PrecedenceQueryParser parser;

    public PrecedenceQParser(String qstr, SolrParams localParams,
            SolrParams params, SolrQueryRequest req) {
        super(qstr, localParams, params, req);
        this.parser = new PrecedenceQueryParser();
    }

    @Override
    public Query parse() throws ParseException {
        LOG.debug("parse(): {}", qstr);
        if (null == qstr) {
            return null;
        }
        final String defaultField = QueryParsing.getDefaultField(
                getReq().getSchema(), getParam(CommonParams.DF));
        try {
            return parser.parse(qstr, defaultField);
        } catch (QueryNodeException e) {
            throw new ParseException(e.getMessage(), e);
        }
    }
}


Best Regards
Maciej Pestka


Dnia 10-09-2012 o godz. 17:46 Ahmet Arslan napisał(a):
> > In order for Solr to use this parser,
> > you'll need to wrap it with a QParser and QParserPlugin
> > implementations, then wire your implementation into
> > solrconfig.xml. 
> 
> SurroundQParserPlugin.java (api-4_0_0-BETA) can be an example of such 
> implementation.
> 
> http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/search/SurroundQParserPlugin.html





failure notice from zju.edu.cn

2012-09-12 Thread Ahmet Arslan
Hello All,


Sometimes (in a random manner) I get the following when I reply a post :

"Hi. This is the deliver program at zju.edu.cn.
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

new...@zju.edu.cn
reject mail "

David asked this question before : http://search-lucene.com/m/mlfOKh7WXn/
But I always use plain text e-mails. Can anybody explain what this 
mailer-dae...@zju.edu.cn or new...@zju.edu.cn thing is? Are they subscribers of 
solr-user Mailing List? How can I prevent this? This seems delaying my mails 
appearing on ML.

Thanks,
Ahmet


Beginner questions

2012-09-12 Thread Ken Clarke
Hi Folks,

I'm going to setup a SOLR search server for the first time.  Hope you don't 
mind a few beginner questions.  Perhaps a quick summary of how I intend to use 
it will help.

The SOLR server will be installed on a single VPS host and bound to an 
internal IP (192.168.?.?).  Search parameters will be received by a mod_perl 
script which will handle input validation, SOLR query language generation, 
submission to SOLR, SOLR response parsing, and the search request response.

Should I go with Beta 4 or stable 3?

Which servlet container would you suggest is the most efficient for my 
implementation?

I'm unclear if the JDK is required or I can just install a JRE.  I was 
guessing that Oracle's Java SE 7u7 would probably be the best implementation, 
yes/no?

How relevant is the "Apache Solr 3 Enterprise Search Server" book to 
working with version 4?  I couldn't find a list of differences anywhere.

Apreesh!
  
>> Ken Clarke
>> Contract Web Programmer / E-commerce Technologist


Unable to implememnt SolrNet Authentication.

2012-09-12 Thread Suneel Pandey
Hello,

I am working on Solr authentication with the help of the SolrNet DLL and the
Castle Windsor container, and I am running into some issues. Please point me to
some examples or links; this would be very helpful for me.



-
Regards,

Suneel Pandey
Sr. Software Developer
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-implememnt-SolrNet-Authentication-tp4007259.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cannot parse ":", using HTTP-URL as id

2012-09-12 Thread Ahmet Arslan

Hello,

term query parser is your friend in this case. With this you don't need to 
escape anything.

SolrQuery query = new SolrQuery();

query.setQuery("{!term f=id}bar_http://bar.com/?doc=452");
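For cases where the default query parser must be used and escaping is unavoidable, SolrJ also ships ClientUtils.escapeQueryChars. A standalone sketch of the same idea (a re-implementation for illustration, not the library code):

```java
class QueryEscape {
    // Backslash-escape the characters the Lucene query parser treats
    // specially (including ':'). Mirrors the idea behind SolrJ's
    // ClientUtils.escapeQueryChars; this sketch is not the library code.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;/ ".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("id:domain_http://www.domain.com/?p=12345"));
    }
}
```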

--- On Wed, 9/12/12, sy...@web.de  wrote:

> From: sy...@web.de 
> Subject: Cannot parse ":", using HTTP-URL as id
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 12, 2012, 7:40 PM
> Hi,
> 
> I defined a field "id" in my schema.xml and use it as an
> :
>    stored="true" required="true" />
>   id
> 
> I want to store URLs with a prefix in this field to be sure
> that every id is unique among websites. For example:
>   domain_http://www.domain.com/?p=12345
>   foo_http://foo.com
>   bar_http://bar.com/?doc=452
> I wrote a Java app, which uses Solrj to communicate with a
> running Solr instance. Solr (or Solrj, not sure about this)
> complains that it can't parse ":":
>   Exception in thread "main"
> org.apache.solr.common.SolrException:
>  
> org.apache.lucene.queryparser.classic.ParseException:
>   Cannot parse 'id:domain_http://www.domain.com/?p=12345': Encountered " ":" 
> ":
> "" at line 1, column 14.
> 
> How should I handle characters like ":" to solve this
> problem?
> 
> I already tried to escape the ":" like this:
>   String id = "domain_http://www.domain.com/?p=12345".replaceAll(":", "\\\\:");
>   ...
>   document.addField("id", id);
>   ...
> But then Solr (or Solrj) complains again:
>   Exception in thread "main"
> org.apache.solr.common.SolrException:
>  
> org.apache.lucene.queryparser.classic.ParseException:
>   Cannot parse
> 'id:domain_http\://www.domain.com/?p=12345': Lexical error
> at line 1, column 42.  Encountered: <EOF> after :
> "/?p=12345"
> I use 4 backslashes (\\\\) for double-escape. The first
> escape is for Java itself, the second is for Solr to handle
> it (I guess).
> 
> So what is the correct or usual way to deal with special
> characters like ":" in Solr (or Solrj)? I don't know if Solr
> or Solrj is the problem, but I guess it is Solrj?
>


Count disctint groups in grouping distributed

2012-09-12 Thread yriveiro
Hi, 

Is it possible to do a distinct group count in a grouping done using
a sharding schema?

The issue https://issues.apache.org/jira/browse/SOLR-3436 fixed the way all
groups returned in a distributed grouping operation are summed, but we do not
always want the sum; in some cases it is interesting to have the distinct
groups across shards.



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Count-disctint-groups-in-grouping-distributed-tp4007257.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Doubts in PathHierarchyTokenizer

2012-09-12 Thread Koji Sekiguchi

Use delimiter option instead of pattern for PathHierarchyTokenizerFactory:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory
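For reference, a minimal fieldType sketch using the delimiter option (the fieldType name and the query-side analyzer here are my additions, not from this thread):

```xml
<fieldType name="pipe_hierarchy" class="solr.TextField">
  <analyzer type="index">
    <!-- Emits one token per accumulated prefix: A, A|B, A|B|C -->
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="|"/>
  </analyzer>
  <analyzer type="query">
    <!-- Keep the whole query string as one token so "A|B" matches
         the indexed prefix token A|B -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Indexing "A|B|C" then produces the prefix tokens A, A|B and A|B|C, so a query for "A|B" matches any document whose value starts with that prefix.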

koji
--
http://soleami.com/blog/starting-lab-work.html
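As a quick sanity check of what the tokenizer is expected to emit, the prefix expansion can be reproduced standalone (a plain-Java illustration, not the Lucene implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class PathPrefixDemo {
    // Mimics PathHierarchyTokenizer output: one token per accumulated prefix.
    static List<String> prefixes(String value, char delimiter) {
        List<String> tokens = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        // \Q...\E quotes the delimiter so regex metacharacters like '|' are literal.
        for (String part : value.split("\\Q" + delimiter + "\\E")) {
            if (prefix.length() > 0) prefix.append(delimiter);
            prefix.append(part);
            tokens.add(prefix.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [A, A|B, A|B|C]
        System.out.println(prefixes("A|B|C", '|'));
    }
}
```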

(12/09/12 22:22), mechravi25 wrote:

Hi,

I'm using Solr 3.6.1 and I have a field which has values like

A|B|C
B|C|D|EE
A|C|B
A|B|D
..etc..

So, When I search for "A|B", I should get documents starting with
"A" and "A|B"

To implement this, I've used PathHierarchyTokenizer for the above field as



  

  

 



But when I use the Solr analysis page to check whether it's being split on the
pipe symbol ("|") at indexing time, I see that it's being taken as the entire
token and is not getting split on the delimiter (i.e. the search is done
only for "A|B" in the above case).

I also tried using "\|" as the delimiter, but it's not working either.

Am I missing anything here? Or will the PathHierarchyTokenizer not accept the
pipe symbol ("|") as a delimiter?
Can anyone guide me on this?

Thanks a lot



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doubts-in-PathHierarchyTokenizer-tp4007216.html
Sent from the Solr - User mailing list archive at Nabble.com.







Cannot parse ":", using HTTP-URL as id

2012-09-12 Thread sysrq
Hi,

I defined a field "id" in my schema.xml and use it as an :
  
  id

I want to store URLs with a prefix in this field to be sure that every id is 
unique among websites. For example:
  domain_http://www.domain.com/?p=12345
  foo_http://foo.com
  bar_http://bar.com/?doc=452
I wrote a Java app, which uses Solrj to communicate with a running Solr 
instance. Solr (or Solrj, not sure about this) complains that it can't parse 
":":
  Exception in thread "main" org.apache.solr.common.SolrException:
  org.apache.lucene.queryparser.classic.ParseException:
  Cannot parse 'id:domain_http://www.domain.com/?p=12345': Encountered " ":" ": 
"" at line 1, column 14.

How should I handle characters like ":" to solve this problem?

I already tried to escape the ":" like this:
  String id = "domain_http://www.domain.com/?p=12345".replaceAll(":", "\\\\:");
  ...
  document.addField("id", id);
  ...
But then Solr (or Solrj) complains again:
  Exception in thread "main" org.apache.solr.common.SolrException:
  org.apache.lucene.queryparser.classic.ParseException:
  Cannot parse 'id:domain_http\://www.domain.com/?p=12345': Lexical error at 
line 1, column 42.  Encountered: <EOF> after : "/?p=12345"
I use 4 backslashes (\\\\) for double-escape. The first escape is for Java 
itself, the second is for Solr to handle it (I guess).

So what is the correct or usual way to deal with special characters like ":" in 
Solr (or Solrj)? I don't know if Solr or Solrj is the problem, but I guess it 
is Solrj?


Authentication Not working in solrnet getting 401 error

2012-09-12 Thread Suneel Pandey
Hi,

I am trying to connect to an authenticated Solr instance. I have added the
latest SolrNet DLL but am getting an authentication (401) error. Please suggest
where I went wrong.

ISolrOperations oSolrOperations = null;
const string core0url = "http://localhost:8080/solr/products";
const string core1url = "http://localhost:8080/solr/products";
var solrFacility = new SolrNetFacility(core0url);
var container = new WindsorContainer();
container.AddFacility("solr", solrFacility);
BasicAuthHttpWebRequestFactory OAuth = new
BasicAuthHttpWebRequestFactory("djsrNPvHsUnBSETg", "x");
// override core1 components
const string core1Connection = "core1.connection";
   
container.Register(Component.For().ImplementedBy().Named(core1Connection).Parameters(Castle.MicroKernel.Registration.Parameter.ForKey("serverURL").Eq(core1url)));

   
container.Register(Component.For(typeof(ISolrBasicOperations),
typeof(ISolrBasicReadOnlyOperations))
  
.ImplementedBy>()
  
.ServiceOverrides(ServiceOverride.ForKey("connection").Eq(core1Connection)));

   
container.Register(Component.For(typeof(ISolrOperations),
typeof(ISolrReadOnlyOperations))
  
.ImplementedBy>()
  
.ServiceOverrides(ServiceOverride.ForKey("connection").Eq(core1Connection)));

   
container.Register(Component.For>().ImplementedBy>()
  
.ServiceOverrides(ServiceOverride.ForKey("connection").Eq(core1Connection)));

//Authentication//
   
container.Register(Component.For().ImplementedBy().ServiceOverrides(ServiceOverride.ForKey("connection").Eq("")));

oSolrOperations =
container.Resolve>();
oSolrOperations.Ping();



-
Regards,

Suneel Pandey
Sr. Software Developer
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Authentication-Not-working-in-solrnet-getting-401-error-tp4007254.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [Solr4 beta] error 503 on commit

2012-09-12 Thread Radim Kolar



> > could not be solr able to close oldest warming searcher and replace it by
> > new one?
>
> That approach can easily lead to starvation (i.e. you never get a new
> searcher usable for queries).

It will not, if there is more than 1 warming searcher. Look at this scheme:

1. current in use searcher
2. 1st warming searcher
3. 2nd warming searcher

If a new warming searcher is needed, close (3) and create a new one in its
place. (2) will finish its work uninterrupted and will replace (1).
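For reference, both the limit being hit and the commit cadence live in solrconfig.xml; a fragment like the following (values are examples, not taken from this thread) keeps concurrent warming bounded while batching commits so searchers open less often:

```xml
<query>
  <!-- At most two searchers may warm concurrently; a commit that would
       open a third is rejected with the 503 seen in this thread. -->
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Batch commits instead of committing per document, so new
       searchers open less often and warming rarely overlaps. -->
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime> <!-- ms -->
  </autoCommit>
</updateHandler>
```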


Re: Solr unique key can't be blank

2012-09-12 Thread Ahmet Arslan
> Thank you Ahmet! In fact, I did not know that the
> updateRequestProcessorChain needed to be defined in
> solrconfig.xml and
> I had tried to define it in schema.xml. I don't have access
> to
> solrconfig.xml (I am using Websolr) but I will contact them
> about
> adding it.

Please note that you need to reference it in the UpdateRequestHandler that you
are using (this can be extracting, dataimport, etc.).

  


   
  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">uuid</str>
    </lst>
  </requestHandler>


Re: Solr unique key can't be blank

2012-09-12 Thread Jack Krupansky
The UniqueKey wiki was recently updated to indicate this new Solr 4.0 
requirement:


http://wiki.apache.org/solr/UniqueKey

"in Solr 4, this field must be populated via 
solr.UUIDUpdateProcessorFactory"


The changes you were given are contained on that updated wiki page.

-- Jack Krupansky

-Original Message- 
From: Dotan Cohen

Sent: Wednesday, September 12, 2012 10:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr unique key can't be blank

On Wed, Sep 12, 2012 at 5:27 PM, Ahmet Arslan  wrote:

Hi Dotan,

Did you define the following update processor chain in solrconfig.xml ?
And did you reference it in an update handler?



<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>






Thank you Ahmet! In fact, I did not know that the
updateRequestProcessorChain needed to be defined in solrconfig.xml and
I had tried to define it in schema.xml. I don't have access to
solrconfig.xml (I am using Websolr) but I will contact them about
adding it.

Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com 



Re: SolrCloud and Optimize

2012-09-12 Thread Walter Underwood
Do not run optimize. It is not necessary. Solr continually optimizes in the 
background. 

wunder

On Sep 11, 2012, at 11:15 PM, Nikhil Chhaochharia wrote:

> Hi,
> 
> I am using a recent nightly of Solr 4 and have setup a simple SolrCloud 
> cluster of 2 shards without any replicas.  If I send the 'optimize' command, 
> then it is executed on the shards one-by-one instead of in parallel.
> 
> Is this by design? How can I run optimize in parallel on all the shards?
> 
> Thanks,
> Nikhil
> 







Re: Solr unique key can't be blank

2012-09-12 Thread Dotan Cohen
On Wed, Sep 12, 2012 at 5:27 PM, Ahmet Arslan  wrote:
> Hi Dotan,
>
> Did you define the following update processor chain in solrconfig.xml ?
> And did you reference it in an update handler?
>
> 
> 
> <updateRequestProcessorChain name="uuid">
>   <processor class="solr.UUIDUpdateProcessorFactory">
>     <str name="fieldName">id</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> 
> 
> 
>

Thank you Ahmet! In fact, I did not know that the
updateRequestProcessorChain needed to be defined in solrconfig.xml and
I had tried to define it in schema.xml. I don't have access to
solrconfig.xml (I am using Websolr) but I will contact them about
adding it.

Thank you.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Semantic document format... standards?

2012-09-12 Thread Michael Della Bitta
Actually at my company, we do a lot of NLP work and we've ended up
using bespoke formats, formerly a FeatureStructure serialized to JSON,
but most recently in protobufs. Possibly not the answer you were
looking for, Otis, but at least it's a datapoint.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Sep 12, 2012 at 7:36 AM, Alexandre Rafalovitch
 wrote:
> Otis,
>
> If you are doing Named Entity Recognition, you may want to look at the
> research area concerned with Named Entity Recognition. :-) In general,
> there is inline markup and standoff markup. You seem to be going for
> standoff/stand-alone markup. I am not clear though whether it is just
> 'discovery' format or actual annotation format (with reference to
> where in the sentence it is with offsets or token ids).
>
> UIMA (which Solr integrate with already, right?), does NER so it must
> be using some sort of format.
>
> Also, TREC is one of the competitions and they provide marked-up
> datasets you might be able to learn something from:
> http://ilps.science.uva.nl/trec-entity/
>
> If you are not sure where to start with NER, you can look at my
> collection of papers, though most of them are probably too specific:
> http://www.citeulike.org/user/arafalov
>
> Finally,  if you have to deal with overlapping entities, there was an
> article about a month ago about some sort of general format. I can't seem
> to find the article right now, but I could try digging if you are
> still stuck.
>
> Regards,
> Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Tue, Sep 11, 2012 at 11:51 AM, Otis Gospodnetic
>  wrote:
>> Hello,
>>
>> If I'm extracting named entities, topics, key phrases/tags, etc. from 
>> documents and I want to have a representation of this document, what format 
>> should I use? Are there any standard or at least common formats or 
>> approaches people use in such situations?
>>
>> For example, the most straight forward format might be something like this:
>>
>>
>> <doc>
>>   <title>doc title</title>
>>   <keywords>meta keywords coming from the web page</keywords>
>>   <body>page meat</body>
>>   <entities>name entities recognized in the document</entities>
>>   <topics>topics extracted by the annotator</topics>
>>   <tags>tags extracted by the annotator</tags>
>>   <relations>relations extracted by the annotator</relations>
>> </doc>
>>
>> But this is a made up format - the XML tags above are just what somebody 
>> happened to pick.
>>
>> Are there any standard or at least common formats for this?
>>
>>
>> Thanks,
>> Otis
>> 
>> Performance Monitoring - Solr - ElasticSearch - HBase - 
>> http://sematext.com/spm
>>
>> Search Analytics - http://sematext.com/search-analytics/index.html


Re: Solr 4.0 Beta Release

2012-09-12 Thread Jack Krupansky
Yes, it has been released. Read the details here (including download 
instructions/links):

http://lucene.apache.org/solr/solrnews.html

-- Jack Krupansky

-Original Message- 
From: samarth s

Sent: Wednesday, September 12, 2012 9:54 AM
To: solr-user@lucene.apache.org
Subject: Solr 4.0 Beta Release

Hi All,

Would just like to verify if Solr 4.0 Beta has been released. Does the
following url give the official beta release:
http://www.apache.org/dyn/closer.cgi/lucene/solr/4.0.0-BETA

--
Regards,
Samarth 



Re: Solr unique key can't be blank

2012-09-12 Thread Ahmet Arslan


--- On Wed, 9/12/12, Dotan Cohen  wrote:

> From: Dotan Cohen 
> Subject: Solr unique key can't be blank
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 12, 2012, 5:06 PM
> Consider this simple schema:
> 
> 
> 
>     
>   <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
>   <field name="id" type="uuid" indexed="true" stored="true" required="true"/>
>     
> 
> 
> When trying to upload it to Websolr I am getting this
> error:
> Solr unique key can't be blank
> 
> I also tried adding this element to the XML, after
> :
> <uniqueKey>id</uniqueKey>
> 
> However this did not help. What could be the issue? I The
> code is
> taken verbatim from this page:
> http://wiki.apache.org/solr/UniqueKey
> 
> Note that this is on a Solr 4 Alpha index. Thanks.

Hi Dotan,

Did you define the following update processor chain in solrconfig.xml ?
And did you reference it in an update handler?
 


<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>






Re: Retrieval of large number of documents

2012-09-12 Thread Alexandre Rafalovitch
Have you tried asking for CSV as an output format? Then, you don't
have any XML wrappers and you will get your IDs one per line. I tried
it with returning about 40 rows and it was just fine.
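For example, a request along these lines (host, port, and handler are placeholders; put your criteria in q) returns one id per line with no XML wrapper:

```
http://localhost:8983/solr/select?q=<your criteria>&fl=id&rows=400000&wt=csv&csv.header=false
```

csv.header=false drops the leading "id" header line, so the output is already the bare list of keys and the XSLT step can be skipped entirely.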

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Sep 12, 2012 at 9:52 AM, Paul Libbrecht  wrote:
> Isn't XSLT the bottleneck here?
> I have not yet met an incremental XSLT processor, although I heard XSLT 1 
> claimed it could be done in principle.
>
> If you start to do this kind of processing, I think you have no other choice 
> than write your own output method.
>
> Paul
>
>
> Le 12 sept. 2012 à 15:47, Rohit Harchandani a écrit :
>
>> Hi all,
>> I have a solr index with 5,000,000 documents and my index size is 38GB. But
>> when I query for about 400,000 documents based on certain criteria, solr
>> searches it really quickly but does not return data for close to 2 minutes.
>> The unique key field is the only field i am requesting for. Also, I apply
>> an xslt transformation to the response to get a comma separated list of
>> unique keys. Is there a way to improve this speed?? Would sharding help in
>> this case?
>> I am currently using solr 4.0 beta in my application.
>> Thanks,
>> Rohit
>


Solr 4.0 Beta Release

2012-09-12 Thread samarth s
Hi All,

Would just like to verify if Solr 4.0 Beta has been released. Does the
following url give the official beta release:
http://www.apache.org/dyn/closer.cgi/lucene/solr/4.0.0-BETA

-- 
Regards,
Samarth


Re: Retrieval of large number of documents

2012-09-12 Thread Paul Libbrecht
Isn't XSLT the bottleneck here?
I have not yet met an incremental XSLT processor, although I have heard it
claimed that it could be done in principle for XSLT 1.

If you start to do this kind of processing, I think you have no choice other
than to write your own output method.

Paul


Le 12 sept. 2012 à 15:47, Rohit Harchandani a écrit :

> Hi all,
> I have a solr index with 5,000,000 documents and my index size is 38GB. But
> when I query for about 400,000 documents based on certain criteria, solr
> searches it really quickly but does not return data for close to 2 minutes.
> The unique key field is the only field i am requesting for. Also, I apply
> an xslt transformation to the response to get a comma separated list of
> unique keys. Is there a way to improve this speed?? Would sharding help in
> this case?
> I am currently using solr 4.0 beta in my application.
> Thanks,
> Rohit



Retrieval of large number of documents

2012-09-12 Thread Rohit Harchandani
Hi all,
I have a solr index with 5,000,000 documents and my index size is 38GB. But
when I query for about 400,000 documents based on certain criteria, solr
searches it really quickly but does not return data for close to 2 minutes.
The unique key field is the only field i am requesting for. Also, I apply
an xslt transformation to the response to get a comma separated list of
unique keys. Is there a way to improve this speed?? Would sharding help in
this case?
I am currently using solr 4.0 beta in my application.
Thanks,
Rohit


Doubts in PathHierarchyTokenizer

2012-09-12 Thread mechravi25
Hi,

I'm using Solr 3.6.1 and I have a field which has values like

A|B|C
B|C|D|EE
A|C|B 
A|B|D
..etc..

So, When I search for "A|B", I should get documents starting with 
"A" and "A|B"

To implement this, I've used PathHierarchyTokenizer for the above field as



   

 





But when I use the Solr analysis page to check whether it's being split on the
pipe symbol ("|") at indexing time, I see that it's being taken as the entire
token and is not getting split on the delimiter (i.e. the search is done
only for "A|B" in the above case).

I also tried using "\|" as the delimiter, but it's not working either.

Am I missing anything here? Or will the PathHierarchyTokenizer not accept the
pipe symbol ("|") as a delimiter?
Can anyone guide me on this?

Thanks a lot



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doubts-in-PathHierarchyTokenizer-tp4007216.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud fail over

2012-09-12 Thread Mark Miller
Either setup a load balancer, or use the SolrCloud solrj client
CloudSolrServer - it takes a comma separated list of zk servers rather
than a solr url.
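A minimal SolrJ sketch of that second option (assumes the Solr 4.x solrj jar on the classpath; the ZooKeeper hosts and collection name below are placeholders):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudQueryDemo {
    public static void main(String[] args) throws Exception {
        // Connect via ZooKeeper, not via any single Solr node, so a
        // crashed node does not take the client down with it.
        CloudSolrServer server =
                new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println(rsp.getResults().getNumFound());
        server.shutdown();
    }
}
```

The client watches cluster state in ZooKeeper and routes each request to a live node, which is exactly the fail-over behavior asked about.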

On Tue, Sep 11, 2012 at 10:17 PM, andy  wrote:
> I know fail over is available in solr4.0 right now, if one server
> crashes,other servers also support query,I set up a solr cloud like this
> http://lucene.472066.n3.nabble.com/file/n4007117/Selection_028.png
>
> I use http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml for query
> at first, if the node  8983 crashes, I have to access other nodes for query
> like http://localhost:8900/solr/collection1/select?q=*%3A*&wt=xml
>
> but I use the nodes url in the solrj, how to change the request url
> dynamically?
> does SolrCloud support something like virtual ip address? for example I use
> url http://collections1 in the solrj, and forward the request to available
> url automatically.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-fail-over-tp4007117.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
- Mark


Re: Partial search

2012-09-12 Thread Jack Krupansky
Add &debugQuery=true to your query request and look at the "explain" 
section. The scores will indicate why a document ranks as it does.


When you say that your query was "Energy Field", was that a quoted phrase or 
just two keywords? I assume the latter. I also assume that you were using 
the "OR" operator as default (not "AND"). Is that the case? Are you 
filtering out stop words at index time?


I tried your three test docs on the Solr 4.0-BETA example schema (putting 
the doc text in the "features_en" dynamic field) and your query actually 
reorders the three docs as expected, doc3, doc2, doc1.


What release of Solr are you using?

There is probably additional info you are not telling us. See if you can 
reproduce the scenario using only the stock Solr example schema. And if you 
have to make changes, tell us what they are.


-- Jack Krupansky

-Original Message- 
From: Mani

Sent: Tuesday, September 11, 2012 8:29 PM
To: solr-user@lucene.apache.org
Subject: Partial search

I have three documents with the following search field (text_en type) 
values.


When I search for "Energy Field", I am getting the documents in the order
presented below. However, if you look at the matches, I would expect Doc3 to
come first and Doc1 to be last.


Doc1 : Automic Energy and Peace
Doc2 : Energy One Energy Two Energy Three Energy Four
Doc3 : Mathematic Field Energy Field

What is the best way to configure my search to accommodate matching as many
terms as possible?







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-search-tp4007097.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: suggester issues

2012-09-12 Thread aniljayanti
Hi,

I'm also facing the same issue while using the suggester (working in C#.NET). 
Below are my configurations.

suggest/?q="michael ja"
---


  
  
  
  
   
   
  
  
   
  







Response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="michael">
        <int name="numFound">10</int>
        <int name="startOffset">1</int>
        <int name="endOffset">8</int>
        <arr name="suggestion">
          <str>michael "bully" herbig</str>
          <str>michael bolton</str>
          <str>michael bolton: arias</str>
          <str>michael falch</str>
          <str>michael holm</str>
          <str>michael jackson</str>
          <str>michael neale</str>
          <str>michael penn</str>
          <str>michael salgado</str>
          <str>michael w. smith</str>
        </arr>
      </lst>
      <lst name="ja">
        <int name="numFound">10</int>
        <int name="startOffset">9</int>
        <int name="endOffset">11</int>
        <arr name="suggestion">
          <str>ja me tanssimme</str>
          <str>jacob andersen</str>
          <str>jacob haugaard</str>
          <str>jagged edge</str>
          <str>jaguares</str>
          <str>jamiroquai</str>
          <str>jamppa tuominen</str>
          <str>jane olivor</str>
          <str>janis joplin</str>
          <str>janne tulkki</str>
        </arr>
      </lst>
      <str name="collation">"michael "bully" herbig ja me tanssimme"</str>
    </lst>
  </lst>
</response>

Please Help,

AnilHayanti 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p4007205.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Semantic document format... standards?

2012-09-12 Thread Alexandre Rafalovitch
Otis,

If you are doing Named Entity Recognition, you may want to look at the
research area concerned with Named Entity Recognition. :-) In general,
there is inline markup and standoff markup. You seem to be going for
standoff/stand-alone markup. I am not clear though whether it is just
'discovery' format or actual annotation format (with reference to
where in the sentence it is with offsets or token ids).

UIMA (which Solr integrate with already, right?), does NER so it must
be using some sort of format.

Also, TREC is one of the competitions and they provide marked-up
datasets you might be able to learn something from:
http://ilps.science.uva.nl/trec-entity/

If you are not sure where to start with NER, you can look at my
collection of papers, though most of them are probably too specific:
http://www.citeulike.org/user/arafalov

Finally, if you have to deal with overlapping entities, there was an
article about a month ago about some sort of general format. I can't seem
to find the article right now, but I could try digging if you are
still stuck.

Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Tue, Sep 11, 2012 at 11:51 AM, Otis Gospodnetic
 wrote:
> Hello,
>
> If I'm extracting named entities, topics, key phrases/tags, etc. from 
> documents and I want to have a representation of this document, what format 
> should I use? Are there any standard or at least common formats or approaches 
> people use in such situations?
>
> For example, the most straight forward format might be something like this:
>
>
> <doc>
>   <title>doc title</title>
>   <keywords>meta keywords coming from the web page</keywords>
>   <body>page meat</body>
>   <entities>name entities recognized in the document</entities>
>   <topics>topics extracted by the annotator</topics>
>   <tags>tags extracted by the annotator</tags>
>   <relations>relations extracted by the annotator</relations>
> </doc>
>
> But this is a made up format - the XML tags above are just what somebody 
> happened to pick.
>
> Are there any standard or at least common formats for this?
>
>
> Thanks,
> Otis
> 
> Performance Monitoring - Solr - ElasticSearch - HBase - 
> http://sematext.com/spm
>
> Search Analytics - http://sematext.com/search-analytics/index.html


Re: [Solr4 beta] error 503 on commit

2012-09-12 Thread Yonik Seeley
On Tue, Sep 11, 2012 at 10:52 AM, Radim Kolar  wrote:
>> After investigating more, here is the tomcat log herebelow. It is indeed
>> the same problem: "exceeded limit of maxWarmingSearchers=2,".
>
> could not be solr able to close oldest warming searcher and replace it by
> new one?

That approach can easily lead to starvation (i.e. you never get a new
searcher usable for queries).

-Yonik
http://lucidworks.com


Re: [Solr4 beta] error 503 on commit

2012-09-12 Thread Radim Kolar
> After investigating more, here is the tomcat log herebelow. It is
> indeed the same problem: "exceeded limit of maxWarmingSearchers=2,".

Could solr not close the oldest warming searcher and replace it with a new one?