Re: storing key,value pair in Solr document

2009-08-10 Thread Avlesh Singh
You can have a dynamicField in your schema called entity_* and map it to
your corresponding data structure in this way:
@Field("entity_*")
Map<String,String> entity;
The key would be your fieldName (other than the entity_ prefix).

SOLR-1129 https://issues.apache.org/jira/browse/SOLR-1129 will give you
more insights.
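
As a rough illustration of the key-to-field-name idea above, here is a minimal sketch in plain Java with no SolrJ dependency. The class and helper names are hypothetical, and whether SolrJ adds the entity_ prefix for you or expects the full field name as the map key is an assumption worth checking against SOLR-1129.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DynamicFieldSketch {

    // Hypothetical helper (not part of SolrJ): expands each map entry into a
    // field name that would match a dynamicField pattern such as entity_*.
    static Map<String, String> toDynamicFields(String prefix, Map<String, String> entity) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> e : entity.entrySet()) {
            fields.put(prefix + e.getKey(), e.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> entity = new LinkedHashMap<String, String>();
        entity.put("Germinait", "0.7");
        // Each key becomes one field name under the entity_ prefix.
        System.out.println(toDynamicFields("entity_", entity));
    }
}
```

Each map entry ends up as its own field in the document, which is why a single dynamicField declaration can cover an open-ended set of keys.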

Cheers
Avlesh

On Mon, Aug 10, 2009 at 10:58 AM, Ninad Raut <hbase.user.ni...@gmail.com> wrote:

> Hi,
> I have an Entity and a Value associated with it. I want to store this value
> as a key,value pair in Solr.
> I have a Java Object which I am mapping to a Solr Doc using
> org.apache.solr.client.solrj.beans.Field. Can I also store a Map, and how
> can I do so?
> This is how I want it to be done:
> @Field
> Map<String,String> entity;



Re: storing key,value pair in Solr document

2009-08-10 Thread Ninad Raut
Hi Avlesh,
Can we use SimpleOrderedMap? It seems deprecated. Is it safe to use, and
how is it going to be mapped to the field?
@Field("ne")
  SimpleOrderedMap<String> ne = new SimpleOrderedMap<String>();
won't work, right?

 



Re: Embedded Solr Clustering

2009-08-10 Thread Shalin Shekhar Mangar
On Mon, Aug 10, 2009 at 3:57 AM, born2fish <tswan...@yahoo.com> wrote:

> Hi everyone,
>
> We have a web app that uses embedded solr for better performance.

I would advise against it. We use Solr on sites with millions of page views
a month over HTTP. With HTTP keep-alives, the overhead of an HTTP request is
minimal compared to the actual search. You also get the advantages of
replication as well as the option of adding an HTTP cache in front of Solr.

Is there a performance problem you're trying to solve by using embedded
Solr?

> Now we are
> trying to deploy the app to a clustered environment. My question is:
>
> 1. Can we configure the embedded solr instances to share the same index on
> the network?

Yes. It may be very slow though. Best to benchmark it before going to
production. Also, you'll need to make sure that only one Solr instance is
writing to the index at any one time. It is better to have separate indexes.

> 2. If the answer to question 1 is no, can we configure embedded solr
> instances to replicate indexes in a master / slave fashion just like normal
> web based Solr?

Yes, you can use the script-based replication. You'd need to expose a way to
call commit on your application if you use embedded Solr.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Guide to using SolrQuery object

2009-08-10 Thread Aleksander M. Stensby
You'll find the available parameters in various interfaces in the package  
org.apache.solr.common.params.*


For instance:
import org.apache.solr.common.params.FacetParams;
import org.apache.solr.common.params.ShardParams;
import org.apache.solr.common.params.TermVectorParams;

As a side note to what Shalin said, SolrQuery extends ModifiableSolrParams  
(just so that you are aware of that).
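
To make the point about parameter-name constants concrete, here is a small sketch in plain Java. The constants below are local stand-ins that only mirror well-known Solr parameter names (q, rows, facet, facet.field); in real code you would reference the ones in org.apache.solr.common.params rather than defining your own. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamNamesSketch {

    // Local stand-ins mirroring Solr parameter names (assumption: in SolrJ
    // you would use the constants from the params interfaces instead).
    static final String Q = "q";
    static final String ROWS = "rows";
    static final String FACET = "facet";
    static final String FACET_FIELD = "facet.field";

    // Builds a URL query string from name/value pairs, URL-encoding both.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"));
                sb.append('=');
                sb.append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a JVM; rethrow unchecked just in case.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put(Q, "solr");
        params.put(ROWS, "10");
        params.put(FACET, "true");
        params.put(FACET_FIELD, "cat");
        System.out.println(toQueryString(params));
    }
}
```

Using shared constants instead of string literals mainly protects against typos in parameter names, which Solr would otherwise silently ignore.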

Hope that helps a bit.

Cheers,
 Aleks

On Tue, 14 Jul 2009 16:27:50 +0200, Reuben Firmin <reub...@benetech.org>
wrote:

> Also, are there enums or constants around the various param names that
> can be passed in, or do people tend to define those themselves?
> Thanks!
> Reuben




--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.com
http://twitter.com/Integrasco



Re: storing key,value pair in Solr document

2009-08-10 Thread Avlesh Singh

> Can we use SimpleOrderedMap?

No, Ninad, that wouldn't work.

Cheers
Avlesh




Multiple Unique Ids

2009-08-10 Thread Ninad Raut
Hi,
I have two IDs, DocumentId and AuthorId. I want both of them to be unique. Can I
have two uniqueKey elements in my document?
 <uniqueKey>id</uniqueKey>
 <uniqueKey>authorId</uniqueKey>

Regards,
Ninad Raut


Re: mergeContiguous for multiple search terms

2009-08-10 Thread Hachmann, Bjoern
Hello,

we are using Solr-1.3.

Thanks for your time.
Björn

 

> -----Original Message-----
> From: solr-user-return-24991-hachmann.bjoern=guj...@lucene.apache.org
> [mailto:solr-user-return-24991-hachmann.bjoern=guj...@lucene.apache.org]
> On behalf of Avlesh Singh
> Sent: Monday, 10 August 2009 04:01
> To: solr-user@lucene.apache.org
> Subject: Re: mergeContiguous for multiple search terms
>
> Which Solr version are you using?
>
> Cheers
> Avlesh
>
> On Wed, Aug 5, 2009 at 5:55 PM, Hachmann, Bjoern <hachmann.bjo...@guj.de> wrote:
>
>> Hello,
>>
>> we would like to use the highlightingComponent with the
>> mergeContiguous parameter set to true.
>>
>> We have a field with value: Ökonom Charles Goodhart.
>>
>> If we search for all three words, they are found correctly:
>> <em>Ökonom</em> <em>Charles</em> <em>Goodhart</em>
>>
>> But, as I set the mergeContiguous parameter to true, I expected:
>> <em>Ökonom Charles Goodhart</em>. Am I misunderstanding the behaviour
>> of this parameter? We are using the dismax query parser and solr-1.3.
>>
>> Thank you very much for your time.
>> Björn Hachmann
 
 
 
 


Pojo not getting added to Solr Index

2009-08-10 Thread Ninad Raut
I am not getting any exception, but the document is not getting added to
Solr.
Here is the code:
public class ClientSearch {

 public SolrServer getSolrServer() throws MalformedURLException {
  // the instance can be reused
  return new CommonsHttpSolrServer("http://germinait22:8983/solr/core0/");
 }

 void store() throws IOException, SolrServerException {
  IthursDocument ithursDocument = new IthursDocument();
  System.out.println("Created IthursDocument..");
  ithursDocument.setId("testID_2");
  ithursDocument.setMedia("BLOG");
  ithursDocument.setContent("Khatoo is a good Gal");
  Date date = new Date("23/08/2009");
  ithursDocument.setPubDate(date);
  Map<String,String> namedEntity = new HashMap<String,String>();
  namedEntity.put("Germinait", "0.7");
  ithursDocument.setNe(namedEntity);
  ithursDocument.setSentiment(0.1f);
  SolrServer server = getSolrServer();
  server.addBean(ithursDocument);
 }

 void query() throws MalformedURLException, SolrServerException {
  SolrServer server = getSolrServer();
  SolrQuery query = new SolrQuery();
  query.setQuery("id:testID");

  QueryResponse rsp = server.query(query);
  List<IthursDocument> list = rsp.getBeans(IthursDocument.class);
  System.out.println(list.size());
 }

 public static void main(String[] args) {
  ClientSearch clientSearch = new ClientSearch();
  try {
   clientSearch.store();
   clientSearch.query();
  } catch (IOException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  } catch (SolrServerException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
  }
 }
}
The logs show the following:
192.168.0.115 - - [10/08/2009:12:10:47 +] "POST
/solr/core0/update?wt=javabin&version=1 HTTP/1.1" 200 40
Where am I going wrong?


Re: Pojo not getting added to Solr Index

2009-08-10 Thread Avlesh Singh

> Where am I going wrong??

I think you forgot to commit after adding beans via the SolrServer.

PS: I am damn sure that you don't intend to create a new instance of
CommonsHttpSolrServer every time.

Cheers
Avlesh
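
To show why the update can return HTTP 200 while the document stays invisible to searches, here is a toy in-memory model in plain Java. It is only an illustration of add-then-commit visibility and not SolrJ code; the class and its methods are hypothetical. In the SolrJ code from the question, the corresponding step would be calling server.commit() after server.addBean(...).

```java
import java.util.ArrayList;
import java.util.List;

public class CommitVisibilitySketch {

    // Documents that were accepted but are not yet visible to searches.
    private final List<String> pending = new ArrayList<String>();
    // Documents that searches can see.
    private final List<String> searchable = new ArrayList<String>();

    // Accepts a document id; the "add" succeeds but nothing is searchable yet.
    public void add(String id) {
        pending.add(id);
    }

    // Makes everything added so far visible to searches.
    public void commit() {
        searchable.addAll(pending);
        pending.clear();
    }

    // Returns true only for committed documents.
    public boolean isSearchable(String id) {
        return searchable.contains(id);
    }

    public static void main(String[] args) {
        CommitVisibilitySketch index = new CommitVisibilitySketch();
        index.add("testID_2");
        System.out.println(index.isSearchable("testID_2")); // false: added, not committed
        index.commit();
        System.out.println(index.isSearchable("testID_2")); // true: visible after commit
    }
}
```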




Re: Pojo not getting added to Solr Index

2009-08-10 Thread Ninad Raut
Thanks Avlesh, you saved my day! Yes, I am not going to create a new
instance of the server every time; this is just a proof of concept.

 



Re: MoreLikeThis: How to get quality terms from html from content stream?

2009-08-10 Thread Grant Ingersoll
Right, a SearchComponent wrapper around some of the Solr Cell  
capabilities could make this so.


On Aug 9, 2009, at 11:21 AM, Jay Hill wrote:

> Solr Cell definitely sounds like it has a place here. But wouldn't it be
> needed as an extracting component earlier in the process for the
> MoreLikeThisHandler? The MLT Handler works great when it's directed to a
> content stream of plain text. If we could just use Solr Cell to identify the
> file type and do the content extraction earlier in the stream that would do
> the trick I think. Then whether the URL pointed to HTML, a PDF, or whatever,
> MLT would be receiving a stream of extracted content.
>
> -Jay
>
> On Sun, Aug 9, 2009 at 7:17 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
>> It's starting to sound like Solr Cell needs a SearchComponent as well, that
>> can come before the QueryComponent and can be used to map into the other
>> components.  Essentially, take the functionality of the extractOnly option
>> and have it feed other SearchComponents.
>>
>> On Aug 8, 2009, at 10:42 AM, Ken Krugler wrote:
>>
>>> On Aug 7, 2009, at 5:23pm, Jay Hill wrote:
>>>
>>>> I'm using the MoreLikeThisHandler with a content stream to get documents
>>>> from my index that match content from an html page like this:
>>>>
>>>> http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL&mlt.fl=body&rows=4&debugQuery=true
>>>>
>>>> But, not surprisingly, the query generated is meaningless because a lot
>>>> of the markup is picked out as terms:
>>>> <str name="parsedquery_toString">
>>>> body:li body:href  body:div body:class body:a body:script body:type
>>>> body:js
>>>> body:ul body:text body:javascript body:style body:css body:h body:img
>>>> body:var body:articl body:ad body:http body:span body:prop
>>>> </str>
>>>>
>>>> Does anyone know a way to transform the html so that the content can be
>>>> parsed out of the content stream and processed w/o the markup? Or do I
>>>> need to write my own HTMLParsingMoreLikeThisHandler?
>>>
>>> You'd want to parse the HTML to extract only text first, and use that for
>>> your index data.
>>>
>>> Both the Nutch and Tika OSS projects have examples of using HTML parsers
>>> (based on TagSoup or CyberNeko) to generate content suitable for indexing.
>>>
>>> -- Ken
>>>
>>>> If I parse the content out to a plain text file and point the stream.url
>>>> param to file:///parsedfile.txt it works great.
>>>>
>>>> -Jay
>>>
>>> --
>>> Ken Krugler
>>> TransPac Software, Inc.
>>> http://www.transpac.com
>>> +1 530-210-6378
>>
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
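
Ken's suggestion in the quoted thread above (extract plain text from the HTML before handing it to MoreLikeThis) can be sketched roughly as follows. This is a deliberately crude regex-based stand-in with hypothetical names, assuming reasonably well-formed markup; a real HTML parser such as the TagSoup- or Tika-based ones mentioned in the thread would be the more robust choice.

```java
import java.util.regex.Pattern;

public class HtmlTextSketch {

    // Drops whole <script>...</script> and <style>...</style> blocks, contents included.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Matches any remaining tag.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    // Matches runs of whitespace so they can be collapsed.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Crude text extraction: not a parser, does not decode entities.
    static String extractText(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<div class=\"a\"><script type=\"text/javascript\">var x = 1;</script>"
                + "<p>Stewie <a href=\"#\">rules</a></p></div>";
        System.out.println(extractText(html));
    }
}
```

With the markup stripped, terms like div, href, and script no longer reach the analyzer, so they cannot dominate the generated MoreLikeThis query.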



UTF-8 query support?

2009-08-10 Thread Darren Govoni
Hi,
  I tried to query my text field with a UTF-8 string that was in the
indexed document, but it returned nothing.

e.g.
http://192.168.2.10:8081/solr4/select/?q=%E5%BE%93%E6%9D%A5%E9%80%9A%E3%82%8A&version=2.2&start=0&rows=10&indent=on

The result page showed a garbled query string (wrong encoding).
<str name="q">å¾“æ ¥é€šã‚Š</str>

How do I set UTF-8 encoding so lucene can find the documents since it
supports UTF-8 queries?

thanks!
Darren




Re: mergeContiguous for multiple search terms

2009-08-10 Thread Koji Sekiguchi

Hachmann, Bjoern wrote:

> Hello,
>
> we would like to use the highlightingComponent with the mergeContiguous
> parameter set to true.
>
> We have a field with value: Ökonom Charles Goodhart.
>
> If we search for all three words, they are found correctly:
> <em>Ökonom</em> <em>Charles</em> <em>Goodhart</em>
>
> But, as I set the mergeContiguous parameter to true, I expected:
> <em>Ökonom Charles Goodhart</em>. Am I misunderstanding the behaviour of
> this parameter? We are using the dismax query parser and solr-1.3.

The current highlighter doesn't support this type of highlighting.
Using FastVectorHighlighter in Lucene 2.9, when you query the
phrase (Ökonom Charles Goodhart), you can expect the output
you mentioned above. But it isn't in Solr yet.

Koji



Re: UTF-8 query support?

2009-08-10 Thread Yonik Seeley
Your URL suggests you set up your own servlet container - that's
probably the issue.
If you're using tomcat see http://wiki.apache.org/solr/SolrTomcat
Test out your config with example/exampledocs/test_utf8.sh

-Yonik
http://www.lucidimagination.com








Re: UTF-8 query support?

2009-08-10 Thread Mats Lindh
On Mon, Aug 10, 2009 at 4:19 PM, Darren Govoni <dar...@ontrenet.com> wrote:
> How do I set UTF-8 encoding so lucene can find the documents since it
> supports UTF-8 queries?

This depends on the app server you're using. I'm guessing Tomcat (as
that's where I had the same issue), and you can fix this by enabling
UTF-8 encoded query strings in Tomcat itself:

http://wiki.apache.org/solr/SolrTomcat#head-20147ee4d9dd5ca83ed264898280ab60457847c4

--mats
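
To see what goes wrong when the container decodes the query string with the wrong charset, here is a small plain-Java sketch using the percent-encoded query from the original question. The class name is hypothetical. It decodes the same bytes once as UTF-8 and once as ISO-8859-1, which is the kind of mismatch the Tomcat setting on the wiki page (URIEncoding="UTF-8" on the Connector) is meant to avoid.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryEncodingSketch {

    // Decodes a percent-encoded query value using the given charset name.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%E5%BE%93%E6%9D%A5%E9%80%9A%E3%82%8A";

        String asUtf8 = decode(raw, "UTF-8");        // the four intended characters
        String asLatin1 = decode(raw, "ISO-8859-1"); // one garbage character per byte

        System.out.println("UTF-8 length:      " + asUtf8.length());
        System.out.println("ISO-8859-1 length: " + asLatin1.length());
    }
}
```

The twelve bytes decode to four characters under UTF-8 but to twelve unrelated characters under ISO-8859-1, so the query term no longer matches anything in the index.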


Re: Embedded Solr Clustering

2009-08-10 Thread born2fish

Thanks Shalin and Avlesh for your responses.

Yes, we are using Solr for a non-traditional search purpose and
performance is critical. However, it sounds like sharing the same index
could slow down reading from and writing to the index. Access synchronization
is tricky as well.

Therefore, we might have to use a single web-based Solr instance, or use
multiple embedded Solr instances and set up the script-based replication.

Thanks again for your help!




born2fish wrote:
>
> Hi everyone,
>
> We have a web app that uses embedded solr for better performance. Now we
> are trying to deploy the app to a clustered environment. My question is:
>
> 1. Can we configure the embedded solr instances to share the same index on
> the network?
> 2. If the answer to question 1 is no, can we configure embedded solr
> instances to replicate indexes in a master / slave fashion just like
> normal web based Solr?
>
> Thanks,
>
> born2fish
 

-- 
View this message in context: 
http://www.nabble.com/Embedded-Solr-Clustering-tp24891931p24900854.html
Sent from the Solr - User mailing list archive at Nabble.com.



[OT] Solr Webinar

2009-08-10 Thread Grant Ingersoll
I will be giving a free one hour webinar on getting started with  
Apache Solr on August 13th, 2009 ~ 11:00 AM PDT / 2:00 PM EDT


You can sign up @ 
http://www2.eventsvc.com/lucidimagination/081309?trk=WR-AUG2009-AP

I will present and demo:
* Getting started with LucidWorks for Solr
* Getting better, faster results using Solr's findability and  
relevance improvement tools
* Deploying Solr in production, including monitoring performance and  
trends with the LucidGaze for Solr performance profiler


-Grant

Re: [OT] Solr Webinar

2009-08-10 Thread Lucas F. A. Teixeira
Hello Grant,
Will the webinar be recorded and available to download later someplace?
Unfortunately, I can't watch this time.

Thanks,

[]s,

Lucas Frare Teixeira .·.
- lucas...@gmail.com
- blog.lucastex.com
- twitter.com/lucastex




Re: Relevant results with DisMaxRequestHandler

2009-08-10 Thread Vincent Pérès

Hello,

Thank you for your answer. I finally used only a 'qf' parameter in the
dismax request handler, and it seems that I now have better and more relevant
results.
I just don't understand why a result is mainly boosted by its last update by
default!

Vincent



Re: Relevant results with DisMaxRequestHandler

2009-08-10 Thread Vincent Pérès

I actually have another question...

The 'qf' parameter used in the dismax handler seems to combine terms with an
'AND' operator. I get many more results without dismax. Is there any way to
keep the same number of documents and still process the 'qf'?

My dismax config:
   <requestHandler name="dismax" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="defType">dismax</str>
       <str name="echoParams">explicit</str>
       <float name="tie">0.01</float>
       <str name="qf">
         text^0.5 title_ac^4.0 name_ac^4.0 authors_list_sm^4.0
       </str>
     </lst>
   </requestHandler>



Re: dealing with duplicates

2009-08-10 Thread Joe Calderon
so in the case someone can help me with the query syntax, the
relational query i would use for this would be something like:

SELECT * FROM videos
WHERE
title LIKE 'family guy'
AND desc LIKE 'stewie%'
AND (
  ( is_dup = 0 )
  OR
  ( is_dup = 1 AND id NOT IN
(
SELECT id FROM videos
WHERE
title LIKE 'family guy'
AND desc LIKE 'stewie%'
AND is_dup = 0
)
  )
)
ORDER BY views
LIMIT 10

can a similar query be written in lucene or do i need to structure my
index differently to be able to do such a query?

thx much

--joe


On Sat, Aug 1, 2009 at 9:15 AM, Joe Calderon <calderon@gmail.com> wrote:
> hello, thanks for the response, i did take a look at that document but
> in my application i actually want the duplicates, as i mentioned, the
> matching text could be very different among cluster members, what
> joins them together is a similar set of numeric features.
>
> currently i do a query with fq=duplicate:0 and show a link to
> optionally show the dupes by querying for all dupes of the
> master id, however im currently missing any documents that matched the
> query but are duplicates of other masters not included in that result
> set.
>
> in a relational database (fulltext indexing aside) i would use a
> subquery, i imagine a similar approach could be used with lucene, i
> just dont know the syntax
>
> best,
>
> --joe
>
> On Fri, Jul 31, 2009 at 11:32 PM, Otis
> Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>> Joe,
>>
>> Maybe we can take a step back first.  Would it be better if your index was
>> cleaner and didn't have flagged duplicates in the first place?  If so, have
>> you tried using http://wiki.apache.org/solr/Deduplication ?
>>
>>  Otis
>> --
>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>>
>> ----- Original Message -----
>>> From: Joe Calderon <calderon@gmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Friday, July 31, 2009 5:06:48 PM
>>> Subject: dealing with duplicates
>>>
>>> hello all, i have a collection of a few million documents; i have many
>>> duplicates in this collection. they have been clustered with a simple
>>> algorithm, i have a field called 'duplicate' which is 0 or 1 and
>>> fields called 'description, tags, meta', documents are clustered on
>>> different criteria and the text i search against could be very
>>> different among members of a cluster.
>>>
>>> im currently using a dismax handler to search across the text fields
>>> with different boosts, and a filter query to restrict to masters
>>> (duplicate: 0)
>>>
>>> my question is then, how do i best query for documents which are
>>> masters OR match text but are not included in the matched set of
>>> masters?
>>>
>>> does this make sense?





Re: UTF-8 query support?

2009-08-10 Thread Darren Govoni
Thank you! I am using Tomcat and will give it a try.




Overview of Query Parsing API Stack? / Dismax parsing, new 1.4 parsing, etc.

2009-08-10 Thread Mark Bennett
There's some good Wiki pages on the syntax to use for queries, including
nested queries.

But trying to traipse through the code to get the big picture is a bit
involved.

A couple of examples:

Over the past few months I've had several questions about dismax, and why it
was or wasn't doing something a certain way.  I came up with a workaround
for CJK, but today I'm back looking at the shingles stuff and where,
exactly, shingle queries break.  I found the logical discussions about *why*
in some of the threads, but the actual code path makes quite a few hops, to
util classes, and to Lucene, etc.  I'll get there eventually, but having a
map would be nice.

Another example, at the last Meetup it was mentioned that big changes are
coming to query parsing pretty soon.  Understanding the before and after
logic would be nice, and I don't recall whether that impacted just Lucene,
or if Solr was also going to be affected.

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


Re: excluding certain terms from facet counts when faceting based on indexed terms of a field

2009-08-10 Thread Bill Au
I just upgraded to Solr 1.4/Lucene 2.9 for something else, so I am trying to
see if I can use localParams to exclude certain terms from the facet
counts.  I tried the suggested:

facet.field={!terms=foo,bar}cat

but that actually only shows the facet counts of foo and bar.  What I want is
to exclude a value from the facet counts, so I tried:

facet.field={!ex=cat:foo}cat

but that has no effect, as foo still shows up in the facet counts.

Still looking...

Bill
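
Until a server-side option turns up, the client-side exclusion that Erik suggested earlier in this thread can be sketched as below in plain Java. The class and method names are hypothetical, and the facet counts are assumed to have already been read out of the response into an ordered map of term to count.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class FacetExcludeSketch {

    // Returns a copy of the facet counts without the excluded terms,
    // preserving the original ordering of the remaining entries.
    static Map<String, Long> excludeTerms(Map<String, Long> facetCounts, Set<String> excluded) {
        Map<String, Long> kept = new LinkedHashMap<String, Long>();
        for (Map.Entry<String, Long> e : facetCounts.entrySet()) {
            if (!excluded.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<String, Long>();
        counts.put("foo", 12L);
        counts.put("bar", 7L);
        counts.put("baz", 3L);

        // The excluded set can differ per query, as described above.
        Set<String> excluded = new HashSet<String>(Arrays.asList("foo"));
        System.out.println(excludeTerms(counts, excluded));
    }
}
```

One caveat with this approach: if the request limits the number of facet values returned, filtering afterwards can leave fewer entries than requested, so the limit may need some headroom.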


On Thu, Jul 23, 2009 at 11:53 AM, Bill Au <bill.w...@gmail.com> wrote:

> That's actually what we have been doing.  I was just wondering if there is
> any way to move this work from the client back into Solr.
>
> Bill
>
> On Thu, Jul 23, 2009 at 11:47 AM, Erik Hatcher <e...@ehatchersolutions.com>
> wrote:
>
>> Given it is a small number of terms, seems like just excluding them from
>> use/visibility on the client would be reasonable.
>>
>> Erik
>>
>> On Jul 23, 2009, at 11:43 AM, Bill Au wrote:
>>
>>> I want to exclude a very small number of terms which will be different
>>> for each query.  So I think my best bet is to use localParam.
>>>
>>> Bill
>>>
>>> On Wed, Jul 22, 2009 at 4:16 PM, Chris Hostetter
>>> <hossman_luc...@fucit.org> wrote:
>>>
>>>> : I am faceting based on the indexed terms of a field by using
>>>> : facet.field.
>>>> : Is there any way to exclude certain terms from the facet counts?
>>>>
>>>> if you're talking about a lot of terms, and they're going to be the same
>>>> for *all* queries, the best approach is to strip them out when indexing
>>>> (StopWordFilter is your friend)
>>>>
>>>> -Hoss







Question mark glyphs in indexed content

2009-08-10 Thread Rupert Fiasco
Hello, I am using the latest Solr4j to index content. When I look at
that content in the Solr Admin web utility I see weird characters like
this:

http://brockwine.com/images/solrglyphs.png

When I look at the text in the MySQL DB those chars appear to just be
plain hyphens. The MySQL table character set is utf8 and the collation
is utf8.

Environment:
OS X 10.5.8
java version 1.5.0_19
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304)
Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing)

Solr Specification Version: 1.3.0
Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47
Lucene Specification Version: 2.4-dev
Lucene Implementation Version: 2.4-dev 691741 - 2008-09-03 15:25:16

Jetty 6.1.3

Any thoughts?

Thanks
/Rupert


Newbie problem ordering results

2009-08-10 Thread Germán Biozzoli
Hello everybody

I have the following (abridged) schema:


<field name="title" type="text" indexed="true" stored="true"
multiValued="true"/>
   <field name="titleorder" type="string" indexed="true" stored="true"
multiValued="true"/>
   <field name="contributor" type="text" indexed="true" stored="true"
multiValued="true"/>
   <field name="contributorfacet" type="textFacetN" indexed="true"
stored="true" multiValued="true"/>
   <field name="contributororder" type="string" indexed="true"
stored="true" multiValued="true"/>
.

<copyField source="title" dest="text" />
<copyField source="title" dest="titleorder" />
<copyField source="contributor" dest="text" />
<copyField source="contributor" dest="contributorfacet" />
<copyField source="contributor" dest="contributororder" />
...

I use, for instance, contributor for searching, contributorfacet for
faceting and contributororder for ordering results, but when I try to order
using contributororder, Solr says that it cannot order by a tokenized
field...(?)

I'm using a Solr 1.4 nightly. Is this a bug? I believe I had this working
in previous versions...

Regards and thanks
Germán


Re: dealing with duplicates

2009-08-10 Thread Avlesh Singh
Can you please provide your schema details here?

Cheers
Avlesh

On Tue, Aug 11, 2009 at 1:29 AM, Joe Calderon calderon@gmail.comwrote:

 so in the case someone can help me with the query syntax, the
 relational query i would use for this would be something like:

 SELECT * FROM videos
 WHERE
 title LIKE 'family guy'
 AND desc LIKE 'stewie%'
 AND (
  ( is_dup = 0 )
  OR
  ( is_dup = 1 AND id NOT IN
(
SELECT id FROM videos
WHERE
title LIKE 'family guy'
AND desc LIKE 'stewie%'
AND is_dup = 0
)
  )
 )
 ORDER BY views
 LIMIT 10

 can a similar query be written in lucene or do i need to structure my
 index differently to be able to do such a query?

 thx much

 --joe


 On Sat, Aug 1, 2009 at 9:15 AM, Joe Calderoncalderon@gmail.com
 wrote:
  hello, thanks for the response, i did take a look at that document but
  in my application i actually want the duplicates, as i mentioned, the
  matching text could be very different among cluster members, what
  joins them together is a similar set of numeric features.
 
  currently i do a query with fq=duplicate:0 and show a link to
  optionally show the dupes via by querying for all dupes of the
  master id, however im currently missing any documents that matched the
  query but are duplicates of other masters not included in that result
  set.
 
  in a relational database (fulltext indexing aside) i would use a
  subquery, i imagine a similar approach could be used with lucene, i
  just dont know the syntax
 
  best,
 
  --joe
 
  On Fri, Jul 31, 2009 at 11:32 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
  Joe,
 
  Maybe we can take a step back first.  Would it be better if your index
 was cleaner and didn't have flagged duplicates in the first place?  If so,
 have you tried using http://wiki.apache.org/solr/Deduplication ?
 
   Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
  From: Joe Calderon calderon@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Friday, July 31, 2009 5:06:48 PM
  Subject: dealing with duplicates
 
  hello all, i have a collection of a few million documents; i have many
  duplicates in this collection. they have been clustered with a simple
  algorithm, i have a field called 'duplicate' which is 0 or 1 and
  fields called 'description', 'tags' and 'meta'; documents are clustered on
  different criteria and the text i search against could be very
  different among members of a cluster.
 
  im currently using a dismax handler to search across the text fields
  with different boosts, and a filter query to restrict to masters
  (duplicate: 0)
 
  my question is then, how do i best query for documents which are
  masters OR match text but are not included in the matched set of
  masters?
 
  does this make sense?
 
 
 



Re: Newbie problem ordering results

2009-08-10 Thread Avlesh Singh
Can you please post the fieldType definition for the string field in your
schema.xml?

Cheers
Avlesh

On Tue, Aug 11, 2009 at 9:52 AM, Germán Biozzoli germanbiozz...@gmail.com wrote:

 Hello everybody

 I have the following (abridged) schema:

 
    <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="titleorder" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="contributor" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="contributorfacet" type="textFacetN" indexed="true" stored="true" multiValued="true"/>
    <field name="contributororder" type="string" indexed="true" stored="true" multiValued="true"/>
 .
 
 <copyField source="title" dest="text"/>
 <copyField source="title" dest="titleorder"/>
 <copyField source="contributor" dest="text"/>
 <copyField source="contributor" dest="contributorfacet"/>
 <copyField source="contributor" dest="contributororder"/>
 ...

 For instance, I use contributor for searching, contributorfacet for
 faceting and contributororder for ordering results. But when I try to sort
 by contributororder, Solr says that it cannot sort on a tokenized
 field...(?)

 I'm using a Solr 1.4 nightly build. Is this a bug? I believe this worked
 for me in previous versions...

 Regards and thanks
 Germán



Querying Dynamic Fields.. simple query not working

2009-08-10 Thread Ninad Raut
Hi,
when I do a *:* query I can see the dynamic field as shown below:
<str name="ne_.*">{Germinait=0.7}</str>
but when I try to query for the same value with ne_Germinait:0.7 I get zero
records.
All the other fields, which are not dynamic, can be queried without problems.
Can someone please tell me how to query dynamic fields?
Thanks.
Ninad.


Retrieving the boost factor using Solrj CommonsHttpSolrServer

2009-08-10 Thread Villemos, Gert
I'm using the SolrJ CommonsHttpSolrServer to retrieve documents from the index
for update. I therefore also need to retrieve the boost factor, since otherwise
each resubmission would reset it. I just can't figure out how to retrieve the
boost factor.

The boost factor is available in the SolrInputDocument, but not in the
SolrDocument returned by the SolrServer 'query' method. And there is no
relationship between SolrInputDocument and SolrDocument (which in
itself is pretty confusing).

How can I get the boost factor? Do I have to use the 'request' method and parse
the result myself?
 
Cheers,
Gert.
 
 





Re: Querying Dynamic Fields.. simple query not working

2009-08-10 Thread Avlesh Singh
It is weird that you see a field name like ne_.* in the response. I am
afraid you might be using the field incorrectly.
Can you share the field definition, please? And a peek into how you are
populating these fields?

Cheers
Avlesh




Re: Querying Dynamic Fields.. simple query not working

2009-08-10 Thread Ninad Raut
This is the POJO field mapping:
@Field("*_ne")
Map<String, String> ne = new HashMap<String, String>();
This is how I set the value:
Map<String, String> namedEntity = new HashMap<String, String>();
namedEntity.put("Germinait", "0.7");
ithursDocument.setNe(namedEntity);
server.addBean(ithursDocument);
server.commit();
The schema has this dynamic field:
<dynamicField name="ne_*" type="string" indexed="true" stored="true"/>
Let me know if something is missing. Thanks, Avlesh.



Re: Querying Dynamic Fields.. simple query not working

2009-08-10 Thread Avlesh Singh
Ah! I guessed you were using it this way.

I would need to reconfirm this, but there seems to be an inconsistency
between fetching data and adding data via SolrJ w.r.t. dynamic fields.
SOLR-1129 (https://issues.apache.org/jira/browse/SOLR-1129) is
essentially about binding the response into a bean with a Map type
property. My guess is that SolrInputDocument does not yet understand the
Map type property when firing update requests. I don't think it works the
way you have used it :(
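The stored value seen in the response earlier in this thread, {Germinait=0.7}, has exactly the shape that java.util.Map.toString() produces. That is at least consistent with the whole map having been written as one string value instead of one field per key, though it does not prove it. A small stdlib-only check (the class and method names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

final class MapToStringCheck {
    // Returns what you get when a Map is treated as one opaque value.
    static String asSingleValue(Map<String, String> m) {
        return String.valueOf(m);
    }

    public static void main(String[] args) {
        Map<String, String> namedEntity = new HashMap<String, String>();
        namedEntity.put("Germinait", "0.7");
        System.out.println(asSingleValue(namedEntity)); // prints {Germinait=0.7}
    }
}
```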

Noble, can you please confirm this? If my guess turns out to be true, let's
open a JIRA issue asap.

Cheers
Avlesh




Re: Querying Dynamic Fields.. simple query not working

2009-08-10 Thread Ninad Raut
Hi Avlesh,
Can you suggest a workaround for this problem until it is resolved? :)
Regards,
Ninad.




Re: Querying Dynamic Fields.. simple query not working

2009-08-10 Thread Avlesh Singh
Well, there are multiple ways to do it.
Instead of using your own class (with annotated fields), you can build a
SolrInputDocument for each document directly and pass it to
SolrServer.add(SolrInputDocument doc). On each SolrInputDocument, call
addField(String name, Object value) once per field. For dynamic fields,
just pass in the full field name (Germinait_ne in your case) as the first
argument and 0.7 as the second one.

Search the way you were doing earlier.
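A minimal sketch of the name expansion this workaround relies on, written without any SolrJ dependency so it runs on its own. The class DynamicFieldNames and its expand method are hypothetical helpers, not part of SolrJ; each resulting entry is what would be handed to SolrInputDocument.addField(name, value) before calling SolrServer.add(doc).

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DynamicFieldNames {
    // Expands a dynamicField glob such as "ne_*" or "*_ne" into one concrete
    // field name per map key. Only a single leading or trailing '*' is handled.
    static Map<String, String> expand(String pattern, Map<String, String> values) {
        boolean singleStar = pattern.indexOf('*') == pattern.lastIndexOf('*');
        boolean atEdge = pattern.startsWith("*") || pattern.endsWith("*");
        if (!singleStar || !atEdge) {
            throw new IllegalArgumentException("unsupported pattern: " + pattern);
        }
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> e : values.entrySet()) {
            // Each entry here corresponds to one addField(name, value) call.
            fields.put(pattern.replace("*", e.getKey()), e.getValue());
        }
        return fields;
    }
}
```

The full name has to match the dynamicField pattern declared in schema.xml. With the ne_* declaration quoted earlier in this thread, the key Germinait expands to ne_Germinait, which is also the name used in the query ne_Germinait:0.7. A name like Germinait_ne would only match if the schema declared *_ne instead.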

Cheers
Avlesh
