Re: [POLL] How do you (like to) do logging with Solr

2011-05-17 Thread openvictor Open
Wow... Nobody is using the one with Jetty? It was a good option for me
because I like to have separate processes for different things: a Tomcat
server for all the webapps on my server, a Jetty server with Solr, and a
Drools server. Was that a stupid idea from the beginning?

So my choice :

[ ]  I always use the JDK logging as bundled in solr.war, that's perfect
[ ]  I sometimes use log4j or another framework and am happy with
re-packaging solr.war
[ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose at
deploy time
[ ]  Let me choose whether to bundle a binding or not at build time, using
an ANT option
[X]  What's wrong with the solr/example Jetty? I never run Solr elsewhere!
[ ]  What? Solr can do logging? How cool!


Victor

2011/5/17 Shawn Heisey s...@elyograg.org

 On 5/16/2011 5:47 AM, Jan Høydahl wrote:

 That's what happens if we ship solr.war without any pre-set logger binding
 - it's the binding provided in your app-server's classpath which will be
 used.


 I use the jetty that's bundled in the example, but with my own directory
 structure that's a lot different, and a homegrown init.d script.  I haven't
 changed the binding in solr.war, but I have created a logging.properties
 file to reduce it to WARNING by default and configured
 java.util.logging.config.file in jetty.xml.

 If I understand what you've said above correctly, removing the binding in
 solr.war would make it inherit the binding in jetty/tomcat/whatever, is that
 right?  That sounds like an awesome plan to me.  The example jetty server
 can be configured instead of solr.war.  Once you've answered this, I can
 submit my vote.

 A semi-related question ... is there any way to get jetty to log the entire
 URL in its request log?  Almost every request we send is truncated.  Some of
 our request URLs are nearly 20K in size.  We've had to tune all the configs
 for that to work.  We are working on making them smaller, but that's not
 going to happen quickly.  I've done a lot of searching on this topic and
 come up empty.

 Thanks,
 Shawn




Using autocomplete with the new Suggest component

2011-04-15 Thread openvictor Open
Hi everybody,


Recently I implemented an autocomplete mechanism for my website using a
custom TermsComponent. I was quite happy with it because it also enables
a Google-like feature where complete sentences are suggested to
the user as they type in the search field. I used shingles to search
against pieces of sentences.
(I have resources for French people if somebody asks)

Then came Solr 3.1 and its new Suggest component. I have looked at the
documentation, but it's still unclear how exactly it works. So please let me
ask some questions:


   - Are there performance improvements over TermsComponent?
   - Can it autosuggest sentences and not only words? If yes, how?
   Should I keep my shingles?
   - What is this threshold value that I see? Is it a mandatory field?
   I want to have suggestions no matter what the frequency is in the
   documents!


Thank you all. If I succeed, I will try to provide a tutorial on how to
do that with jQuery UI autocomplete + the Suggest component, if anyone's
interested.
Best regards.

Victor


Re: Using autocomplete with the new Suggest component

2011-04-15 Thread openvictor Open
Hi Quentin, well, let's stick to this thread. I will try to see how it works
and get input from other people.

Here is the link to my blog post, which shows how to do it:

http://www.victorkabdebon.net/archives/16

Note that I used Tomcat + Solr, but it can easily be done with PHP. Also,
SolrJ 1.4.1 didn't support the TermsComponent, so I had to find a way around
that problem, but the workaround is provided.



2011/4/15 Quentin Proust q.pro...@gmail.com

 Hi Victor,

 I have the same questions about the new Suggest component.
 I can't really help you as I didn't really manage to understand how it
 worked.
 Sometimes, I had more results, sometimes less.

 Even so, I would really be interested in your resources on using Terms and
 shingles to implement auto-complete.
 I am a French student myself and it could help me improve the solution for
 one of my projects.

 Best regards,
 Quentin




 



Re: Solrj performance bottleneck

2011-04-04 Thread openvictor Open
Dear Rahul,

Stefan has the right solution. Autosuggest must be kept in check on both the
JavaScript side and your backend. For JavaScript there are some really nice
tools for that, such as jQuery UI, which implements autosuggest with a tunable
delay. It also has highlighting, lets you add additional information, etc.
It is actually quite impressive. Here is the address:
http://jqueryui.com/demos/autocomplete/#remote-jsonp. It's open source, so
you can just copy what they have done or see the method they used.
For the backend, limit the number of requests per second per IP or session
and/or cache results. As for caching, Solr normally caches common requests,
but I don't know whether that applies to the TermsComponent.
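
A minimal sketch of the backend side of that advice, in Python for brevity: a sliding-window limit per client key (IP or session) plus a small result cache. The class name, the limits, and the `cached_suggestions` placeholder are assumptions made for illustration; they are not part of Solr or jQuery UI.

```python
import time
from collections import defaultdict, deque
from functools import lru_cache


class RateLimiter:
    """Allow at most max_requests per window_seconds for each client key."""

    def __init__(self, max_requests=5, window_seconds=1.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        # client key (IP or session id) -> timestamps of recent requests
        self.hits = defaultdict(deque)

    def allow(self, client_key):
        now = self.clock()
        recent = self.hits[client_key]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


@lru_cache(maxsize=1024)
def cached_suggestions(prefix):
    # Hypothetical stand-in for the real call to the search backend.
    vocabulary = ("cat", "catalog", "dog")
    return tuple(word for word in vocabulary if word.startswith(prefix))
```

A request handler would call `allow(client_ip)` first and only query the backend (through the cache) when it returns True.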

Hope this helps you !

Victor

2011/4/4 Stefan Matheis matheis.ste...@googlemail.com

 rahul,

 On Mon, Apr 4, 2011 at 4:18 PM, rahul asharud...@gmail.com wrote:
  if anybody has some suggestions/experience on how to leverage autosuggestion
  without affecting search performance much, please do share them.

 we use javascript intervals for autosuggestion. regularly check the
 value of the monitored input field and if changed, trigger a new
 request. this will cover both cases, slow-typing users and also
 ten-finger-guys (which will type much faster). a new request for every
 added character is indeed too much, even if your backend is responding
 within a few ms.

 Regards
 Stefan



Searching all terms - SolrJ

2011-03-01 Thread openvictor Open
Dear all,

First, I am sorry if this question has already been asked (I am sure it
was...) but I can't find the right option with SolrJ.

I want to retrieve only documents that contain ALL query terms.
Let me take an example. I have 4 documents that are simple sentences (they
have only one field: text):

1 : The cat is on the roof
2 : The dog is on the roof
3 : The cat is black
4 : the cat is black and on the roof

If I search for "cat roof" I get docs 1, 2, 3 and 4.
In my case I would like to get only doc 1 and doc 4 (docs 2 and 3 each
lack either cat or roof).

Is there a simple way to do that automatically with SolrJ, or should I write
something like:
text:cat AND text:roof ?
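
As an illustration of the hand-built variant, here is a small Python sketch that turns the user's input into such an AND query and checks the expected matches against the four example documents. The function names are made up for this example, and the sketch does not escape query-syntax characters, which real user input would need.

```python
DOCS = {
    1: "The cat is on the roof",
    2: "The dog is on the roof",
    3: "The cat is black",
    4: "the cat is black and on the roof",
}


def build_all_terms_query(user_input, field="text"):
    """Join every whitespace-separated term with AND on the given field."""
    terms = user_input.split()
    return " AND ".join(f"{field}:{term}" for term in terms)


def matches_all(text, terms):
    """True when every term occurs as a word in the text (case-insensitive)."""
    words = set(text.lower().split())
    return all(term.lower() in words for term in terms)


query = build_all_terms_query("cat roof")
hits = [doc_id for doc_id, text in DOCS.items() if matches_all(text, ["cat", "roof"])]
```

Here `query` is the string from the question and `hits` contains only documents 1 and 4.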

Thank you very much for your help !

Best regards,
Victor


Re: Searching all terms - SolrJ

2011-03-01 Thread openvictor Open
Yes but I want to leave the choice to the user.

He can either search all the terms or just some.

Is there any more flexible solution ? Even if I have to code it by hand ?



2011/3/1 Ahmet Arslan iori...@yahoo.com



 You can use <solrQueryParser defaultOperator="AND"/> in your schema.xml






Re: Searching all terms - SolrJ

2011-03-01 Thread openvictor Open
Great !

Thank you very much Chris, it will come in handy!

Best regards,
Victor

2011/3/1 Chris Hostetter hossman_luc...@fucit.org


 : Yes but I want to leave the choice to the user.
 :
 : He can either search all the terms or just some.
 :
 : Is there any more flexible solution ? Even if I have to code it by hand ?

 the declaration in the schema dictates the default.

 you can override the default at query time using the q.op param (ie:
 q.op=AND, q.op=OR) in the request.

 in SolrJ you would just call solrQuery.set("q.op", "OR") on your SolrQuery
 object.
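
 For clients that talk to Solr over plain HTTP, the same per-request choice can be sketched in Python as below. The helper name and the `require_all_terms` flag are hypothetical; the base URL is the example one used elsewhere in these threads, and only the q and q.op parameters come from the explanation above.

```python
from urllib.parse import urlencode


def select_url(base_url, query, require_all_terms):
    """Build a /select URL, letting the user pick AND or OR per request."""
    params = {
        "q": query,
        # q.op overrides the default operator declared in the schema.
        "q.op": "AND" if require_all_terms else "OR",
    }
    return f"{base_url}/select?{urlencode(params)}"


url = select_url("http://localhost:8983/solr", "cat roof", require_all_terms=True)
```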

 -Hoss



Re: Using terms and N-gram

2011-02-04 Thread openvictor Open
Hi Otis,

That's good, I finally got it working. As for Sematext, I am afraid I am too
poor to consider this solution :) (I am doing this for fun.)
Thank you anyway!

2011/2/4 Otis Gospodnetic otis_gospodne...@yahoo.com

 Hi,

 The main difference is that CommonGrams will take 2 adjacent words and put
 them together, while NGram* stuff will take a single word and chop it up in
 sequences of one or more characters/letters.

 If you are stuck with auto-complete stuff, consider
 http://sematext.com/products/autocomplete/index.html

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/






Re: Terms and termscomponent questions

2011-02-03 Thread openvictor Open
Dear Erick,

You were totally right: I didn't put any space between the words, which
caused Solr to concatenate them!
Everything is solved now. Thank you very much for your help !
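
For anyone hitting the same thing, here is a deliberately simplified Python sketch of why a missing space matters. It only imitates whitespace tokenization followed by splitting on delimiters with word catenation (in the spirit of catenateWords=1); the real WordDelimiterFilterFactory also handles case changes and letter/number boundaries, which this sketch ignores. The function names are made up for illustration.

```python
import re


def split_and_catenate(token):
    """Split one token on non-alphanumeric delimiters and also emit the joined form."""
    parts = [part for part in re.split(r"[^A-Za-z0-9]+", token) if part]
    tokens = list(parts)
    if len(parts) > 1:
        # Imitates catenation: the sub-words of one token are glued together.
        tokens.append("".join(parts))
    return tokens


def analyze(text):
    """Whitespace tokenization followed by the simplified delimiter step."""
    output = []
    for token in text.split():
        output.extend(split_and_catenate(token))
    return output


without_spaces = analyze("man,bear,pig")
with_spaces = analyze("man, bear, pig")
```

Without spaces the three words form a single token, so the glued form is produced; with spaces each word is its own token and nothing is concatenated.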

Best regards,
Victor Kabdebon

2011/2/3 Erick Erickson erickerick...@gmail.com

 There are a couple of things going on here. First,
 WordDelimiterFilterFactory is splitting things up on letter/number
 boundaries. Take a look at:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

 for a list of *some* of the available tokenizers. You may want to just use
 one of the others, or change the parameters to WordDelimiterFilterFactory
 to not split as it is.

 See the page: http://localhost:8983/solr/admin/analysis.jsp and check the
 verbose box to see what the effects of the various elements in your analysis
 chain are. This is a very important page for understanding the analysis part
 of the whole operation.

 Second, if you've been trying different things out, you may well have some
 old stuff in your index. When you delete documents, the terms are still in
 the index until an optimize. I'd advise starting with a clean slate for
 your experiments each time. The cheap way to do this is stop your server and
 delete <solr_home>/data/index. Delete the index directory too, not just the
 contents. So it's possible your TermsComponent is returning data from
 previous attempts, because I sure don't see how the concatenated terms would
 be in this index given the definition you've posted.

 And if none of that works, well, we'll try something else <G>..

 Best
 Erick


Re: Using terms and N-gram

2011-02-03 Thread openvictor Open
Thank you, I will do that and hopefully it will be handy!

But can someone explain to me the difference between CommonGramsFilterFactory
and NGramFilterFactory? (Maybe the solution is there.)

Thank you all,
best regards

2011/2/3 Grijesh pintu.grij...@gmail.com


 Use analysis.jsp to see what is happening at index time and query time with
 your input data. You can use highlighting to see if a match is found.

 -
 Thanx:
 Grijesh
 http://lucidimagination.com
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using terms and N-gram

2011-02-03 Thread openvictor Open
Thank you for these inputs.

I was silly to ask about n-grams, because I already knew the answer. I think
I was tired yesterday...

Thank you Erick Erickson, once again you gave me a more than useful comment.
Indeed, shingles seem to be the perfect fit for the work I want to do. I
will try to implement that tonight and I will come back to report whether
it's working.

Regards,
Victor

2011/2/3 Erick Erickson erickerick...@gmail.com

 First, you'll get a lot of insight by defining something simply and looking
 at the analysis page from solr admin. That's a very valuable page.

 To your question:
 commongrams are shingles that work between stopwords and
 other words. For instance, "this is some text" gets analyzed into
 "this", "this_is", "is", "is_some", "some", "text". Note that the stopwords
 are the only things that get combined with the text after.

 NGrams form on letters. It's too long to post the whole thing, but
 the above phrase gets analyzed as
 "t", "h", "i", "s", "th", "hi", "is", "i", "s", "is", "s", "o", "m", "e",
 "so", "om", "me".. It splits a single
 token into grams whereas commongrams essentially combines tokens
 when they're stopwords.
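
 The contrast can be sketched in a few lines of Python. This is a simplified imitation and not the actual Lucene filters: the stopword set, the function names, and the 1-to-2 gram sizes are assumptions chosen to match the example phrase.

```python
def common_grams(tokens, stopwords):
    """Emit every token, plus a joined pair wherever a stopword is involved."""
    output = []
    for i, token in enumerate(tokens):
        output.append(token)
        has_next = i + 1 < len(tokens)
        if has_next and (token in stopwords or tokens[i + 1] in stopwords):
            output.append(f"{token}_{tokens[i + 1]}")
    return output


def char_ngrams(token, min_size=1, max_size=2):
    """Chop a single token into character grams of the given sizes."""
    return [
        token[i:i + size]
        for size in range(min_size, max_size + 1)
        for i in range(len(token) - size + 1)
    ]


word_level = common_grams("this is some text".split(), {"this", "is"})
letter_level = char_ngrams("this")
```

 `word_level` combines neighbouring words around stopwords, while `letter_level` breaks one word into pieces.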

 Have you looked at shingles? See:

 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
 Best
 Erick





Re: Solr for finding similar word between two documents

2011-02-03 Thread openvictor Open
Rohan: what you want to do can be done with quite little effort, if your
documents have a limited size (up to a few MB), using common and basic data
structures like HashMap.
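
A minimal sketch of that idea, written in Python where a Counter plays the role of the hash map. It assumes "similar words" simply means words that occur in both documents; the function names are made up for this example.

```python
import re
from collections import Counter


def word_counts(text):
    """Map each lowercased word to the number of times it occurs."""
    return Counter(re.findall(r"\w+", text.lower()))


def shared_words(text_a, text_b):
    """Words that appear in both texts, in alphabetical order."""
    counts_a = word_counts(text_a)
    counts_b = word_counts(text_b)
    return sorted(counts_a.keys() & counts_b.keys())


common = shared_words("The cat is on the roof", "The dog is on the roof")
```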

Do you have any additional information on your problem so that we can give
you more useful inputs ?

2011/2/3 Gora Mohanty g...@mimirtech.com

 On Thu, Feb 3, 2011 at 11:32 PM, rohan rai hiroha...@gmail.com wrote:
  Is there a way to use solr and get similar words between two document
  (files).
 [...]

 This is *way* too vague to make any sense out of. Could you elaborate,
 as I could have sworn that what you seem to want is the essential
 function of a search engine.

 Regards,
 Gora



Re: Using terms and N-gram

2011-02-03 Thread openvictor Open
Okay, so as suggested, shingles work perfectly well for what I need!
Thank you Erick
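
For readers who land on this thread, a rough Python sketch of what word-level shingling produces for the 1-gram to 8-gram requirement. It only imitates the idea; it is not the ShingleFilterFactory implementation, and the function name and defaults are assumptions.

```python
def shingles(text, min_size=1, max_size=8):
    """Return every run of min_size..max_size adjacent words, shortest first."""
    words = text.lower().split()
    output = []
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            output.append(" ".join(words[start:start + size]))
    return output


up_to_two = shingles("the cat is black", 1, 2)
all_sizes = shingles("the cat is black")
```

`up_to_two` holds the single words followed by the adjacent pairs; `all_sizes` additionally contains the three-word runs and the full sentence.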





Using terms and N-gram

2011-02-02 Thread openvictor Open
Dear all,

I am trying to implement an autocomplete system for search, but I am stuck
on some problems that I can't solve.

Here is my problem:
I give text like:
"the cat is black" and I want to explore all 1-grams to 8-grams for all the
text that is passed:
the, cat, is, black, the cat, cat is, is black, etc...

In order to do that I have defined the following fieldtype in my schema :

<!-- Custom fieldtype -->
<fieldType name="ngram_field" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            ignoreCase="true" maxGramSize="8" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" ignoreCase="true"
            maxGramSize="8" minGramSize="1"/>
  </analyzer>
</fieldType>


Then the following field:

<field name="p_title_ngram" type="ngram_field" indexed="true"
       stored="true"/>

Then I fed Solr some phrases and I was really surprised to see that
Solr didn't behave as expected.
I went to the schema browser to see the result for the very profound query:
"the cat is black and it rains"

The results are quite disappointing: first, 1-grams are not found. Some
2-grams are found, like the_cat, and_it, etc., but not what I expected.
Is there something I am missing here? (By the way, I also tried removing
minGramSize and maxGramSize, and even the words attribute.)

Thank you,
Victor Kabdebon


Re: Terms and termscomponent questions

2011-02-01 Thread openvictor Open
Dear Erick,

Thank you for your answer, here is my fieldtype definition. I took the
standard one because I don't need a better one for this field

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

Now my field:

<field name="p_field" type="text" indexed="true" stored="true"/>

But I have a doubt now... Do I really put a space between words, or is it
just a comma? If I only put a comma, is the whole process going to be
affected? What I don't really understand is that I find the separate words,
but also their concatenation (but again in one direction only). Let me
explain: if I have "man bear pig" I will find
"manbearpig" and "bearpig", but never "pigman" or any other combination in a
different order.

Thank you very much
Best Regards,
Victor

2011/2/1 Erick Erickson erickerick...@gmail.com

 Nope, this isn't what I'd expect. There are a couple of possibilities:
 1> check out what WordDelimiterFilterFactory is doing, although
    if you're really sending spaces that's probably not it.
 2> Let's see the field and fieldType definitions for the field
    in question. type="text" doesn't say anything about analysis,
    and that's where I'd expect you're having trouble. In particular
    if your analysis chain uses KeywordTokenizerFactory for instance.
 3> Look at the admin/schema browse page, look at your field and
    see what the actual tokens are. That'll tell you what TermsComponent
    is returning, perhaps the concatenation is happening somewhere
    else.

 Bottom line: Solr will not concatenate terms like this unless you tell it
 to, so I suspect you're telling it to, you just don't realize it <G>...

 Best
 Erick

 



Re: Solr for noSQL

2011-02-01 Thread openvictor Open
Hi all, I don't know if it answers any of your questions, but if you are
interested, check out:

Lucandra (Cassandra + Lucene)



2011/2/1 Steven Noels stev...@outerthought.org

 On Tue, Feb 1, 2011 at 11:52 AM, Upayavira u...@odoko.co.uk wrote:


 
  Apologies if my "nothing funky" sounded like you weren't doing cool
  stuff.


 No offense whatsoever. I think my longer reply sheds a more accurate light
 on what Lily means in terms of "SOLR for NoSQL", and it was your reaction
 that triggered this additional explanation.


  I was merely attempting to say that I very much doubt you were
  doing anything funky like putting HBase underneath Solr as a replacement
  of FSDirectory.


 There are some initiatives in the context of Cassandra IIRC, as well as a
 project which stores Lucene index files in HBase tables, but frankly they
 seem more experimentation, and also I think the nature of how Lucene/SOLR
 works + what HBase does on top of Hadoop FS somehow is in conflict with
 each other. Too many layers of indirection will kill performance on every
 layer.



  I was trying to imply that, likely your integration with
  Solr was relatively conventional (interacting with its REST interface),
 


 Yep. We figured that was the wiser road to walk, and leaves a clear-defined
 interface and possible area of improvement against a too-low level of
 integration.


  and the funky stuff that you are doing sits outside of that space.
 
  Hope that's a clearer (and more accurate?) attempt at what I was trying
  to say.
 
  Upayavira (who finds the Lily project interesting, and would love to
  find the time to play with it)
 

 Anytime, Upayavira. Anytime! ;-)

 Steven.
 --
 Steven Noels
 http://outerthought.org/
 Scalable Smart Data
 Makers of Kauri, Daisy CMS and Lily



Terms and termscomponent questions

2011-01-31 Thread openvictor Open
Dear Solr users,

I am currently using Solr and the TermsComponent to build an auto-suggest
for my website.

I have a field called p_field, indexed and stored, with type="text" in the
schema XML. Nothing out of the ordinary.
I feed Solr a set of words separated by a comma and a space, such as (for
two documents):

Document 1:
word11, word12, word13. word14

Document 2:
word21, word22, word23. word24


When I use my newly defined field I get these terms for the prefix word1:
word11, word12, word13. word14 word11word12 word11word13 etc...
Is it normal to have the concatenation of words indexed, and not only the
words? Did I miss something about Terms?

Thank you very much,
Best regards all,
Victor