Re: Best practice: Autosuggest/autocomplete vs. "real search"

2014-11-10 Thread Jorge Luis Betancourt Gonzalez
Wouldn't it be easier to ensure, on the site, that only complete terms are 
submitted to the actual search? In an app I worked on some time ago, the 
default behavior of the JavaScript autocompletion component was to first 
complete the term in the input and then submit the query to the backend. I 
know this is not what you asked for, but could it work? I'm just taking a 
shot in the dark here! :-)

On Nov 10, 2014, at 8:37 AM, Michael Sokolov wrote:

> The goal is to ensure that suggestions from autocomplete are actually terms 
> in the main index, so that the suggestions will actually result in matches.  
> You've considered expanding the main index by adding the suggestion n-grams 
> to it, but it would probably be better to alter your suggester so that it 
> produces only tokens that are in the main index.  I think this is basically 
> how all the Suggester implementations are designed to work already; are you 
> using one of those, or are you using the TermsComponent, or something else?
> 
> -Mike
> 
> On 11/10/14 2:54 AM, Thomas Michael Engelke wrote:
>>  
>>  We're using Solr as a backend for an ECommerce site/system. The Solr
>> index stores products with selected attributes, as well as a dedicated
>> field for autocomplete suggestions (Done via AJAX request when typing in
>> the search box without pressing return).
>> 
>> The autosuggest field is supplied by copyField directives from certain
>> select product attribute fields (description and/or name mostly). It
>> uses EdgeNGramFilterFactory to complete words not yet typed completely,
>> and it works quite well.
>> 
>> However, we come across an issue with a disconnect between the
>> autosuggest results and results of a "normal search", that is, a query
>> over the full fields of the product. Let's say there are products that
>> are called "motor".
>> 
>> - When autosuggesting, typing "mot" autosuggests all products with
>> "motor", because the EdgeNGram created "m", "mo", "mot", "moto" and
>> "motor", respectively, and it matches.
>> - When searching for "mot", however (i.e. pressing enter when seeing the
>> autosuggestions), it doesn't find any products. The autosuggest field is
>> not part of the "real" search, and no product attribute contains "mot"
>> as a word.
>> 
>> One obvious solution would be to incorporate the "autosuggest" field
>> into the "real" search, however, this adds many tokens to the index that
>> aren't really part of the products indexed and makes for strange search
>> results, for example when an NGram is also a word, but the record itself
>> does contain the search term only as part of a word.
>> 
>> Are there clever solutions to this problem?
> 
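
The mismatch described in the quoted message can be reproduced without Solr. 
The sketch below is plain Java, not Solr's EdgeNGramFilterFactory; the class 
name, method name, and the gram sizes 1 and 20 are assumptions chosen for 
illustration. It shows why "mot" is a term in the autosuggest field but not 
in a field that only holds the whole word.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class EdgeNGramSketch {
    // Front-edge n-grams of a single token, from minGram up to maxGram characters.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Terms of the autosuggest field (edge n-grams) vs. the full-text field (whole word).
        List<String> suggestTerms = edgeNGrams("motor", 1, 20);
        List<String> searchTerms = Collections.singletonList("motor");

        System.out.println(suggestTerms);                 // [m, mo, mot, moto, motor]
        System.out.println(suggestTerms.contains("mot")); // true
        System.out.println(searchTerms.contains("mot"));  // false
    }
}
```

Submitting the completed term ("motor") rather than the typed prefix ("mot") 
to the main search, as suggested at the top of this reply, avoids the 
mismatch without adding n-grams to the main index.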



Re: Search for partial name in Solr 4.x

2014-11-09 Thread Jorge Luis Betancourt Gonzalez
The whole idea behind Solr is to solve the problem you just explained. In 
particular, what you need is to define the title field as a solr.TextField and 
then define a tokenizer. The tokenizer essentially transforms the initial 
text into tokens. Solr has several tokenizers, each with its own 
characteristics; one of the most common is the StandardTokenizer, but your 
choice will be influenced by how you want to "divide" your initial text into 
"parts" or tokens. Put in simple words, when you fire a query against Solr it 
will match the tokens of your query against the tokens stored in each of your 
documents, and then it will output a list of matching documents.

One simple example of a fieldType you could use is:

[fieldType XML definition lost in the list archive]

In this case the tokenizer will split the initial text into tokens, and then 
each token will be lowercased, so when you query you won't have to worry about 
the capitalization of the terms.
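
To make the token matching concrete, here is a minimal sketch in plain Java. 
It is not Solr's StandardTokenizer or LowerCaseFilter; the class name, the 
regular expression used for splitting, and the "every query token must be 
present" rule are simplifying assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AnalyzerSketch {
    // Rough stand-in for a tokenizer followed by a lowercase filter:
    // split on runs of characters that are neither letters nor digits,
    // then lowercase each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A title matches when every token of the query is among the title's tokens.
    static boolean matches(String title, String query) {
        return analyze(title).containsAll(analyze(query));
    }

    public static void main(String[] args) {
        String title = "The Old Man and the Sea";
        System.out.println(analyze(title));          // [the, old, man, and, the, sea]
        System.out.println(matches(title, "old SEA")); // true
        System.out.println(matches(title, "ol"));      // false
    }
}
```

Note that a partial word such as "ol" does not match here; matching inside 
words needs an additional filter such as an n-gram filter.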

Hope it helps

On Nov 9, 2014, at 3:26 PM, PeriS  wrote:

> I was wondering if there is a way to search on partial names? Ex; Field is a 
> string and stores values like titles of a book; When searching part of the 
> title may be supplied; How do I resolve this? Please let me know
> 
> 
> Thanks
> -PeriS



Re: How to choose only one best hit from several ones ?

2014-11-09 Thread Jorge Luis Betancourt Gonzalez
How would you measure which snippet is the best? 

On Nov 9, 2014, at 1:59 PM, SolrUser1543  wrote:

> Let's say that for some query there are several results, with several hits
> for each one, which are shown in the highlight section of the response.
> 
> Is it possible to select only one best hit for every result? There is an
> hl.snippets parameter which controls the number of snippets. hl.snippets=1
> will show the first one, but not necessarily the best one.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-choose-only-one-best-hit-from-several-ones-tp4168416.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: exporting to CSV with solrj

2014-10-31 Thread Jorge Luis Betancourt Gonzalez
When you fire a query against Solr with wt=csv, the response coming from Solr 
is *already* in CSV. The CSVResponseWriter is responsible for translating 
SolrDocument instances into CSV on the server side, so I don't see any reason 
to use it yourself; Solr already does the heavy lifting for you.
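
As a rough sketch of that idea, the CSV can be fetched over plain HTTP and 
written to disk without going through SolrJ's response parsing. Everything 
named here is an assumption for illustration: the class and method names, the 
core URL in the usage note, and the page size. Only the standard JDK is used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class CsvExportSketch {
    // Builds a select URL that asks Solr to render the results as CSV.
    static String csvUrl(String coreUrl, String query, int start, int rows) {
        try {
            return coreUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&wt=csv&start=" + start + "&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Streams the response body straight into a file, without parsing it.
    static void download(String url, Path target) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A call such as csvUrl("http://localhost:8983/solr/collection1", "*:*", 0, 1000) 
would produce one page; for millions of documents the pages would have to be 
requested in a loop and appended, and whether that scales for very deep 
offsets is something to test on your own data.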

Regards,

On Oct 31, 2014, at 10:44 AM, tedsolr  wrote:

> I am trying to invoke the CSVResponseWriter to create a CSV file of all
> stored fields. There are millions of documents so I need to write to the
> file iteratively. I saw a snippet of code online that claimed it could
> effectively remove the SolrDocumentList wrapper and allow the docs to be
> retrieved in the actual format requested in the query. However, I get a null
> pointer from the CSVResponseWriter.write() method.
> 
> SolrQuery qry = new SolrQuery("*:*");
> qry.setParam("wt", "csv");
> // set other params
> SolrServer server = getSolrServer();
> try {
>   QueryResponse res = server.query(qry);
> 
>   CSVResponseWriter writer = new CSVResponseWriter();
>   Writer w = new StringWriter();
>   SolrQueryResponse solrResponse = new SolrQueryResponse();
>   solrResponse.setAllValues(res.getResponse());
>   try {
>     SolrParams list = new MapSolrParams(new HashMap<String, String>());
>     writer.write(w, new LocalSolrQueryRequest(null, list), solrResponse);
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
>   System.out.print(w.toString());
> 
> } catch (SolrServerException e) {
>   e.printStackTrace();
> }
> 
> NPE snippet:
> org.apache.solr.response.CSVWriter.writeResponse(CSVResponseWriter.java:281)
> org.apache.solr.response.CSVResponseWriter.write(CSVResponseWriter.java:56)
> 
> Am I on the right track with the approach? I really don't want to roll my
> own document to CSV line convertor. Thanks!
> Solr 4.9
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/exporting-to-CSV-with-solrj-tp4166845.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Best way to index wordpress blogs in solr

2014-10-07 Thread Jorge Luis Betancourt Gonzalez
If you're talking about a generic web crawl, you could use something like 
Nutch [1]. Keep in mind that it is a full web crawler, and it does a pretty 
good job. I've been using it for more than 2 years now and I'm very happy with 
it, although I don't crawl just a couple of sites but a wider spectrum (think 
country-scale web). With Nutch you just have to configure a couple of options 
in an XML file and it will crawl the web and index the content into Solr.

Regards,

[1] http://nutch.apache.org 

On Oct 7, 2014, at 4:53 PM, Vishal Sharma  wrote:

> Makes sense.
> 
> I'll just dive in now. Thanks so much.
> 
> Vishal Sharma
> TL, Grazitti Interactive
> T: +1 650 641 1754
> E: vish...@grazitti.com
> 
> On Tue, Oct 7, 2014 at 1:44 PM, Alexandre Rafalovitch wrote:
> 
>> I am pretty sure Swift is not Solr. That's why I was asking whether
>> you were starting from scratch.
>> 
>> As to the other items, please re-read my original response. Solr has
>> an example reading in RSS feeds, you could probably use that. Or a
>> generic XML using DataImportHandler's mapping. Or directly from
>> database, again with DIH.
>> 
>> Basically, it sounds totally doable. So, it's hard to advise anything
>> specific beyond "go, do it" and wait for you to come back with a lot
>> more specific issue once you get going. Most of the issues will be
>> related to your schema and your WordPress configuration, so no
>> abstract advice is available.
>> 
>> Regards,
>>Alex.
>> 
>> On 7 October 2014 16:36, Vishal Sharma  wrote:
>>> Hey Alex,
>>> 
>>> Thanks for the prompt response.
>>> 
>>> Here is what I am trying to solve: I am showing search results from
>>> content coming from 3 different places on a single site. I have done that
>>> by pumping all this content into a Solr server running on a single flat
>>> schema, using the different APIs of these platforms. Now I also need to
>>> index blog posts written in WordPress. I was wondering if there is any
>>> solution already available which can help me crawl and pump these posts
>>> into my running Solr instance. Otherwise I might have to write a few more
>>> scripts to do that.
>>> 
>>> BTW, is Swift using Solr on the backend? Because I thought it's a paid
>>> enterprise solution.
>>> 
>> 

Contest "Mi selfie por los 5". Details at 
http://justiciaparaloscinco.wordpress.com


Re: SOLR query - restrict access to user documents

2014-10-07 Thread Jorge Luis Betancourt Gonzalez
I see you're defining a default value for "rows". This can be overridden on 
the request, and requesting a lot of documents from Solr can stress your 
server/cluster (provided, of course, that the client in question has that many 
documents). If this is a fixed value and clients must not request more 
documents, then I'd consider moving it into the invariants section, which 
ensures that the value can't be changed by the request, no matter what.

Some time ago I had a similar use case: we wanted to expose Solr to our 
clients, and eventually we faced problems when some clients requested "all of 
their documents" in one request, stressing our cluster. In the end we wrote a 
custom SearchComponent to set maximum values (instead of a fixed value 
specified in invariants) for the rows and start parameters. Actually, this 
component does a little more, as we add limitations per type of client, 
defining constraints such as how many documents (i.e. data points) can be 
requested, etc.
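
The core of such a component is just a clamp on the requested values. The 
sketch below shows only that logic in plain Java and is not wired into Solr's 
SearchComponent API; the class name, the fallback of 10, and the ceiling of 
100 are hypothetical values for illustration.

```java
final class PagingLimits {
    // Returns the requested value limited to the range [0, max].
    // A missing or unparsable value falls back to the given default.
    static int clamp(String requested, int fallback, int max) {
        int value;
        try {
            value = (requested == null) ? fallback : Integer.parseInt(requested.trim());
        } catch (NumberFormatException e) {
            value = fallback;
        }
        return Math.max(0, Math.min(value, max));
    }

    public static void main(String[] args) {
        System.out.println(clamp("1000000", 10, 100)); // 100
        System.out.println(clamp(null, 10, 100));      // 10
        System.out.println(clamp("-5", 10, 100));      // 0
    }
}
```

Unlike a value fixed in invariants, this still lets a client ask for fewer 
rows than the ceiling.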

Hope it helps, 

On Oct 7, 2014, at 11:37 AM, Nitin Agarwal <2nitinagar...@gmail.com> wrote:

> Hi, I have a question around SOLR query, I am trying to restrict access to
> SOLR data.
> 
> We are running SOLR 4.7.1, and wish to expose the query capabilities to our
> customers for the data that belongs to them. Specifically "/select", with
> default configuration is the only Request Handler that customers can
> access.
> 
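> 
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <str name="df">text</str>
>   </lst>
> </requestHandler>
> 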
> 
> The custom API that fronts SOLR, will inject appropriate restriction
> into the "q" param e.g. q=customerNumber:123 or
> append to "q" param q= AND customerNumber:123, before
> sending the request to the "/select" handler.
> 
> This works fine, however,
> 
> I want to know if there is a way customer can override these restrictions?
> 
> If so what can I do to prevent that?
> 
> So far I have come across facet.mincount as one potential concern
> where by customer can see data that they should not, e.g.
> 
> /select?q= AND
> customerNumber:123&facet=true&facet.field=customerName&rows=0&*facet.mincount=0*
> 
> will return those customer names as well that do not belong to
> customerNumber 123.
> 
> Are there any other gotchas that I should know?
> 
> Thanks for your time and help,
> 
> Nitin



Re: Changed behavior in solr 4 ??

2014-09-30 Thread Jorge Luis Betancourt Gonzalez
Don't worry. The way Hoss explained it is indeed the way I know it works, but 
the example provided in the book piqued my curiosity, hence the question in 
this thread.

Regards,

On Sep 30, 2014, at 5:59 PM, Timothy Potter  wrote:

> Indeed - Hoss is correct ... it's a problem with the example in the
> book ... my apologies for the confusion!
> 
> On Tue, Sep 30, 2014 at 3:57 PM, Chris Hostetter
>  wrote:
>> 
>> : Thanks for the response, yes the way you describe I know it works and is
>> : how I get it to work but then what does mean the snippet of the
>> : documentation I see on the documentation about overriding the default
>> 
>> It means that there is implicitly a set of search components that have
>> default behavior, and there is an implicit list of component *names*
>> used by default by SearchHandler -- and if you override one of those
>> implicit searchComponent instances by declaring your own with the same
>> name, then it will be used by default in SearchHandler.
>> 
>> a very concrete example of this is HighlightComponent -- if you have no
>> HighlightComponent declared in your solrconfig.xml, then an implicit
>> instance exists with the name "highlight"  and SearchHandler by default
>> includes that component.
>> 
>> If you want to declare your own HighlightComponent instance with special
>> initialization logic, you can either declare it with its own unique name,
>> and edit the "components" list on a SearchHandler declaration to include
>> that name, or you can just name it "highlight" and it will override the
>> default instance -- this is in fact done in the example solrconfig.xml
>> (grep for "HighlightComponent")
>> 
>> : components shipped with Solr? Even on the book Solr in Action in chapter
>> : 7 listing 7.3 I saw something similar to what I wanted to do:
>> :
>> : 
>> :   
>>...
>> 
>> That appears to be a mistake in Solr in Action ... the QueryComponent
>> class does nothing with its "init" params (the nested XML inside the
>> searchComponent declaration) so that syntax does nothing.
>> 
>> 
>> 
>> -Hoss
>> http://www.lucidworks.com/
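
The override-by-name rule Hoss describes can be pictured as a map lookup. The 
following is only a toy model in plain Java, not Solr's actual code; the 
class name, the string values, and the choice of three component names as a 
subset of the implicit ones are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ComponentRegistrySketch {
    // Starts from the implicit components, then lets explicitly declared
    // components replace any implicit one registered under the same name.
    static Map<String, String> resolve(Map<String, String> declared) {
        Map<String, String> components = new LinkedHashMap<>();
        components.put("query", "implicit QueryComponent");
        components.put("facet", "implicit FacetComponent");
        components.put("highlight", "implicit HighlightComponent");
        components.putAll(declared);
        return components;
    }
}
```

Declaring a component named "highlight" replaces the implicit entry, while 
one declared under a new name is only added and still has to be listed in a 
handler's components to be used. As the message above points out, replacing 
the instance does not by itself give the component default parameters.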



Re: (auto)suggestions, but ony from a "filtered" set of documents

2014-09-26 Thread Jorge Luis Betancourt Gonzalez
Perhaps instead of the suggester component you could use the EdgeNGramFilter 
to provide partial matches, so that you can configure a custom request handler 
that "suggests" terms or phrases for you. I'm using this approach to provide 
query suggestions; of course, I'm indexing the queries into a separate core.

Greetings,

On Sep 26, 2014, at 8:49 AM, Clemens Wyss DEV  wrote:

> Either my intention is dumb (pls let me know ;)), or there is no answer to 
> this problem. If so, I will have to index my sources into separate cores. 
> But then the questions arise:
> a) how do I get suggestions from more than one core? Multiple 
> suggest-requests, then merge?
> b) how do I get (ranked) results from more than one core?
> In Lucene I was able to use a MultiIndexReader (one IndexReaders per index)
> 
> -Original Message-
> From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
> Sent: Thursday, September 25, 2014 10:24
> To: solr-user@lucene.apache.org
> Subject: (auto)suggestions, but ony from a "filtered" set of documents
> 
> What I'd like to do is
> http://localhost:8983/solr/solrpedia/suggest?q=atm&qf=source:
> 
> Through qf (or however the parameter shall be called) I'd like to restrict 
> the suggestions to documents which fit the given qf-query. 
> I need this filter if (as posted in a previous thread) I intend to put 
> "different kind of data" into one core/collection, cause suggestion shall be 
> restrictable to one or many source(s)



Re: Changed behavior in solr 4 ??

2014-09-25 Thread Jorge Luis Betancourt Gonzalez
I hadn't used it before this. Basically, I found out about it in the Solr in 
Action book and was guided by the comment about redefining the default 
components by defining a new searchComponent with the same name.

Anyhow, thanks for your reply!

Regards,

On Sep 25, 2014, at 8:01 AM, Jack Krupansky  wrote:

> I am not aware of any such feature! That doesn't mean it doesn't exist, but I 
> don't recall seeing it in the Solr source code.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Jorge Luis Betancourt Gonzalez
> Sent: Wednesday, September 24, 2014 1:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Changed behavior in solr 4 ??
> 
> Hi Jack:
> 
> Thanks for the response, yes the way you describe I know it works and is how 
> I get it to work but then what does mean the snippet of the documentation I 
> see on the documentation about overriding the default components shipped with 
> Solr? Even on the book Solr in Action in chapter 7 listing 7.3 I saw 
> something similar to what I wanted to do:
> 
> 
> 
>   25
>   content_field
> 
> 
>   *:*
>   true
>   explicit
> 
> 
> Because each default search component exists by default even if it’s not 
> defined explicitly in the solrconfig.xml file, defining them explicitly as in 
> the previous listing will replace the default configuration.
> 
> The previous snippet is from the quoted book Solr in Action, I understand 
> that in each SearchHandler I could define this parameters bu if defined in 
> the searchComponent (as the book says) this configuration wouldn’t apply to 
> all my request handlers? eliminating the need to replicate the same parameter 
> in several parts of my solrconfig.xml (i.e all the request handlers)?
> 
> 
> Regards,
> On Sep 23, 2014, at 11:53 PM, Jack Krupansky  wrote:
> 
> 
>> You set the defaults on the "search handler", not the "search component". 
>> See solrconfig.xml:
>> 
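>> 
>> <requestHandler name="/select" class="solr.SearchHandler">
>>   <lst name="defaults">
>>     <str name="echoParams">explicit</str>
>>     <int name="rows">10</int>
>>     <str name="df">text</str>
>>   </lst>
>> ...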
>> 
>> -- Jack Krupansky
>> 
>> -Original Message- From: Jorge Luis Betancourt Gonzalez
>> Sent: Tuesday, September 23, 2014 11:02 AM
>> To: solr-user@lucene.apache.org
>> Subject: Changed behavior in solr 4 ??
>> 
>> Hi:
>> 
>> I’m trying to change the default configuration for the query component of a 
>> SearchHandler, basically I want to set a default value to the rows 
>> parameters and that this value be shared by all my SearchHandlers, as stated 
>> on the solrconfig.xml comments, this could be accomplished redeclaring the 
>> query search component, however this is not working on solr 4.9.0 which is 
>> the version I’m using, this is my configuration:
>> 
>>  
>>  
>>  1
>>  
>>  
>> 
>> The relevant portion of the solrconfig.xml comment is: "If you register a 
>> searchComponent to one of the standard names,  will be used instead of the 
>> default.” so is this a new desired behavior?? although just for testing a 
>> redefined the components of the request handler to only use the query 
>> component and not to use all the default components, this is how it looks:
>> 
>> 
>>  query
>> 
>> 
>> 
>> Everything works ok but the the rows parameter is not used, although I’m not 
>> specifying the rows parameter on the URL.
>> 
>> Regards,
> 
> 
> 



Re: Spellchecking and suggesting part numbers

2014-09-24 Thread Jorge Luis Betancourt Gonzalez
I've done something similar using the EdgeNGram filter rather than the 
spellchecker component; I don't know if this fits your requirements:

The relevant portion of my fieldType config:



 class="solr.SpellCheckComponent">
>   
>   solr.IndexBasedSpellChecker
>   ./spellchecker
>   did_you_mean_part
>   
>   
>startup="lazy">
>   
>   did_you_mean_part
>   on
>   
>   
>   spellcheck_part
>   
>   
> 
> 
>positionIncrementGap="100">
>   
>class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
>   
>   
>minGramSize="1" maxGramSize="20" side="front"/>
>class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   
>   
>class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
>   
>   
>minGramSize="1" maxGramSize="20" side="front"/>
>   
>   
> 
> Can we tweak the setup such that we should get more relevant part numbers?
> 
> Thanks,
> Alexander



Re: Changed behavior in solr 4 ??

2014-09-23 Thread Jorge Luis Betancourt Gonzalez
Hi Jack:

Thanks for the response. Yes, I know the way you describe works, and it is how 
I got it working. But then what does the snippet in the documentation about 
overriding the default components shipped with Solr mean? Even in the book 
Solr in Action, chapter 7, listing 7.3, I saw something similar to what I 
wanted to do:


  
25
content_field
  
  
*:*
true
explicit
  

Because each default search component exists by default even if it’s not 
defined explicitly in the solrconfig.xml file, defining them explicitly as in 
the previous listing will replace the default configuration.

The previous snippet is from the book Solr in Action. I understand that I 
could define these parameters in each SearchHandler, but if they were defined 
in the searchComponent (as the book says), wouldn't this configuration apply 
to all my request handlers, eliminating the need to replicate the same 
parameter in several parts of my solrconfig.xml (i.e. in all the request 
handlers)?


Regards,
On Sep 23, 2014, at 11:53 PM, Jack Krupansky  wrote:


> You set the defaults on the "search handler", not the "search component". See 
> solrconfig.xml:
> 
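> 
> <requestHandler name="/select" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <str name="df">text</str>
>   </lst>
> ...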
> 
> -- Jack Krupansky
> 
> -Original Message- From: Jorge Luis Betancourt Gonzalez
> Sent: Tuesday, September 23, 2014 11:02 AM
> To: solr-user@lucene.apache.org
> Subject: Changed behavior in solr 4 ??
> 
> Hi:
> 
> I’m trying to change the default configuration for the query component of a 
> SearchHandler, basically I want to set a default value to the rows parameters 
> and that this value be shared by all my SearchHandlers, as stated on the 
> solrconfig.xml comments, this could be accomplished redeclaring the query 
> search component, however this is not working on solr 4.9.0 which is the 
> version I’m using, this is my configuration:
> 
>   
>   
>   1
>   
>   
> 
> The relevant portion of the solrconfig.xml comment is: "If you register a 
> searchComponent to one of the standard names,  will be used instead of the 
> default.” so is this a new desired behavior?? although just for testing a 
> redefined the components of the request handler to only use the query 
> component and not to use all the default components, this is how it looks:
> 
> 
>   query
> 
> 
> 
> Everything works ok but the the rows parameter is not used, although I’m not 
> specifying the rows parameter on the URL.
> 
> Regards,





Changed behavior in solr 4 ??

2014-09-23 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm trying to change the default configuration for the query component of a 
SearchHandler. Basically, I want to set a default value for the rows parameter 
and have this value shared by all my SearchHandlers. As stated in the 
solrconfig.xml comments, this could be accomplished by redeclaring the query 
search component; however, this is not working on Solr 4.9.0, which is the 
version I'm using. This is my configuration:



1



The relevant portion of the solrconfig.xml comment is: "If you register a 
searchComponent to one of the standard names, that will be used instead of the 
default." So is this new behavior intended? Just for testing, I also redefined 
the components of the request handler to use only the query component instead 
of all the default components. This is how it looks:


query



Everything works OK, but the rows value is not used, even though I'm not 
specifying the rows parameter in the URL.

Regards,


Re: Solr(j) API for manipulating the schema(.xml)?

2014-09-20 Thread Jorge Luis Betancourt Gonzalez
Basically, you could create a set of dynamic fields according to your needs, 
one for each type of data (and for several combinations), and then write a 
small wrapper around SolrJ that exposes the patterns defined in your 
schema.xml in a more understandable way. This way you can abstract away the 
manipulation of the schema.xml file and only touch it when really needed, 
i.e. for a new field type with new analyzers, etc.
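
A minimal sketch of such a wrapper, in plain Java without the SolrJ 
dependency, could look like this. It assumes that schema.xml declares dynamic 
fields for the patterns *_s, *_i, *_l and *_b; those suffixes, the class name, 
and the method names are hypothetical and would have to match your own schema.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class DynamicFieldNames {
    // Java type -> dynamic field suffix (assumed to be declared in schema.xml).
    private static final Map<Class<?>, String> SUFFIXES = new HashMap<>();
    static {
        SUFFIXES.put(String.class, "_s");
        SUFFIXES.put(Integer.class, "_i");
        SUFFIXES.put(Long.class, "_l");
        SUFFIXES.put(Boolean.class, "_b");
    }

    // Maps a logical field name to the dynamic field name for the value's type.
    static String fieldName(String logicalName, Object value) {
        String suffix = SUFFIXES.get(value.getClass());
        if (suffix == null) {
            throw new IllegalArgumentException("No dynamic field for " + value.getClass());
        }
        return logicalName + suffix;
    }

    // Renames all fields of an entity; the result could then be copied
    // field by field into a Solr input document.
    static Map<String, Object> toSolrFields(Map<String, Object> entity) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : entity.entrySet()) {
            fields.put(fieldName(e.getKey(), e.getValue()), e.getValue());
        }
        return fields;
    }
}
```

With this mapping, a provider only states logical names and values, e.g. 
fieldName("title", "Solr in Action") yields "title_s", and the schema only 
needs to change when a type without a suffix shows up.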

On Sep 18, 2014, at 3:16 AM, Clemens Wyss DEV  wrote:

> as our framework so far only knows a few field types "dynamic field"s may be 
> the way to go... And if there are new fieldtypes the new schema can be 
> distributed through ZooKeeper
> 
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com] 
> Sent: Wednesday, September 17, 2014 19:56
> To: solr-user@lucene.apache.org
> Subject: Re: Solr(j) API for manipulating the schema(.xml)?
> 
> Right, you can create new cores over the rest api.
> 
> As far as changing the schema, there's no good way to do that that I know of 
> programmatically. In the SolrCloud world, you can upload the schema to 
> ZooKeeper and have it automatically distributed to all the nodes though.
> 
> Best,
> Erick
> 
> On Wed, Sep 17, 2014 at 2:28 AM, Clemens Wyss DEV wrote:
>> Is there an API to manipulate/consolidate the schema(.xml) of a Solr-core? 
>> Through SolrJ?
>> 
>> Context:
>> We already have a generic indexing/searching framework (based on Lucene) 
>> where any component can act as a so-called IndexDataProvider. This provider 
>> delivers the field types and also the entities to be (converted into 
>> documents and then) indexed. Each of these IndexDataProviders has its own 
>> Lucene index.
>> So we kind of have the information for the Solr schema.xml.
>> 
>> Hope the intention is clear. And yes, the manipulation of the schema.xml is 
>> basically only needed when the field types change. That's why I am looking 
>> for a way to consolidate the schema.xml (upon boot, initialization of the 
>> IndexDataProviders ...).
>> In 99.999% of cases it won't change, but I'd like to keep the possibility 
>> for an IndexDataProvider to hand in "its schema".
>> 
>> Also, again driven by the dynamic nature of our framework, can I easily 
>> create new cores over SolrJ or the Solr REST API?



Re: How to exclude a mimetype in tika?

2014-09-19 Thread Jorge Luis Betancourt Gonzalez
Which crawler are you using?

On Sep 18, 2014, at 10:14 AM, keeblerh  wrote:

> eShard wrote
>> Good afternoon,
>> I'm using solr 4.0 Final
>> I need movies "hidden" in zip files that need to be excluded from the
>> index.
>> I can't filter movies on the crawler because then I would have to exclude
>> all zip files.
>> I was told I can have tika skip the movies.
>> the details are escaping me at this point.
>> How do I exclude a file in the tika configuration?
>> I assume it's something I add in the update/extract handler but I'm not
>> sure.
>> 
>> Thanks,
> 
> I am having the same issue.  I need to exclude some mime types from the zip
> files and am using Solr 4.8.  Did you ever get an answer to this?  Thanks.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-exclude-a-mimetype-in-tika-tp4127168p4159676.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Reading files in default Conf dir

2014-09-10 Thread Jorge Luis Betancourt Gonzalez
What are you developing: a custom search component? An update processor? A 
different class for one of the zillion moving parts of Solr?

If you have access to a SolrCore instance, you can ask it for its resource 
loader. Using the SolrCore instance of the current core makes the file lookup 
local to the conf directory of that core. In a custom 
UpdateRequestProcessorFactory which implements the SolrCoreAware interface, I 
have the following code:

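    import java.io.IOException;
    import java.util.List;
    import java.util.regex.Pattern;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.core.SolrResourceLoader;

    // patternFile (String) and patterns (List<Pattern>) are fields of the factory.
    @Override
    public void inform(SolrCore solrCore) {
        // The loader obtained from the core resolves files against that core's conf directory.
        SolrResourceLoader loader = solrCore.getResourceLoader();

        try {
            List<String> lines = loader.getLines(patternFile);

            // Compile every line of the file into a regular expression.
            for (String s : lines) {
                this.patterns.add(Pattern.compile(s));
            }
        } catch (IOException e) {
            SolrCore.log.error(String.format("File %s could not be loaded", patternFile), e);
        }
    }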

Essentially, I ask the actual core (solrCore) to provide a SolrResourceLoader 
for its conf directory. In your case you are just passing null, which (I 
think; I haven't tested it) instantiates a SolrResourceLoader for the Solr 
instance (judging by the paths in your mail) instead of a SolrResourceLoader 
relative to your core/collection, which is what you want.

So, bottom line: implement the SolrCoreAware interface and use the 
SolrResourceLoader provided by the core. A little more info would also be 
helpful, as we can't tell which Solr "part" you are developing.

Regards,

On Sep 9, 2014, at 2:37 PM, Ramana OpenSource wrote:

> Hi,
> 
> I am trying to load one of the file in conf directory in SOLR, using below
> code.
> 
> return new HashSet(new
> SolrResourceLoader(null).getLines("stopwords.txt"));
> 
> The "stopwords.txt" file is available in the location
> "solr\example\solr\collection1\conf".
> 
> When i debugged the SolrResourceLoader API, It is looking at the below
> locations to load the file:
> 
> ...solr\example\solr\conf\stopwords.txt
> ...solr\example\stopwords.txt
> 
> But as the file was not there in any of above location...it failed.
> 
> How to load the files in the default conf directory using
> SolrResourceLoader API ?
> 
> I am newbie to SOLR. Any help would be appreciated.
> 
> Thanks,
> Ramana.



Re: How to implement multilingual word components fields schema?

2014-09-08 Thread Jorge Luis Betancourt Gonzalez
In one of his talks, Trey Grainger (author of Solr in Action) covers how 
CareerBuilder deals with multilingual content using payloads. It's a little 
more work, but I think it would pay off.

On Sep 8, 2014, at 7:58 AM, Jack Krupansky  wrote:

> You also need to take a stance as to whether you wish to auto-detect the 
> language at query time vs. have a UI selection of language vs. attempt to 
> perform the same query for each available language and then "determine" which 
> has the best "relevancy". The latter two options are very sensitive to short 
> queries. Keep in mind that auto-detection for indexing full documents is a 
> different problem than auto-detection for very short queries.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Ilia Sretenskii
> Sent: Sunday, September 7, 2014 10:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to implement multilingual word components fields schema?
> 
> Thank you for the replies, guys!
> 
> Using field-per-language approach for multilingual content is the last
> thing I would try since my actual task is to implement a search
> functionality which would implement relatively the same possibilities for
> every known world language.
> The closest references are those popular web search engines, they seem to
> serve worldwide users with their different languages and even
> cross-language queries as well.
> Thus, a field-per-language approach would be a sure waste of storage
> resources due to the high number of duplicates, since there are over 200
> known languages.
> I really would like to keep a single field for cross-language searchable text
> content, without splitting it into specific language fields or specific
> language cores.
> 
> So my current choice will be to stay with just the ICUTokenizer and
> ICUFoldingFilter as they are without any language specific
> stemmers/lemmatizers yet at all.
> 
> Probably I will put the most popular languages stop words filters and
> stemmers into the same one searchable text field to give it a try and see
> if it works correctly in a stack.
> Does specific language related filters stacking work correctly in one field?
> 
> Further development will most likely involve some advanced custom analyzers
> like the "SimplePolyGlotStemmingTokenFilter" to utilize the ICU generated
> ScriptAttribute.
> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/100236
> https://github.com/whateverdood/cross-lingual-search/blob/master/src/main/java/org/apache/lucene/sandbox/analysis/polyglot/SimplePolyGlotStemmingTokenFilter.java
> 
> So I would like to know more about those "academic papers on this issue of
> how best to deal with mixed language/mixed script queries and documents".
> Tom, could you please share them? 



Re: Strategies for effective prefix queries?

2014-07-16 Thread Jorge Luis Betancourt Gonzalez
Perhaps what you’re trying to do could be addressed by using the 
EdgeNGramFilterFactory filter? For query suggestions I’m using a very similar 
approach, this is an extract of the configuration I’m using:







Basically this allows you to get partial matches from the start of any word in 
the string. Let’s say the field gets this content at index time: “A brown fox”; 
this document will be matched by the query “bro”, for instance. My personal 
recommendation is to use this in a separate field that gets populated through 
a copyField; this way you can apply different boosts.
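
As a rough illustration of the idea above, here is a minimal Python sketch of 
what edge n-gram indexing does conceptually. It is not Solr code: the function 
names, the simple tokenization and the gram sizes are all assumptions made for 
the example, not the configuration referred to above.

```python
import re


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of a token, from min_gram up to max_gram characters."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text, min_gram=2, max_gram=15):
    """Terms stored at index time: the edge n-grams of every token."""
    terms = set()
    for token in tokenize(text):
        terms.update(edge_ngrams(token, min_gram, max_gram))
    return terms


def matches(query, indexed_text):
    """A document matches when every query token is one of its indexed terms."""
    terms = index_terms(indexed_text)
    query_tokens = tokenize(query)
    return bool(query_tokens) and all(t in terms for t in query_tokens)


print(matches("bro", "A brown fox"))    # True: "bro" is a prefix of "brown"
print(matches("bo sm", "bob-smith"))    # True: both tokens are prefixes
print(matches("row", "A brown fox"))    # False: "row" is not at the edge of a token
```

Note that in this sketch only the indexed text is expanded into n-grams; the 
query is merely tokenized. That mirrors the usual choice of applying the edge 
n-gram filter on the index side only, so the client can send the raw string 
("bo sm") without building wildcard queries itself.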

Greetings,

On Jul 16, 2014, at 2:00 PM, Hayden Muhl  wrote:

> A copy field does not address my problem, and this has nothing to do with
> stored fields. This is a query parsing problem, not an indexing problem.
> 
> Here's the use case.
> 
> If someone has a username like "bob-smith", I would like it to match
> prefixes of "bo" and "sm". I tokenize the username into the tokens "bob"
> and "smith". Everything is fine so far.
> 
> If someone enters "bo sm" as a search string, I would like "bob-smith" to
> be one of the results. The query to do this is straight forward,
> "username:bo* username:sm*". Here's the problem. In order to construct that
> query, I have to tokenize the search string "bo sm" **on the client**. I
> don't want to reimplement tokenization on the client. Is there any way to
> give Solr the string "bo sm", have Solr do the tokenization, then treat
> each token like a prefix?
> 
> 
> On Tue, Jul 15, 2014 at 4:55 PM, Alexandre Rafalovitch 
> wrote:
> 
>> So copyField it to another and apply alternative processing there. Use
>> eDismax to search both. No need to store the copied field, just index it.
>> 
>> Regards,
>> Alex
>> On 16/07/2014 2:46 am, "Hayden Muhl"  wrote:
>> 
>>> Both fields? There is only one field here: username.
>>> 
>>> 
>>> On Mon, Jul 14, 2014 at 6:17 PM, Alexandre Rafalovitch <arafa...@gmail.com>
>>> wrote:
>>> 
>>>> Search against both fields (one split, one not split)? Keep original
>>>> and tokenized form? I am doing something similar with class name
>>>> autocompletes here:
>>>> 
>>>> https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24
>>>> 
>>>> Regards,
>>>>   Alex.
>>>> Personal: http://www.outerthoughts.com/ and @arafalov
>>>> Solr resources: http://www.solr-start.com/ and @solrstart
>>>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>>>> 
>>>> 
>>>> On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl wrote:
>>>>> I'm working on using Solr for autocompleting usernames. I'm running into a
>>>>> problem with the wildcard queries (e.g. username:al*).
>>>>> 
>>>>> We are tokenizing usernames so that a username like "solr-user" will be
>>>>> tokenized into "solr" and "user", and will match both "sol" and "use"
>>>>> prefixes. The problem is when we get "solr-u" as a prefix, I'm having to
>>>>> split that up on the client side before I construct a query
>>>>> "username:solr* username:u*". I'm basically using a regex as a poor man's
>>>>> tokenizer.
>>>>> 
>>>>> Is there a better way to approach this? Is there a way to tell Solr to
>>>>> tokenize a string and use the parts as prefixes?
>>>>> 
>>>>> - Hayden

VII International Summer School at UCI, June 30 to July 11, 2014. See 
www.uci.cu


Solr 4.x and master-slave schema

2014-07-10 Thread Jorge Luis Betancourt Gonzalez
Hi all:

We have a small installation of Solr 3.6 on our hands. Right now we have 3 
physical servers (1 master and 2 slaves). The ingestion process is done on the 
master, which replicates through Solr’s internal mechanism to the slaves, which 
handle all the queries. We are trying to upgrade to Solr 4.x, and eventually we 
would like to migrate to SolrCloud. My question essentially is: if we migrate 
our Solr 3.6 nodes to Solr 4.9 and keep the same master-slave setup, how hard 
would it be to migrate afterwards to SolrCloud?

Greetings,


Re: Get position of first occurrence in search result

2014-06-23 Thread Jorge Luis Betancourt Gonzalez
Basically this is for analytical purposes: we want to help people (whose sites 
we’ve indexed in our app) find out for which particular terms (in theory 
related to their domain) they are badly positioned in our index. Initially 
we’re starting with this basic “position per term”, but the idea is to 
elaborate further in this direction.

Could this position-finding logic be abstracted effectively into a plugin 
inside Solr? I guess it would be more efficient to iterate (or fire the 2 
queries) from within Solr itself than in our app (written in PHP, so not so 
fast for some things), which would speed things up?

Regards,

On Jun 24, 2014, at 1:42 AM, Aman Tandon  wrote:

> Jorge, i don't think that solr provide this functionality, you have to
> iterate and solr is very fast in this, you can create a script for that
> which search for pattern(term) and parse(request) the records until get the
> record of that desired url, i don't thing 1/3 seconds time to find out is
> more.
> 
> As per the search result analysis, there are very few people who request
> for the second page for their query, otherwise mostly leave the search or
> modify query string. So i better suggest you that the if the website has
> the appropriate and good data it should come on first page, so its better
> to come on first page rather than finding the position.
> 
> With Regards
> Aman Tandon
> 
> 
> On Tue, Jun 24, 2014 at 10:35 AM, Jorge Luis Betancourt Gonzalez <
> jlbetanco...@uci.cu> wrote:
> 
>> Yes, but I’m looking for the position of the url field of interest in the
>> response of solr. Solr matches the terms against the collection of
>> documents and returns sorted list by score, what I’m trying to do is get
>> the position of the a specific id in this sorted response. The response
>> could be something like position: 5, or position 500. To do this manually
>> suppose the response consists of a very large amount of documents
>> (webpages) in this case I would need to iterate over the complete response
>> to find the position, which in a worst case scenario could be in the last
>> page for instance. For this particular use case I’m not so interested in
>> the URL field per se but more on the position a certain url has in the full
>> solr response.
>> 
>> On Jun 24, 2014, at 12:31 AM, Walter Underwood 
>> wrote:
>> 
>>> Solr is designed to do exactly this very, very fast. So there isn't a
>> faster way to do it. But you only need to fetch the URL field. You can
>> ignore everything else.
>>> 
>>> wunder
>>> 
>>> On Jun 23, 2014, at 9:32 PM, Jorge Luis Betancourt Gonzalez <
>> jlbetanco...@uci.cu> wrote:
>>> 
>>>> Basically given a few search terms (query) the idea is to know given
>> one or more terms in which position your website is located for those
>> specific terms.
>>>> 
>>>> On Jun 24, 2014, at 12:12 AM, Aman Tandon 
>> wrote:
>>>> 
>>>>> What kind of search criteria, could you please explain
>>>>> 
>>>>> With Regards
>>>>> Aman Tandon
>>>>> 
>>>>> 
>>>>> On Tue, Jun 24, 2014 at 4:30 AM, Jorge Luis Betancourt Gonzalez <
>>>>> jlbetanco...@uci.cu> wrote:
>>>>> 
>>>>>> I’m using Solr for an analytic use case, one of the requirements is
>>>>>> basically given a search query get the position of the first hit. I’m
>>>>>> indexing web pages, so given a search criteria the client want’s to
>> know
>>>>>> the position (first occurrence) of his webpage in the result set (if
>> it
>>>>>> appears at all). Is any way of getting this position without
>> iterating and
>>>>>> manually checking the solr response?
>>>>>> 
>>>>>> Greetings,
>>>>>> 
>>>>>> 
>>> 
>>> 
>>> 
>>> 
>> 


Re: Get position of first occurrence in search result

2014-06-23 Thread Jorge Luis Betancourt Gonzalez
Yes, but I’m looking for the position of the URL of interest in Solr’s 
response. Solr matches the terms against the collection of documents and 
returns a list sorted by score; what I’m trying to do is get the position of a 
specific id in this sorted response. The response could be something like 
position: 5, or position: 500. To do this manually when the response consists 
of a very large number of documents (webpages), I would need to iterate over 
the complete response to find the position, which in a worst-case scenario 
could be on the last page. For this particular use case I’m not so interested 
in the URL field per se but rather in the position a certain URL has in the 
full Solr response.
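
The manual iteration described above could be sketched like this in Python. It 
is only an illustration under assumptions: fetch_page is a hypothetical 
callable standing in for a request to Solr that returns one page of URLs for 
the query (for example using the start, rows and fl request parameters), and 
the page size and the cut-off are arbitrary.

```python
def find_position(fetch_page, target_url, rows=100, max_results=10000):
    """Return the 1-based position of target_url in the ranked results, or None.

    fetch_page(start, rows) is a hypothetical function that returns the list
    of URLs for results start .. start + rows - 1, in ranking order.
    """
    start = 0
    while start < max_results:
        urls = fetch_page(start, rows)
        for offset, url in enumerate(urls):
            if url == target_url:
                return start + offset + 1
        if len(urls) < rows:
            # A short (or empty) page means the result set is exhausted.
            return None
        start += rows
    return None


# Stand-in for Solr: a fixed ranked list that is served page by page.
ranked = ["http://example.com/page%d" % i for i in range(1, 251)]


def fake_fetch(start, rows):
    return ranked[start:start + rows]


print(find_position(fake_fetch, "http://example.com/page5"))    # 5
print(find_position(fake_fetch, "http://example.com/page205"))  # 205
print(find_position(fake_fetch, "http://example.com/missing"))  # None
```

The max_results cut-off is a design choice for the sketch: past a certain 
depth the exact position is probably less interesting than the fact that the 
page is ranked very low, and it bounds the number of requests in the worst case.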

On Jun 24, 2014, at 12:31 AM, Walter Underwood  wrote:

> Solr is designed to do exactly this very, very fast. So there isn't a faster 
> way to do it. But you only need to fetch the URL field. You can ignore 
> everything else.
> 
> wunder
> 
> On Jun 23, 2014, at 9:32 PM, Jorge Luis Betancourt Gonzalez 
>  wrote:
> 
>> Basically given a few search terms (query) the idea is to know given one or 
>> more terms in which position your website is located for those specific 
>> terms.
>> 
>> On Jun 24, 2014, at 12:12 AM, Aman Tandon  wrote:
>> 
>>> What kind of search criteria, could you please explain
>>> 
>>> With Regards
>>> Aman Tandon
>>> 
>>> 
>>> On Tue, Jun 24, 2014 at 4:30 AM, Jorge Luis Betancourt Gonzalez <
>>> jlbetanco...@uci.cu> wrote:
>>> 
>>>> I’m using Solr for an analytic use case, one of the requirements is
>>>> basically given a search query get the position of the first hit. I’m
>>>> indexing web pages, so given a search criteria the client want’s to know
>>>> the position (first occurrence) of his webpage in the result set (if it
>>>> appears at all). Is any way of getting this position without iterating and
>>>> manually checking the solr response?
>>>> 
>>>> Greetings,
>>>> 
>>>> 
> 
> 
> 
> 



Re: Get position of first occurrence in search result

2014-06-23 Thread Jorge Luis Betancourt Gonzalez
Basically, given a query of one or more search terms, the idea is to know in 
which position your website is located for those specific terms.

On Jun 24, 2014, at 12:12 AM, Aman Tandon  wrote:

> What kind of search criteria, could you please explain
> 
> With Regards
> Aman Tandon
> 
> 
> On Tue, Jun 24, 2014 at 4:30 AM, Jorge Luis Betancourt Gonzalez <
> jlbetanco...@uci.cu> wrote:
> 
>> I’m using Solr for an analytic use case, one of the requirements is
>> basically given a search query get the position of the first hit. I’m
>> indexing web pages, so given a search criteria the client want’s to know
>> the position (first occurrence) of his webpage in the result set (if it
>> appears at all). Is any way of getting this position without iterating and
>> manually checking the solr response?
>> 
>> Greetings,
>> 
>> 


Get position of first occurrence in search result

2014-06-23 Thread Jorge Luis Betancourt Gonzalez
I’m using Solr for an analytics use case. One of the requirements is basically, 
given a search query, to get the position of the first hit. I’m indexing web 
pages, so given some search criteria the client wants to know the position 
(first occurrence) of his webpage in the result set (if it appears at all). Is 
there any way of getting this position without iterating and manually checking 
the Solr response? 

Greetings,




Re: Customizing Solr; Where to draw the line?

2014-06-09 Thread Jorge Luis Betancourt Gonzalez
I’d certainly go for the 2nd option. Depending on what you need, you won’t have 
to modify Solr itself, only extend it with plugins. You’ll need to write 
different components depending on your specific requirements. I definitely 
recommend the talks from Trey Grainger, from CareerBuilder. I remember seeing 
in some of the talks that they have A/B testing built into Solr, and a lot of 
other “crazy” things, so it would be a good starting point, and it will give 
you a look at what you could accomplish by extending Solr.

Of course you’ll need to update your source between big releases of Solr, and 
perhaps between some minor ones, but this way you don’t need to worry about the 
latency or maintain a new search layer between the client and Solr. 

I hope it helps,

On Jun 8, 2014, at 10:38 PM, Phanindra R  wrote:

> Hi,
> 
> We have decided to migrate from Lucene 3.x to latest Solr. A lot of
> architectural discussions are going on. There are two possible approaches.
> 
> Please note that our customer-facing app (or any client) and Search are
> hosted on different machines.
> 
> *1) Have a clean architecture*
>    - Solr takes care of customized search only.
>       - We certainly have to override some filtering, scoring, etc.
>    - There will be an intermediary search-app that
>       - receives queries
>       - does a/b testing assignments, and other non-search stuff.
>       - does query expansion / rewriting (to avoid every Solr shard doing
>         that)
>       - transforms query into Solr syntax and uses Solr's http API to
>         consume it.
>       - returns the response to customer-facing app or whatever the client
>         is.
> 
>   The problem with this approach is the additional layer and the latency
> between search-app and solr. The client of search has to make an API call,
> across the network, to the intermediary search-app which in turns makes
> another Http API call to Solr.
> 
> *2) Customize Solr to the full extent*
> 
>   - Do all the crazy stuff within Solr.
>   - We can literally create a new url and register a handler class to
>   process that. With some limitations, we should be able to do almost
>   anything.
> 
> The benefit of this approach is that it obviates the additional layer
> and the latency. However, I see a lot of long-term problems like hard to
> upgrade Solr's version, Dev flexibility (usage of Spring, Hib, etc.).
> 
> How about a distributed search? Where do above approaches stand?
> 
> I understand that this is a subjective question. It'd be helpful if you
> could share your thoughts and experiences.
> 
> Thanks.



Percolator feature

2014-05-28 Thread Jorge Luis Betancourt Gonzalez
Is there some workaround in the Solr ecosystem to get something similar to the 
percolator feature offered by Elasticsearch? 

Greetings!


Re: Writing a customize updateRequestHandler

2014-02-03 Thread Jorge Luis Betancourt Gonzalez
The book Apache Solr Beginner’s Guide has a section dedicated to writing new 
Solr plugins; perhaps it would be a good place to start. There is also a page 
about this in the wiki, but it’s a light introduction. I’ve found that a very 
good starting point is just to browse through the code of some standard 
components similar to the one you’re trying to customize.

On Feb 3, 2014, at 9:00 AM, neerajp  wrote:

> Hi,
> I want to write a custom updateRequestHandler.
> Can you pl.s guide me the steps I need to perform for that ?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Writing-a-customize-updateRequestHandler-tp4115059.html
> Sent from the Solr - User mailing list archive at Nabble.com.


III International Winter School at UCI, February 17 to 28, 2014. See 
www.uci.cu


Re: Solr server requirements for 100+ million documents

2014-01-28 Thread Jorge Luis Betancourt Gonzalez
A spreadsheet has been mentioned previously on the list. Since you already have 
documents in an index, you could extract the needed information from your index 
and feed it into the spreadsheet; it will probably give you a rough 
approximation of the hardware you’ll be needing. Also, if I’m not mistaken, 
this “tool” provides no estimate for SolrCloud.

Greetings!

On Jan 28, 2014, at 11:02 PM, Susheel Kumar  
wrote:

> Thanks, Jack. That helps.
> 
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com] 
> Sent: Tuesday, January 28, 2014 8:01 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr server requirements for 100+ million documents
> 
> Lucene and Solr work best if the full index can be cached in OS memory. 
> Sure, Lucene/Solr does work properly once the index no longer fits, but 
> performance will drop off.
> 
> I would say that you could fit 100 million moderate-size documents on a 
> single Solr server - provided that you give the OS enough RAM for the full 
> Lucene index. That said, if you want to configure a SolrCloud cluster with 
> shards, you can use more modest, commodity servers with less RAM, provided 
> each server still fits its fraction of the total Lucene index in that 
> server's OS memory (file cache.)
> 
> You may also need to add replicas for each shard to accommodate query load - 
> proof-of-concept testing is needed to verify that. It is worth noting that 
> sharding can improve total query performance since each node only searches a 
> fraction of the total data and those searches are done in parallel  (since 
> they are on different machines.)
> 
> -- Jack Krupansky
> 
> -Original Message-
> From: Susheel Kumar
> Sent: Sunday, January 26, 2014 10:54 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr server requirements for 100+ million documents
> 
> Thank you Erick for your valuable inputs. Yes, we have to re-index data again 
> & again. I'll look into possibility of tuning db access.
> 
> On SolrJ and automating the indexing (incremental as well as one time) I want 
> to get your opinion on below two points. We will be indexing separate sets of 
> tables with similar data structure
> 
> - Should we use SolrJ and write Java programs that can be scheduled to 
> trigger indexing on demand/schedule based.
> 
> - Is using SolrJ a better idea even for searching than using SolrNet? As our 
> frontend is in .Net so we started using SolrNet but I am afraid down the road 
> when we scale/support SolrCloud using SolrJ is better?
> 
> 
> Thanks
> Susheel
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, January 26, 2014 8:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr server requirements for 100+ million documents
> 
> Dumping the raw data would probably be a good idea. I guarantee you'll be 
> re-indexing the data several times as you change the schema to accommodate 
> different requirements...
> 
> But it may also be worth spending some time figuring out why the DB access is 
> slow. Sometimes one can tune that.
> 
> If you go the SolrJ route, you also have the possibility of setting up N 
> clients to work simultaneously, sometimes that'll help.
> 
> FWIW,
> Erick
> 
> On Sat, Jan 25, 2014 at 11:06 PM, Susheel Kumar 
>  wrote:
>> Hi Kranti,
>> 
>> Attach are the solrconfig & schema xml for review. I did run indexing 
>> with just few fields (5-6 fields) in schema.xml & keeping the same db 
>> config but Indexing almost still taking similar time (average 1 
>> million records 1
>> hr) which confirms that the bottleneck is in the data acquisition 
>> which in our case is oracle database. I am thinking to not use 
>> dataimporthandler / jdbc to get data from Oracle but to rather dump 
>> data somehow from oracle using SQL loader and then index it. Any thoughts?
>> 
>> Thnx
>> 
>> -Original Message-
>> From: Kranti Parisa [mailto:kranti.par...@gmail.com]
>> Sent: Saturday, January 25, 2014 12:08 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr server requirements for 100+ million documents
>> 
>> can you post the complete solrconfig.xml file and schema.xml files to 
>> review all of your settings that would impact your indexing performance.
>> 
>> Thanks,
>> Kranti K. Parisa
>> http://www.linkedin.com/in/krantiparisa
>> 
>> 
>> 
>> On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar < 
>> susheel.ku...@thedigitalgroup.net> wrote:
>> 
>>> Thanks, Svante. Your indexing speed using db seems to really fast.
>>> Can you please provide some more detail on how you are indexing db 
>>> records. Is it thru DataImportHandler? And what database? Is that 
>>> local db?  We are indexing around 70 fields (60 multivalued) but data 
>>> is not populated always in all fields. The average size of document 
>>> is in
>>> 5-10 kbs.
>>> 
>>> -Original Message-
>>> From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf 
>>> Of svant

Re: PHP + Solr

2014-01-28 Thread Jorge Luis Betancourt Gonzalez
I have some experience using Solarium and it has been great so far. In 
particular, we use the NelmioSolariumBundle to integrate with Symfony2.

Greetings!

On Jan 28, 2014, at 1:54 PM, Felipe Dantas de Souza Paiva 
 wrote:

> ‎Hi Folks,
> 
> I would like to know what is the best way to integrate PHP and Apache Solr. 
> Until now I've found two options:
> 
> 1) http://www.php.net/manual/en/intro.solr.php
> 
> 2) http://www.solarium-project.org/
> 
> What do you guys say?
> 
> Cheers,
> 
> Felipe
> 
> 
> 




Re: Solr & Nutch

2014-01-28 Thread Jorge Luis Betancourt Gonzalez
Q1: Nutch doesn’t only handle the parsing of HTML files; it also uses Hadoop to 
achieve large-scale crawling across multiple nodes. It fetches the content of 
the HTML file, and yes, it also parses its content.

Q2: In our case we crawl some websites and store the content in one “main” Solr 
core. We also have a web app with the typical “search box”, and we use a 
separate core to store the queries made by our users.

Q3: Not currently using SolrCloud so I’m going to let this one pass to a more 
experienced fellow.

On Jan 28, 2014, at 11:36 AM, rashmi maheshwari  
wrote:

> Hi,
> 
> Question1 --> When Solr could parse html, documents like doc, excel pdf
> etc, why do we need nutch to parse html files? what is different?
> 
> Questions 2: When do we use multiple core in solar? any practical business
> case when we need multiple cores?
> 
> Question 3: When do we go for cloud? What is meaning of implementing solr
> cloud?
> 
> 
> -- 
> Rashmi
> Be the change that you want to see in this world!
> www.minnal.zor.org
> disha.resolve.at
> www.artofliving.org




Re: Implementing an alerting feature

2014-01-27 Thread Jorge Luis Betancourt Gonzalez
I believe that you are looking for something similar to the percolator feature 
present in Elasticsearch. I remember something about a Solr implementation 
being discussed here some time ago. Does anyone know if there has been any 
progress in this area?
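
To make the idea of this kind of "reverse search" concrete, here is a toy 
Python sketch: queries are stored up front and each incoming document is 
checked against them. It is purely illustrative and does not reflect how 
Elasticsearch or any Solr-side implementation actually works; the class name, 
the query ids and the "all terms must appear" matching rule are assumptions.

```python
import re


def terms_of(text):
    """Lowercased word set of a piece of text."""
    return set(re.findall(r"[0-9a-z]+", text.lower()))


class StoredQueries:
    """Toy reverse search: queries are registered first, documents arrive later."""

    def __init__(self):
        self._queries = {}

    def register(self, query_id, query_text):
        # Each stored query is treated as "all of these terms must appear".
        self._queries[query_id] = terms_of(query_text)

    def match(self, document_text):
        """Return the ids of the stored queries that the document satisfies."""
        doc_terms = terms_of(document_text)
        return sorted(
            query_id
            for query_id, required in self._queries.items()
            if required and required <= doc_terms
        )


alerts = StoredQueries()
alerts.register("solr-release", "solr release")
alerts.register("lucene-fork", "lucene fork")
print(alerts.match("Apache Solr 4.6 release announced"))  # ['solr-release']
```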

On Jan 27, 2014, at 8:18 AM, Furkan KAMACI  wrote:

> Hi Charlie;
> 
> Is there any written documentation that explains your library?
> 
> Thanks;
> Furkan KAMACI
> 
> 
> 2014-01-27 Charlie Hull 
> 
>> On 27/01/2014 08:50, elmerfudd wrote:
>> 
>>> I want to implement an alert service in my solr system.
>>> In the FAST ESP system the service is called Real Time Alerting.
>>> 
>>> The service I'm looking for is:
>>> - a document is fed to solr.
>>> - without the document indexed , a set of queries run on the document
>>> - if the document answers a query - an alert will be sent in near
>>> Real-Time.
>>> 
>> 
>> You might want to take a look at Luwak, a library we built recently for
>> running lots of stored queries in an efficient manner. We use this for
>> media monitoring applications.
>> 
>> https://github.com/flaxsearch/luwak
>> 
>> Cheers
>> 
>> Charlie
>> 
>> 
>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> View this message in context: http://lucene.472066.n3.
>>> nabble.com/Implementing-an-alerting-feature-tp4113666.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>>> 
>> 
>> --
>> Charlie Hull
>> Flax - Open Source Enterprise Search
>> 
>> tel/fax: +44 (0)8700 118334
>> mobile:  +44 (0)7767 825828
>> web: www.flax.co.uk
>> 




Re: Solr Related Search Suggestions

2014-01-27 Thread Jorge Luis Betancourt Gonzalez
If I remember correctly, Trey Grainger explained in one of his talks a few 
techniques that could be of use. If the equivalences are not dynamic, you could 
just use synonyms. Otherwise some kind of offline processing should be used to 
compute the similarity between your queries (given that there is very little or 
no textual similarity between them). 

On Jan 27, 2014, at 4:29 AM, kumar  wrote:

> What is the best way to implement related search suggestions.
> 
> For example :
> 
> If the user is looking for "marriage halls" i need to show results like
> "catering services", "photography", "wedding cards", "invitation cards",
> "music organisers". 
> 
> 
> Thanks & Regards,
> kumar
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Related-Search-Suggestions-tp4113672.html
> Sent from the Solr - User mailing list archive at Nabble.com.




Unit testing custom update request processor

2014-01-07 Thread Jorge Luis Betancourt Gonzalez
Happy new year!

I’ve developed some custom update request processors to implement some custom 
logic needed in some use cases. I’m trying to write tests for these processors, 
and I’d like to test them in much the same way the built-in processors are 
tested in the Solr source code. Is there any advice on how to accomplish this, 
or some experience that someone more experienced could share?

Greetings!




Re: ANNOUNCE: Apache Solr Reference Guide 4.6

2013-12-09 Thread Ing. Jorge Luis Betancourt Gonzalez
Is it possible to export the doc into markdown? 

- Original Message -
From: "Chris Hostetter" 
To: solr-user@lucene.apache.org
Sent: Monday, December 9, 2013 14:00:34
Subject: Re: ANNOUNCE: Apache Solr Reference Guide 4.6


: Can we please give some thought to producing these manuals in ebook formats?

People have given it thought, but it's not as simple as just snapping our 
fingers and making it happen.

If you would like to contribute to the effort of figuring out the
how/where/what to make this happen, there is an existing jira for 
discussing it.

https://issues.apache.org/jira/browse/SOLR-5467



-Hoss
http://www.lucidworks.com/



How to boost documents with all the query terms

2013-12-07 Thread Ing. Jorge Luis Betancourt Gonzalez
Hi:

I'm using Solr 3.6 with the dismax query parser. I've found that docs that don't 
have all the query terms get ranked above others that contain all the terms in 
the search query. Using debugQuery I could see that most of the score in these 
cases comes from the coord(q,d) factor. Is there any way I could boost the 
documents that contain all the search query terms?

Greetings!



Re: Introducing Luwak for high-performance stored Lucene queries

2013-12-06 Thread Ing. Jorge Luis Betancourt Gonzalez
+1 on this.

- Original message -
From: "Otis Gospodnetic" 
To: solr-user@lucene.apache.org
Sent: Friday, December 6, 2013 9:35:25
Subject: Re: Introducing Luwak for high-performance stored Lucene queries

Hi Charlie,

Very nice - thanks!

I'd love to see a side-by-side comparison with ES percolator. got
something like that in your blog topic queue?

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Dec 6, 2013 at 9:29 AM, Charlie Hull  wrote:

> Hi all,
>
> We've now released the library we mentioned in our presentation at Lucene
> Revolution: https://github.com/flaxsearch/luwak
>
> You can use this to apply tens of thousands of stored Lucene queries to an
> incoming document in a second or so on relatively modest hardware. We use
> it for media monitoring applications but it could equally be useful for
> categorisation, classification etc.
>
> It's currently based on a fork of Lucene (details supplied) but hopefully
> it'll work with release versions soon.
>
> Feedback is very welcome!
>
> Cheers
>
> Charlie
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>




Re: solr as a service for multiple projects in the same environment

2013-12-02 Thread Ing. Jorge Luis Betancourt Gonzalez
I think some experience in this area could be provided by Trey Grainger, 
author of Solr in Action. I believe some of his work at CareerBuilder 
involved creating something (somewhat) similar to what you're trying to 
accomplish. I must say that I'm also interested in this topic, but haven't had 
the time to really do anything about it.

- Original message -
From: "adfel70" 
To: solr-user@lucene.apache.org
Sent: Sunday, December 1, 2013 2:41:00
Subject: Re: solr as a service for multiple projects in the same environment

The risk is that if you mess up a cluster by mistake while doing maintenance on
one of the systems, you can affect the other system.
It's a pretty amorphous risk.
Aside from having multiple systems share the same hardware resources, I
don't see any other real risk.

Do your collections share the same topology in terms of shards and
replicas?
Do you manually configure the nodes on which each collection is created so
that you'll still have some level of separation between the systems?




michael.boom wrote
> Hi,
> 
> There's nothing unusual in what you are trying to do, this scenario is
> very common.
> 
> To answer your questions:
>> 1. as I understand I can separate the configs of each collection in
>> zookeeper. is it correct? 
> Yes, that's correct. You'll have to upload your configs to ZK and use the
> Collections API to create your collections.
> 
>> 2. are there any solr operations that can be performed on collection A and
>> somehow affect collection B? 
> No, I can't think of any cross-collection operation. Here you can find a
> list of collection related operations:
> https://cwiki.apache.org/confluence/display/solr/Collections+API
> 
>> 3. is the solr cache separated for each collection? 
> Yes, separate and configurable in solrconfig.xml for each collection.
> 
>> 4. I assume that I'll encounter a problem with the os cache, when the
>> different indices will compete on the same memory, right? how severe is this
>> issue? 
> Hardware can be a bottleneck. If all your collections will face the same
> load you should try to give solr a RAM amount equal to the index size (all
> indexes)
> 
>> 5. any other advice on building such an architecture? does the maintenance
>> overhead of maintaining multiple clusters in production really overwhelm the
>> problems and risks of using the same cluster for multiple systems? 
> I was in the same situation as you, and putting everything in multiple
> collections in just one cluster made sense for me: it's easier to manage
> and has no obvious downside. As for "risks of using the same cluster for
> multiple systems" they are pretty much the same in both scenarios. Only
> that with multiple clusters you'll have many more machines to manage.
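
As a rough illustration of the answer to question 1, creating two collections
that each point at their own config set in ZooKeeper might look like the sketch
below. The host, collection names, config names and shard/replica counts are
made up for the example; action=CREATE, name, numShards, replicationFactor and
collection.configName are parameters of the Collections API linked above. The
configs are assumed to have been uploaded to ZK already.

```python
from urllib.parse import urlencode

# Assumed base URL of one node in the SolrCloud cluster (hypothetical).
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, config_name, num_shards=1, replication_factor=1):
    """Build a Collections API CREATE request for one collection.

    Each collection references its own config set in ZooKeeper, which is
    what keeps the configuration of project A separate from project B.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return SOLR_BASE + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # Two hypothetical projects sharing one cluster, each with its own config.
    print(create_collection_url("project_a", "project_a_conf", num_shards=2))
    print(create_collection_url("project_b", "project_b_conf"))
```

Keeping one config set per collection means a schema change for one project
never touches the other, even though both live in the same cluster.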





--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-as-a-service-for-multiple-projects-in-the-same-environment-tp4103523p4104206.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Client-side proxy for Solr 4.5.0

2013-11-26 Thread Ing. Jorge Luis Betancourt Gonzalez
Perhaps what you want is a transparent proxy? You could use nginx, squid, 
varnish, etc. We've been evaluating Varnish as a possibility to run in front of 
our Solr server and take advantage of the HTTP caching that Varnish does so 
well.

Greetings!

- Original message -
From: "Markus Jelsma" 
To: solr-user@lucene.apache.org
Sent: Tuesday, November 26, 2013 13:53:31
Subject: RE: Client-side proxy for Solr 4.5.0

I don't think you mean client-side proxy. You need a server side layer such as 
a normal web application or good proxy. We use Nginx, it is very fast and very 
feature rich. Its config scripting is usually enough to restrict access and 
limit input parameters. We also use Nginx's embedded Perl and Lua scripting 
besides its config scripting to implement more difficult logic.
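
To make the idea of restricting access and limiting input parameters concrete,
here is a small sketch of the kind of filtering such a server-side layer could
apply before forwarding a request to Solr. It is written in Python rather than
Nginx config purely for illustration; the allowed path, the parameter whitelist
and the row cap are assumptions, not anything taken from this thread.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical policy: only /select is exposed, with a small set of parameters.
ALLOWED_PATHS = {"/solr/select"}
ALLOWED_PARAMS = {"q", "start", "rows", "fl", "wt", "hl", "hl.fl"}
MAX_ROWS = 100


def sanitize(path, query_string):
    """Return the query string to forward to Solr, or None to reject."""
    if path not in ALLOWED_PATHS:
        return None  # e.g. /solr/update or /solr/admin never reach Solr
    kept = []
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        if key not in ALLOWED_PARAMS:
            continue  # silently drop anything not whitelisted
        if key == "rows":
            # Cap the page size so a client cannot request huge result sets.
            try:
                value = str(max(0, min(int(value), MAX_ROWS)))
            except ValueError:
                value = "10"
        kept.append((key, value))
    return urlencode(kept)
```

With this policy a request such as q=solr&rows=100000&qt=/update&wt=json would
be forwarded as q=solr&rows=100&wt=json, and anything aimed at an update or
admin path would be rejected outright.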

 
 
-Original message-
> From:Reyes, Mark 
> Sent: Tuesday 26th November 2013 19:27
> To: solr-user@lucene.apache.org
> Subject: Client-side proxy for Solr 4.5.0
> 
> Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance so 
> that the end-user can see  their queries w/o being able to directly access 
> :8983?
> 
> Applications/frameworks used:
> - Solr 4.5.0
> - AJAX Solr (javascript library)
> 
> Thank you,
> Mark
> 



Solr logs encoding to UTF8

2013-11-22 Thread Ing. Jorge Luis Betancourt Gonzalez
Hi everybody:

Is there any way of forcing a UTF-8 conversion on the queries that are written 
to the log? I've deployed Solr in Tomcat 7. The file appears to be a UTF-8 
file but I'm seeing this in the logs:

INFO: [] webapp=/solr path=/select 
params={fl=*,score&start=0&q=disñemos+el+mundo&hl.simple.pre=&hl.simple.post=&hl.fl=title,content,url,description,keywords&wt=json&hl=true&rows=20}
 hits=48865 status=0 QTime=155.



Strange behavior of gap fragmenter on highlighting

2013-11-13 Thread Ing. Jorge Luis Betancourt Gonzalez
I'm seeing a strange behavior of the gap fragmenter on Solr 3.6. Right now this 
is my configuration for the gap fragmenter:

  

  150

  

This is the basic configuration; I just tweaked the fragsize parameter to get 
shorter fragments. The thing is that for one particular PDF document in my 
results I get a really long snippet, way over 150 characters. It gets a little 
odder: if I change the value from 150 to 100, the snippet for the same document 
is normal, about 100 characters. The type of the field being highlighted is this:















Any ideas about what's happening?? Or how could I debug what is really going 
on??

Greetings!



Re: Auto Suggest - Time decay

2013-10-01 Thread Ing. Jorge Luis Betancourt Gonzalez
Sorry, I forgot the link:

[1] - http://wiki.apache.org/solr/SolrRelevancyFAQ

- Original message -
From: "Ing. Jorge Luis Betancourt Gonzalez" 
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 13:34:03
Subject: Re: Auto Suggest - Time decay

For that core just use a boost factor as explained on [1]:

You could use a query like this to see (before making any change) how your 
suggestions will be retrieved. In this case a query for "goog" is made, 
and recent documents are boosted (newer documents get an extra bonus).

http://localhost:8983/solr/select?q={!boost 
b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}goog

If this is enough for you, you could put the boost parameter in your request 
handler and make it even simpler, so any query against this particular request 
handler will be automatically boosted by date.

PS: You could tweak the formula used in the boost parameter to better suit 
your needs.

- Original message -
From: "SolrLover" 
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 12:19:51
Subject: Re: Auto Suggest - Time decay

I am using a totally separate core for storing the auto suggest keywords.

Would you be able to send me some more details on your implementation? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Auto Suggest - Time decay

2013-10-01 Thread Ing. Jorge Luis Betancourt Gonzalez
For that core just use a boost factor as explained on [1]:

You could use a query like this to see (before making any change) how your 
suggestions will be retrieved. In this case a query for "goog" is made, 
and recent documents are boosted (newer documents get an extra bonus).

http://localhost:8983/solr/select?q={!boost 
b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}goog

If this is enough for you, you could put the boost parameter in your request 
handler and make it even simpler, so any query against this particular request 
handler will be automatically boosted by date.

PS: You could tweak the formula used in the boost parameter to better suit 
your needs.
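
To get a feel for what the boost in that URL does, the sketch below evaluates
the same formula outside Solr. Solr's recip(x,m,a,b) function computes
a/(m*x+b); with m=3.16e-11 (roughly one over the number of milliseconds in a
year) and a=b=1, a brand-new document gets a multiplier of 1 and a one-year-old
document about 0.5. The helper names are mine and the ages are arbitrary sample
values.

```python
# Milliseconds in an average year; 1 / MS_PER_YEAR is roughly 3.17e-11.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Same shape as Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms, m=3.16e-11, a=1.0, b=1.0):
    """Multiplier applied to a document whose date field is age_ms in the past."""
    return recip(age_ms, m, a, b)


if __name__ == "__main__":
    for years in (0, 0.5, 1, 2, 5):
        print(f"{years:>4} years old -> boost {date_boost(years * MS_PER_YEAR):.3f}")
```

Raising a and b together flattens the curve, while a larger m makes older
documents fade faster, so those are the knobs to try when tweaking the formula.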

- Original message -
From: "SolrLover" 
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 12:19:51
Subject: Re: Auto Suggest - Time decay

I am using a totally separate core for storing the auto suggest keywords.

Would you be able to send me some more details on your implementation? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965p4092969.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Auto Suggest - Time decay

2013-10-01 Thread Ing. Jorge Luis Betancourt Gonzalez
Are you using the suggester component, or a separate core? I've used a 
separate core to store suggestions and order these suggestions (queries 
performed on the frontend) using a time decay function, and it works great for 
me.

Regards,

- Original message -
From: "SolrLover" 
To: solr-user@lucene.apache.org
Sent: Tuesday, October 1, 2013 12:12:13
Subject: Auto Suggest - Time decay

I am trying to implement an auto suggest based on time decay function. I have
a separate index just to store auto suggest keywords.

I would be calculating the frequency over time rather than just calculating
just based on frequency alone. 

I am thinking of using a database to perform the calculation and update the
SOLR index with the boost calculated based on time decay function. I am not
sure if there is a better way to do this...

I need to boost the terms based on the frequency over time,

Ex: when someone searches for 'apple' 1 times during a iphone launch
(one particular day) shouldn't really make apple come up in the auto
suggestion always when someone types in the keyword 'a' rather it should
lose its popularity exponentially..
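
One way to picture the frequency-over-time idea described here is an
exponentially decayed count: every search contributes 1 at the moment it
happens and then loses half of its weight every fixed number of days. This is
only a sketch of that idea; the half-life value and the function names are
assumptions, and the resulting score would still have to be written to the
index as a boost by whatever job updates it.

```python
import math

# Assumed half-life: a search loses half of its weight every 7 days.
HALF_LIFE_DAYS = 7.0
DECAY_RATE = math.log(2) / HALF_LIFE_DAYS


def decayed_popularity(search_ages_days):
    """Sum of exp(-rate * age) over all past searches for one keyword.

    search_ages_days: ages (in days) of each time the keyword was searched.
    """
    return sum(math.exp(-DECAY_RATE * age) for age in search_ages_days)


if __name__ == "__main__":
    # A keyword searched 1000 times on a single day, 60 days ago.
    launch_spike = decayed_popularity([60.0] * 1000)
    # A keyword searched 5 times a day, every day, for the last 60 days.
    steady = decayed_popularity([float(d) for d in range(60) for _ in range(5)])
    print(f"spike: {launch_spike:.2f}, steady: {steady:.2f}")
```

With these sample numbers the one-day spike ends up well below the steadily
searched keyword, which is the behaviour asked for in the 'apple' example.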

Anyone has any suggestions?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Suggest-Time-decay-tp4092965.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-28 Thread Ing. Jorge Luis Betancourt Gonzalez
I forgot to mention that you could check the boost section in the core's 
configuration file to see how your suggestions will be ranked. Basically, the 
boost factor for each field lets you decide which suggestions you'd like to 
come first. Perhaps in your app you could keep track of how often a suggestion 
given to a user is actually used as the query, and boost those suggestions, 
since they are more likely to become a query for the user. Thinking a little 
ahead, this could improve your user experience and additionally lower the load 
on your server, because if a suggestion given to a high number of users becomes 
a query, that query should already be in the cache. These are just thoughts, 
but I hope they are useful to you.

Regards,

- Original message -
From: "Ing. Jorge Luis Betancourt Gonzalez" 
To: solr-user@lucene.apache.org
Sent: Friday, September 27, 2013 19:44:28
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

Actually I don't use that field, it could be used to do some form of basic 
collaborative filtering, so you could use a high value for items in your 
collection that you want to come first, but in my case this was not a 
requirement and I don't use it at all.

- Original message -
From: "JMill" 
To: solr-user@lucene.apache.org
Sent: Friday, September 27, 2013 16:19:40
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I am not sure about the value to use for the option "popularity".  Is there
a method or do you just go with some arbitrary number?


Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-27 Thread Ing. Jorge Luis Betancourt Gonzalez
Actually I don't use that field, it could be used to do some form of basic 
collaborative filtering, so you could use a high value for items in your 
collection that you want to come first, but in my case this was not a 
requirement and I don't use it at all.

- Original message -
From: "JMill" 
To: solr-user@lucene.apache.org
Sent: Friday, September 27, 2013 16:19:40
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I am not sure about the value to use for the option "popularity".  Is there
a method or do you just go with some arbitrary number?


Re: Sorting dependent on user preferences with FunctionQuery

2013-09-26 Thread Ing. Jorge Luis Betancourt Gonzalez
I think you could use boosting queries: for group A you boost one category and 
for group B some other category.
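
A sketch of what that could look like: the application picks a boost query
based on the user's group and sends it along with the normal search. This
assumes the edismax parser and its bq (boost query) parameter, plus the
"category" field from the question quoted below; the group names and the boost
factor are invented for the example.

```python
from urllib.parse import urlencode

# Hypothetical mapping from customer group to the category it prefers.
GROUP_PREFERENCE = {"A": "Book", "B": "DVD"}


def search_params(user_query, group, boost=10):
    """Build select parameters that favour the group's preferred category."""
    params = [("q", user_query), ("defType", "edismax")]
    preferred = GROUP_PREFERENCE.get(group)
    if preferred is not None:
        # bq adds an optional clause: matching docs score higher,
        # non-matching docs are still returned, just ranked lower.
        params.append(("bq", f"category:{preferred}^{boost}"))
    return urlencode(params)


if __name__ == "__main__":
    print("select?" + search_params("*:*", "A"))
    print("select?" + search_params("*:*", "B"))
```

Because the boost only changes scores, results stay sorted by relevance and no
explicit sort parameter is needed.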

- Original message -
From: "Snubbel" 
To: solr-user@lucene.apache.org
Sent: Thursday, September 26, 2013 8:01:36
Subject: Sorting dependent on user preferences with FunctionQuery

Hello,

I want to present to different user groups a search result in different
orders.
Say, i have customer group A, which I know prefers Books, I want to get
Books at the top of my query result, DVDs at the bottom.
And for group B, preferring DVD, these first.
In my index I have a field of type text named "category" with values Book
and DVD.

I thought maybe I could solve this with QueryFunctions, maybe like this:

 
select?q=*%3A*&sort=query(qf=category v='Book')desc

but Solr returns "Can't determine a Sort Order (asc or desc) in sort".

What is wrong? I tried different ways of formulating the query without
success...


Or, does anyone have a better idea how to solve this?

Best regards, Nikola



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-dependent-on-user-preferences-with-FunctionQuery-tp4092119.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-26 Thread Ing. Jorge Luis Betancourt Gonzalez
Great!! I hadn't seen your message yet. Perhaps you could create a PR to that 
GitHub repository, so it will be in sync with current versions of Solr.

- Original message -
From: "JMill" 
To: solr-user@lucene.apache.org
Sent: Thursday, September 26, 2013 9:10:49
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

solved.


On Thu, Sep 26, 2013 at 1:50 PM, JMill  wrote:

> I managed to get rid of the query error by placing the jquery file in the
> velocity folder and adding line: " src="#{url_for_solr}/admin/file?file=/velocity/jquery.min.js&contentType=text/javascript">".
> That has not solved the issue; the console is showing a new error -
> "[13:42:55.181] TypeError: $.browser is undefined @
> http://localhost:8983/solr/ac/admin/file?file=/velocity/jquery.autocomplete.js&contentType=text/javascript:90".
> Any ideas?
>
>
> On Thu, Sep 26, 2013 at 1:12 PM, JMill wrote:
>
>> Do you know the directory the "#{url_root}" in > type="text/javascript" src="#{url_root}/js/lib/
>> jquery-1.7.2.min.js"> points too? and same for
>> ""#{url_for_solr}" > src="#{url_for_solr}/js/lib/jquery-1.7.2.min.js">

Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
That's an indication that jQuery can't be loaded, and without jQuery the 
autocomplete plugin won't work. This plugin is used to show the popup list that 
shows up below the input.

- Original message -
From: "JMill" 
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 15:40:00
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

Not yet but I do see the "$" not found in console.


Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
As far as I can tell it is. You can check that by looking at the console logs 
in your browser (Chrome, Firefox, etc.). There should be an error saying that 
the $ function is not found. In any case I'll try to set up a testing 
environment here, but I can only use Solr 4.1, which I have here. I haven't 
downloaded/tested the 4.4 version yet. Did you try replacing the line that 
includes the jquery-1.4.3.min.js with the new one?

- Original message -
From: "JMill" 
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 14:44:53
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

That seems to work. I get back an xml containing a bunch of suggestions.
Can we agree that it's jquery that's the problem?


Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
Try querying the core where the data has been imported, something like:

http://localhost:8983/solr/suggestions/select?q=uc

In the previous URL, "suggestions" is the name I gave to the core, so this should 
change. If you get results, then the problem could be the jquery dependency. I 
don't remember making any change; as far as I know that js file is bundled with 
solr (at least in 3.x), so perhaps you could change it to the correct jquery 
version on solr 4.4. If you go into the admin panel (in solr 3.6):

http://localhost:8983/solr/admin/schema.jsp

and inspect the loaded code, the required file (jquery-1.4.2.min.js) gets 
loaded; in solr 4.4 it should load a similar file, but perhaps a more recent 
version.

Perhaps you could change that part to something like:

  

This is used at least on a solr 4.1 that I have lying around here somewhere.

In any case you can test the suggestions using the URL that I suggested at the 
top of this mail; in that case you should be able to see the possible results, 
of course in a less fancy way.
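
For a scripted check, the same test request can be built programmatically. A minimal sketch in Python, assuming the host, port and core name from the URL at the top of this mail; the helper name select_url is made up for illustration:

```python
from urllib.parse import urlencode


def select_url(core, query, host="localhost", port=8983):
    """Build the /select URL for a core; urlencode escapes the query text."""
    return f"http://{host}:{port}/solr/{core}/select?" + urlencode({"q": query})


# "suggestions" is the core name used in this thread; replace it with yours.
print(select_url("suggestions", "uc"))
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should show the raw results without going through the jquery front end.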

- Original Message -
From: "JMill"
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 13:59:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

Could it be the jquery library that is the problem?   I opened up
solr-home/ac/conf/velocity/head.vm with an editor and I see a reference to
the jquery library but I can't seem to find the directory referenced,
 line:  

Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
Perhaps this could be an issue. I know that this works perfectly in solr 3.6 
(this is the one I was using). Currently I don't have a solr 4.4 to do some 
tests, but what has been done in that core should work in solr 4.4. Perhaps 
there is a setting that needs some tweaking, but it's impossible to know 
without checking the logs. If any incompatibility is present, it should show 
up in the logs.

Regards,

- Original Message -
From: "JMill"
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 11:10:32
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

A simple query through admin (*:*) confirms the data exists. The version
I'm working with is solr 4.4.0. The autocomplete manual refers to 3.x. I
wonder if this is the problem?


On Wed, Sep 25, 2013 at 4:01 PM, Ing. Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:

> The response does not show any error, can you confirm that the data is in
> solr? you should be able to see the numDoc stats in the admin UI. Which
> version of Solr are you using? I believe that the example was tested on
> Solr 3.x at least at the time I use it.
>
> Regards,
>
> - Original Message -
> From: "JMill"
> To: solr-user@lucene.apache.org
> Sent: Wednesday, September 25, 2013 10:57:31
> Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)
>
> I followed the instructions, I am able to browse to "
> http://localhost:8983/solr/ac/browse?q=ce&debugQuery=true"; but I am not
> getting any suggestions (typed in c in Find Textbox).
>
> I wonder if loading the example data is the problem?  The response I get
> after executing the script  "feed-ac.sh" (step 3) is the following.
>
> user$ ./feed-ac.sh
> 
> 
> 0 name="QTime">2239
> 
>
> Are you able to confirm if this the expected response?
>
>
>
>
> On Wed, Sep 25, 2013 at 1:46 PM, Ing. Jorge Luis Betancourt Gonzalez <
> jlbetanco...@uci.cu> wrote:
>
> > I've used a separated core for storing suggestions, based on what I see
> > in: https://github.com/cominvent/autocomplete. You can check the blog
> > post on
> > www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/.
> > This is really flexible, on the downside it does not use the suggester
> > component su this are like regular queries against a separated core.
> >
> > Greetings!
> >
> > - Original Message -
> > From: "Erick Erickson"
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, September 25, 2013 6:16:51
> > Subject: Re: Implementing Solr Suggester for Autocomplete (multiple
> > columns)
> >
> > I've sometimes seen this handled by clever tokenizing. For "Bill Rogers",
> > index (untokenized) something like
> > Bill|Bill Rogers
> > Rogers|Bill Rogers
> >
> > Your suggester then is a simple term lookup (see TermsComponent)
> > which is quite fast. What you _don't_ get is autocorrect. But if you
> > use terms.prefix, you can also control whether it's whole word match
> > or not. To get whole-word in the above, you would set your prefix to
> > "Rogers|" for instance. Or you may want to leave off the "|" to see
> > more of an autocomplete-type response.
> >
> > Then, of course, when you display this you need to only display what's
> > after the "|" (or whatever delimiter you use).
> >
> > One other note, this will be case sensitive, so you probably want to
> > do casing yourself, index things like
> > rogers|Bill Rogers
> > and lowercase what you send in to terms component.
> >
> > Best,
> > Erick
> >
> >
> >
> > On Tue, Sep 24, 2013 at 2:01 PM, JMill 
> > wrote:
> > > Hi,
> > >
> > > I'm using Solr's Suggester function to implement an autocomplete
> feature.
> > > I have it setup to check against the "username" and "name" fields.
> >  Problem
> > > is when running  a query against the name, the second term, after
> > > whitespace (surename) returns 0 results.  Works if if query is a
> partial
> > > name starting from the begining e.g. Given the name "Bill Rogers", a
> > query
> > > for Rogers will return 0 results whereas a query for "Bill" will return
> > > positive (Bill Rogers). As for the username, it's not working at.
> > >
> > > I am after the following behaviour.
> > >
> > > Match any partial words in the fie

Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
The response does not show any error. Can you confirm that the data is in solr? 
You should be able to see the numDocs stat in the admin UI. Which version of 
Solr are you using? I believe that the example was tested on Solr 3.x, at least 
at the time I used it.

Regards,

- Original Message -
From: "JMill"
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 10:57:31
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I followed the instructions. I am able to browse to
"http://localhost:8983/solr/ac/browse?q=ce&debugQuery=true" but I am not
getting any suggestions (typed "c" in the Find textbox).

I wonder if loading the example data is the problem?  The response I get
after executing the script  "feed-ac.sh" (step 3) is the following.

user$ ./feed-ac.sh


02239


Are you able to confirm if this is the expected response?




On Wed, Sep 25, 2013 at 1:46 PM, Ing. Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:

> I've used a separated core for storing suggestions, based on what I see
> in: https://github.com/cominvent/autocomplete. You can check the blog
> post on
> www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/.
> This is really flexible, on the downside it does not use the suggester
> component su this are like regular queries against a separated core.
>
> Greetings!
>
> - Original Message -
> From: "Erick Erickson"
> To: solr-user@lucene.apache.org
> Sent: Wednesday, September 25, 2013 6:16:51
> Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)
>
> I've sometimes seen this handled by clever tokenizing. For "Bill Rogers",
> index (untokenized) something like
> Bill|Bill Rogers
> Rogers|Bill Rogers
>
> Your suggester then is a simple term lookup (see TermsComponent)
> which is quite fast. What you _don't_ get is autocorrect. But if you
> use terms.prefix, you can also control whether it's whole word match
> or not. To get whole-word in the above, you would set your prefix to
> "Rogers|" for instance. Or you may want to leave off the "|" to see
> more of an autocomplete-type response.
>
> Then, of course, when you display this you need to only display what's
> after the "|" (or whatever delimiter you use).
>
> One other note, this will be case sensitive, so you probably want to
> do casing yourself, index things like
> rogers|Bill Rogers
> and lowercase what you send in to terms component.
>
> Best,
> Erick
>
>
>
> On Tue, Sep 24, 2013 at 2:01 PM, JMill 
> wrote:
> > Hi,
> >
> > I'm using Solr's Suggester function to implement an autocomplete feature.
> > I have it setup to check against the "username" and "name" fields.
>  Problem
> > is when running  a query against the name, the second term, after
> > whitespace (surename) returns 0 results.  Works if if query is a partial
> > name starting from the begining e.g. Given the name "Bill Rogers", a
> query
> > for Rogers will return 0 results whereas a query for "Bill" will return
> > positive (Bill Rogers). As for the username, it's not working at.
> >
> > I am after the following behaviour.
> >
> > Match any partial words in the fields "username" or "name" and return the
> > results.  If there is match in the field "name" the return the whole name
> > e.g. given the queries "Rogers" or "Bill"" return "Bill Rogers (not the
> > single word that was a match)".
> >
> > schema.xml extract
> > ..
> >  />
> >  
> >  > multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />
> > ...
> > 
> > 
> > ...
> >
> >  > positionIncrementGap="100">
> >  
> >
> >
> >
> >  
> > 
> >
> >
> > solrconfig.xml
> >
> > 
> > 
> >suggest
> >org.apache.solr.spelling.suggest.Suggester
> > > name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup
> >autocomplete  
> >0.005
> >true
> >
> > 
> >
> > 
> >
> > ..
> >  > name="/suggest">
> >   
> > true
> > suggest
> > true
> > 5
> > true
> >   
> >   
> >  spellcheck
> >   
> > 
>
> 
> III International Winter School at UCI, February 17 to 28, 2014.
> See www.uci.cu
>


Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-25 Thread Ing. Jorge Luis Betancourt Gonzalez
I've used a separate core for storing suggestions, based on what I see in 
https://github.com/cominvent/autocomplete. You can check the blog post at 
www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/. This is 
really flexible; on the downside, it does not use the suggester component, so 
these are like regular queries against a separate core.

Greetings!

- Original Message -
From: "Erick Erickson"
To: solr-user@lucene.apache.org
Sent: Wednesday, September 25, 2013 6:16:51
Subject: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I've sometimes seen this handled by clever tokenizing. For "Bill Rogers",
index (untokenized) something like
Bill|Bill Rogers
Rogers|Bill Rogers

Your suggester then is a simple term lookup (see TermsComponent)
which is quite fast. What you _don't_ get is autocorrect. But if you
use terms.prefix, you can also control whether it's whole word match
or not. To get whole-word in the above, you would set your prefix to
"Rogers|" for instance. Or you may want to leave off the "|" to see
more of an autocomplete-type response.

Then, of course, when you display this you need to only display what's
after the "|" (or whatever delimiter you use).

One other note, this will be case sensitive, so you probably want to
do casing yourself, index things like
rogers|Bill Rogers
and lowercase what you send in to terms component.
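
The tokenizing trick above can be sketched in plain Python. This is only an illustration under stated assumptions: no Solr is involved, the function names are hypothetical, and the startswith() filter merely emulates what a terms.prefix lookup would do against the indexed terms.

```python
DELIM = "|"


def index_keys(full_name):
    """Build one untokenized key per word: '<lowercased word>|<full name>'."""
    return [f"{word.lower()}{DELIM}{full_name}" for word in full_name.split()]


def suggest(terms, typed, whole_word=False):
    """Emulate a terms.prefix lookup; whole_word appends the delimiter."""
    prefix = typed.lower() + (DELIM if whole_word else "")
    results = []
    for term in terms:
        if term.startswith(prefix):
            display = term.split(DELIM, 1)[1]  # show only what follows "|"
            if display not in results:
                results.append(display)
    return results


terms = sorted(index_keys("Bill Rogers") + index_keys("Roger Smith"))
print(suggest(terms, "rog"))                      # autocomplete-style match
print(suggest(terms, "rogers", whole_word=True))  # whole-word match only
```

Lowercasing happens both when the keys are built and when the typed text is turned into a prefix, which is the "do casing yourself" step described above.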

Best,
Erick



On Tue, Sep 24, 2013 at 2:01 PM, JMill  wrote:
> Hi,
>
> I'm using Solr's Suggester function to implement an autocomplete feature.
> I have it set up to check against the "username" and "name" fields.  The
> problem is that when running a query against the name, the second term after
> the whitespace (the surname) returns 0 results.  It works if the query is a
> partial name starting from the beginning, e.g. given the name "Bill Rogers",
> a query for "Rogers" will return 0 results whereas a query for "Bill" will
> return a match (Bill Rogers). As for the username, it's not working at all.
>
> I am after the following behaviour.
>
> Match any partial words in the fields "username" or "name" and return the
> results.  If there is a match in the field "name" then return the whole name,
> e.g. given the queries "Rogers" or "Bill", return "Bill Rogers" (not the
> single word that was a match).
>
> schema.xml extract
> ..
> 
>  
>  multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />
> ...
> 
> 
> ...
>
>  positionIncrementGap="100">
>  
>
>
>
>  
> 
>
>
> solrconfig.xml
>
> 
> 
>suggest
>org.apache.solr.spelling.suggest.Suggester
> name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup
>autocomplete  
>0.005
>true
>
> 
>
> 
>
> ..
>  name="/suggest">
>   
> true
> suggest
> true
> 5
> true
>   
>   
>  spellcheck
>   
> 



Re: Suggest and Filtering

2013-06-13 Thread Ing. Jorge Luis Betancourt Gonzalez
If query suggestion is what you are looking for, what we've done is store the 
user queries in a separate core and pull the suggestions from there. 

- Original Message -
From: "Brendan Grainger"
To: solr-user@lucene.apache.org
Sent: Thursday, June 13, 2013 19:43:03
Subject: Suggest and Filtering

Hi Solr Gurus

I am trying to implement auto suggest where solr would suggest several
phrases that would return results as the user types in a query (as distinct
from autocomplete). e.g. say the user starts typing 'br' and we have
documents that contain "brake pads" and "left disc brake", solr would
suggest both of those phrases with "brake pads" first. I also want to only
look at documents that match a given filter query. So say I have a bunch of
documents for a toyota cressida that contain the bi-gram "brake pads",
while the documents for a honda accord don't have any brake pad articles.
If the user is filtering on the honda accord I wouldn't want "brake pads"
as a suggestion.

Right now, I've played with the suggest component and using faceting.

Any thoughts?

Thanks
Brendan

-- 
Brendan Grainger
www.kuripai.com

http://www.uci.cu


Re: Querying only for "+" character causes org.apache.lucene.queryParser.ParseException

2013-04-24 Thread Jorge Luis Betancourt Gonzalez
One more thing:

What about the hack you mentioned when the query is a combination of reserved 
query operators such as +-, +, --++--+%, etc.? In these cases the application 
has to deal with all of them too.

Greetings!

- Original Message -
From: "Jérôme Étévé"
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 10:44:39
Subject: Re: Querying only for "+" character causes 
org.apache.lucene.queryParser.ParseException

If you want to allow your users to search for '+', you can also define
'+' as being a regular ALPHA character:

In config:

delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and R&D as words.
#
+ => ALPHA
 # => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory,
use types="delimiter_types.txt"


You'll then be able to let your users search for + as part of a word.

If you want to allow them to search for just '+' , a little hacking is
necessary in your client code. Personally, I just  double quote the query
if it's only one char length. Can't be harmful and as it will turn your
single + into "+" , it will be considered as a token (rather than being
part of the query syntax) by the parser.

Providing you're using the edismax parser, it should be just fine for any
other queries, like '+ foo' , 'foo +', '++' ...
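
The single-character quoting hack could look roughly like this on the client side. This is a Python sketch under stated assumptions, not Solr code: the function name is hypothetical, and the extra escaping of a lone double quote or backslash is an addition not mentioned above, included only so the resulting phrase stays well formed.

```python
def quote_if_single_char(user_query):
    """Wrap a one-character query in double quotes so the parser sees a token."""
    q = user_query.strip()
    if len(q) != 1:
        return q  # longer queries are passed through unchanged
    if q in ('"', "\\"):
        q = "\\" + q  # assumption: escape these so the phrase stays well formed
    return '"' + q + '"'


for raw in ["+", "-", "c++", "+ foo"]:
    print(repr(raw), "->", quote_if_single_char(raw))
```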


J.


On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez
wrote:

> Hi Kai:
>
> Thanks for your reply, for what I've understood this logic must be
> included in my application, It would be possible to, for instance, use some
> regular expression at querying time in my schema to avoid a query that
> contains only this characters? for instance + and + would be a good
> catch to avoid.
>
> Thanks in advance!
>
> - Original Message -
> From: "Kai Becker"
> To: solr-user@lucene.apache.org
> Sent: Tuesday, April 23, 2013 9:48:26
> Subject: Re: Querying only for "+" character causes
> org.apache.lucene.queryParser.ParseException
>
> Hi,
>
> you need to escape that char in search terms.
> Special chars are + - ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment.
>
> The %2B is just the url encoding, but it will still be a + for Solr, so
> just put a \ in front of the chars I mentioned.
>
> Cheers,
> Kai
>
> On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote:
>
> > Hi!
> >
> > Currently I'm working on a basica search engine for, the main problem is
> that during some tests a problem was detected, in the application if a user
> search for the "+" or "-" term only or the "+" string it causes an
> exception in my application, the problem is caused for an
> org.apache.lucene.queryParser.ParseException in solr. I get the same
> response if, from the solr admin interface, I search for the + term. For
> what I've seen the "+" character gets encoded into "%2B" which cause the
> exception. Is there any way of escaping this character so they behave like
> any other character? or at least get no response for this cases?
> >
> > I'm using solr 3.6.2, deployed in tomcat7.
> >
> > Greetings!
> > http://www.uci.cu
>
> http://www.uci.cu
> http://www.uci.cu
>



--
Jerome Eteve
+44(0)7738864546
http://www.eteve.net/



Re: Querying only for "+" character causes org.apache.lucene.queryParser.ParseException

2013-04-23 Thread Jorge Luis Betancourt Gonzalez
Hi Jérôme:

Thanks for your suggestion, Jérôme. I'll do as you told me to allow searching 
for these specific tokens. I've also taken into account the option of adding 
the "quote if length is 1" logic at the application level, but I would like to 
keep this logic inside Solr (if possible). This is why I was thinking of some 
kind of regular expression replacement at query time, so that if this changes 
in the future it won't also require changing the application level. Can you 
advise me on this?

Greetings!

- Original Message -
From: "Jérôme Étévé"
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 10:44:39
Subject: Re: Querying only for "+" character causes 
org.apache.lucene.queryParser.ParseException

If you want to allow your users to search for '+', you can also define
'+' as being a regular ALPHA character:

In config:

delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and R&D as words.
#
+ => ALPHA
 # => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory,
use types="delimiter_types.txt"


You'll then be able to let your users search for + as part of a word.

If you want to allow them to search for just '+' , a little hacking is
necessary in your client code. Personally, I just  double quote the query
if it's only one char length. Can't be harmful and as it will turn your
single + into "+" , it will be considered as a token (rather than being
part of the query syntax) by the parser.

Providing you're using the edismax parser, it should be just fine for any
other queries, like '+ foo' , 'foo +', '++' ...


J.


On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez
wrote:

> Hi Kai:
>
> Thanks for your reply, for what I've understood this logic must be
> included in my application, It would be possible to, for instance, use some
> regular expression at querying time in my schema to avoid a query that
> contains only this characters? for instance + and + would be a good
> catch to avoid.
>
> Thanks in advance!
>
> - Original Message -
> From: "Kai Becker"
> To: solr-user@lucene.apache.org
> Sent: Tuesday, April 23, 2013 9:48:26
> Subject: Re: Querying only for "+" character causes
> org.apache.lucene.queryParser.ParseException
>
> Hi,
>
> you need to escape that char in search terms.
> Special chars are + - ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment.
>
> The %2B is just the url encoding, but it will still be a + for Solr, so
> just put a \ in front of the chars I mentioned.
>
> Cheers,
> Kai
>
> On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote:
>
> > Hi!
> >
> > Currently I'm working on a basica search engine for, the main problem is
> that during some tests a problem was detected, in the application if a user
> search for the "+" or "-" term only or the "+" string it causes an
> exception in my application, the problem is caused for an
> org.apache.lucene.queryParser.ParseException in solr. I get the same
> response if, from the solr admin interface, I search for the + term. For
> what I've seen the "+" character gets encoded into "%2B" which cause the
> exception. Is there any way of escaping this character so they behave like
> any other character? or at least get no response for this cases?
> >
> > I'm using solr 3.6.2, deployed in tomcat7.
> >
> > Greetings!
> > http://www.uci.cu
>
> http://www.uci.cu
> http://www.uci.cu
>



--
Jerome Eteve
+44(0)7738864546
http://www.eteve.net/



Re: Querying only for "+" character causes org.apache.lucene.queryParser.ParseException

2013-04-23 Thread Jorge Luis Betancourt Gonzalez
Hi Kai:

Thanks for your reply. From what I've understood, this logic must be included 
in my application. Would it be possible to, for instance, use some regular 
expression at query time in my schema to avoid a query that contains only 
these characters? For instance, + and + would be good cases to catch.

Thanks in advance!

- Original Message -
From: "Kai Becker"
To: solr-user@lucene.apache.org
Sent: Tuesday, April 23, 2013 9:48:26
Subject: Re: Querying only for "+" character causes 
org.apache.lucene.queryParser.ParseException

Hi,

you need to escape that char in search terms.
Special chars are + - ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment.

The %2B is just the url encoding, but it will still be a + for Solr, so just 
put a \ in front of the chars I mentioned.
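
A minimal client-side sketch of that escaping, in Python. The character list is copied from the message above; the function name is hypothetical and nothing here is part of Solr itself. The last line shows that URL encoding is a separate, later step applied to the already escaped term.

```python
from urllib.parse import quote

# Special characters exactly as listed in the message above.
SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\/')


def escape_query_term(term):
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


escaped = escape_query_term("+")
print(escaped)                  # the escaped term
print(quote(escaped, safe=""))  # the same term, URL-encoded for the q parameter
```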

Cheers,
Kai

On 23.04.2013 at 15:41, Jorge Luis Betancourt Gonzalez wrote:

> Hi!
> 
> Currently I'm working on a basica search engine for, the main problem is that 
> during some tests a problem was detected, in the application if a user search 
> for the "+" or "-" term only or the "+" string it causes an exception in 
> my application, the problem is caused for an 
> org.apache.lucene.queryParser.ParseException in solr. I get the same response 
> if, from the solr admin interface, I search for the + term. For what I've 
> seen the "+" character gets encoded into "%2B" which cause the exception. Is 
> there any way of escaping this character so they behave like any other 
> character? or at least get no response for this cases? 
> 
> I'm using solr 3.6.2, deployed in tomcat7.
> 
> Greetings! 
> http://www.uci.cu



Querying only for "+" character causes org.apache.lucene.queryParser.ParseException

2013-04-23 Thread Jorge Luis Betancourt Gonzalez
Hi!

Currently I'm working on a basic search engine. During some tests a problem 
was detected: if a user searches for only the "+" or "-" term, or the "+" 
string, it causes an exception in my application. The problem is caused by an 
org.apache.lucene.queryParser.ParseException in solr. I get the same response 
if I search for the + term from the solr admin interface. From what I've seen, 
the "+" character gets encoded as "%2B", which causes the exception. Is there 
any way of escaping these characters so they behave like any other character, 
or at least of getting no results in these cases? 

I'm using solr 3.6.2, deployed in tomcat7.

Greetings! 
http://www.uci.cu


Re: Getting better snippets in highlighting component

2013-03-29 Thread Jorge Luis Betancourt Gonzalez
Hi Jack:

Thanks for the reply. Exactly, I know it is common to encounter this kind of 
TOC in a lot of files. I'm playing with the regex fragmenter to be a little 
more selective about the generated snippets, but no luck so far.

- Original Message -
From: "Jack Krupansky"
To: solr-user@lucene.apache.org
Sent: Saturday, March 30, 2013 0:40:03
Subject: Re: Getting better snippets in highlighting component

It looks like a table of contents. The dots are followed by the page number,
followed by the text from the next table of contents entry, and repeat.

Even Google doesn't do anything special for this. For example, search for
"chapter 1 chapter 2 pdf":

[PDF]
2013 Publication 505 - Internal Revenue Service
www.irs.gov/pub/irs-pdf/p505.pdfFile Format: PDF/Adobe Acrobat
Mar 21, 2013 – Introduction . . . . . . . . . . . . . . . . . . 1. What's
New for 2013 . . . . . . . . . . . . . 2. Reminders . . . . . . . . . . . .
. . . . . . . 2. Chapter 1. Tax Withholding for ...

I'm sure somebody can come up with a clever heuristic to avoid this kind of
thing.

Maybe simply truncate any sequence of white space and only punctuation down
to two or three characters or so.
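
One way to sketch that truncation heuristic is a post-processing step applied to the snippet text on the client. This is plain Python, not a Solr highlighter feature; the function name and the threshold of four characters are assumptions chosen for illustration.

```python
import re

# A run of 4 or more characters that are all whitespace or punctuation.
LEADER_RUN = re.compile(r"\W{4,}")


def squeeze_leaders(snippet, keep=3):
    """Shrink long whitespace/punctuation runs to at most `keep` characters."""
    def shorten(match):
        compact = re.sub(r"\s+", "", match.group(0))[:keep]
        return f" {compact} " if compact else " "
    return LEADER_RUN.sub(shorten, snippet).strip()


print(squeeze_leaders("Introduction . . . . . . 1. What's New"))
print(squeeze_leaders("6.2 Elementos normativos generales ........ 12"))
```

Short punctuation such as "1. " or ", " is left alone because it is below the run length threshold.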

-- Jack Krupansky
-----Original Message-----
From: Jorge Luis Betancourt Gonzalez
Sent: Friday, March 29, 2013 10:34 PM
To: solr-user@lucene.apache.org
Subject: Getting better snippets in highlighting component

Hi all:

I'm building a document search platform, basically indexing a lot of PDF
files. Some of these files have an index, which means that when I query for
"normativos" in my application (built using Symfony2+PHP+Solarium) I get a 
few results like this:

10
6.2 Elementos normativos generales
12
6.3 Elementos normativos técnicos
..32
ANEXOS A Formas verbales (normativo

This is a bit of a problem. Is there any way I can get rid of these dots? Is
there any sort of relevance in the snippets that the highlighting component
returns? I mean, in this particular case the snippet came from the index
page of the PDF, which I hardly think is the best snippet in the document for
this particular query. Any thoughts on this? Is there any "golden rule" for
treating cases like this?

Thanks a lot!
http://www.uci.cu



Getting better snippets in highlighting component

2013-03-29 Thread Jorge Luis Betancourt Gonzalez
Hi all:

I'm building a document search platform, basically indexing a lot of PDF 
files. Some of these files have an index, which means that when I query for 
"normativos" in my application (built using Symfony2+PHP+Solarium) I get a few 
results like this:

10
 6.2 Elementos normativos generales 
12
 6.3 Elementos normativos técnicos 
..32
 ANEXOS A Formas verbales (normativo

This is a bit of a problem. Is there any way I can get rid of these dots? Is 
there any sort of relevance in the snippets that the highlighting component 
returns? I mean, in this particular case the snippet came from the index page 
of the PDF, which I hardly think is the best snippet in the document for this 
particular query. Any thoughts on this? Is there any "golden rule" for treating 
cases like this?

Thanks a lot!
http://www.uci.cu


Re: Question about email search

2013-03-14 Thread Jorge Luis Betancourt Gonzalez
Sorry for the duplicated mail :-(. Any advice on a configuration for searching 
email addresses in a field that does not contain only email addresses, i.e. 
where the email addresses are embedded in larger textual messages?

- Original Message -
From: "Ahmet Arslan"
To: solr-user@lucene.apache.org
Sent: Thursday, March 14, 2013 11:23:47
Subject: Re: Question about email search

Hi,

Since you have word delimiter filter in your analysis chain, I am not sure if 
e-mail addresses are recognised. You can check that on solr admin UI, analysis 
page.

If e-mail addresses are kept as one token, I would use a leading wildcard query.
&q=*@gmail.com

There was a similar question recently:
http://search-lucene.com/m/XF2ejnM6Vi2

--- On Thu, 3/14/13, Jorge Luis Betancourt Gonzalez  wrote:

> From: Jorge Luis Betancourt Gonzalez 
> Subject: Question about email search
> To: solr-user@lucene.apache.org
> Date: Thursday, March 14, 2013, 5:11 PM
> I'm using solr 3.6.2 to crawl some
> data using nutch, in my schema I've one field with all the
> content extracted from the page, which could possibly
> include email addresses, this is the configuration of my
> schema:
>
>          class="solr.TextField"
>            
> positionIncrementGap="100"
> autoGeneratePhraseQueries="true">
>              type="index">
>                
> 
>                
> 
>                
> 
>                
>  languange="Spanish"/>
>                
> 
>                
>                 
>     ignoreCase="true" words="stopwords.txt"/>
>                
>                 
>     generateWordParts="1"
> generateNumberParts="1"   
>                
>     catenateWords="1" catenateNumbers="1"
> catenateAll="0"
>                
>     splitOnCaseChange="1"/>
>                
> 
>                
>  class="solr.RemoveDuplicatesTokenFilterFactory"/>
>             
>         
>
> The thing is that I'm trying to search against a field of
> this type (text) with a value like "@gmail.com" and I'm
> intended to get documents with that text, any advice?
>
> slds
> --
> "It is only in the mysterious equation of love that any
> logical reasons can be found."
> "Good programmers often confuse halloween (31 OCT) with
> christmas (25 DEC)"
>
>


Question about email search

2013-03-14 Thread Jorge Luis Betancourt Gonzalez
I'm using solr 3.6.2 to crawl some data using nutch, in my schema I've one 
field with all the content extracted from the page, which could possibly 
include email addresses, this is the configuration of my schema:















The thing is that I'm trying to search against a field of this type (text) with 
a value like "@gmail.com" and I expect to get documents containing that text. 
Any advice?

Regards
--
"It is only in the mysterious equation of love that any 
logical reasons can be found."
"Good programmers often confuse halloween (31 OCT) with 
christmas (25 DEC)"



Re: Using suggester for smarter phrase autocomplete

2013-03-13 Thread Jorge Luis Betancourt Gonzalez
Currently I'm using a separate core to query suggestions; for this I started 
from https://github.com/cominvent/autocomplete. I'm only using the suggester 
component for term suggestions based on a tokenized field in my schema (all of 
this in solr 3.6). Perhaps instead of using the suggester component you could 
use a similar approach (more like the one in the github repo).

- Original Message -
From: "Eric Wilson"
To: solr-user@lucene.apache.org
Sent: Wednesday, March 13, 2013 13:11:05
Subject: Re: Using suggester for smarter phrase autocomplete

I'm not concerned about stopwords, rather the situation where the first and
second words are rarely used together, so don't occur together in a phrase
in the dictionary. Thanks.

On Wed, Mar 13, 2013 at 11:11 AM, Robert Muir  wrote:

> On Wed, Mar 13, 2013 at 11:07 AM, Eric Wilson 
> wrote:
> > I'm trying to use the suggester for auto-completion with Solr 4. I have
> > followed the example configuration for phrase suggestions at the bottom
> of
> > this wiki page:
> > http://wiki.apache.org/solr/Suggester
> >
> > This shows how to use a text file with the following text for phrase
> > suggestions:
> >
> > # simple auto-suggest phrase dictionary for testing
> > # note this uses tabs as separator!
> > the first phrase	1.0
> > the second phrase	2.0
> > testing 1234	3.0
> > foo	5.0
> > the fifth phrase	2.0
> > the final phrase	4.0
> >
> > This seems to be working in the expected way. If I query for "the f" I
> > receive the following suggestions:
> >
> >  the final phrase
> >  the fifth phrase
> >  the first phrase
> >
> > I would like to deal with the case where the user is interested in "the
> > foo". When "the fo" is entered, there will be no suggestions. Is it
> > possible to provide both the phrase matches, and the matches for individual
> > words, so that when the user entered text is no longer part of any actual
> > phrase, there are still suggestions to be made for the final word?
> >
>
> Is it really the case that you want matches for individual words, or
> just to handle e.g. the stopwords case like 'the fo' -> foo ?
>
> the latter can be done with analyzingsuggester (configure a stopfilter
> on the analyzer).
>
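
A minimal sketch of the fallback Eric asks about, written in plain Python rather than against the Solr Suggester API. It assumes the phrase dictionary quoted above is loaded into a dict; the names PHRASES, by_prefix and suggest are hypothetical. The whole input is tried as a phrase prefix first, and only when nothing matches is the final word tried on its own.

```python
# Hypothetical in-memory stand-in for the phrase dictionary quoted above;
# this is NOT the Solr Suggester API, only an illustration of the fallback.
PHRASES = {
    "the first phrase": 1.0,
    "the second phrase": 2.0,
    "testing 1234": 3.0,
    "foo": 5.0,
    "the fifth phrase": 2.0,
    "the final phrase": 4.0,
}


def by_prefix(prefix, entries):
    """Return the entries starting with prefix, highest weight first."""
    hits = [(text, weight) for text, weight in entries.items()
            if text.startswith(prefix)]
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return [text for text, _ in hits]


def suggest(user_input, entries=PHRASES):
    """Try the whole input as a phrase prefix, then fall back to the last word."""
    text = user_input.lower()
    matches = by_prefix(text, entries)
    words = text.split()
    if matches or not words:
        return matches
    # No phrase starts with the full input: suggest for the final word only.
    return by_prefix(words[-1], entries)


print(suggest("the f"))
print(suggest("the fo"))
```

With this dictionary, "the f" still yields the three phrases, while "the fo" falls through to the single-word entry "foo".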


Re: Building a central index with Lucene + Solr

2013-03-05 Thread Jorge Luis Betancourt Gonzalez
Agreed, PHP and Solr are an excellent combination. I'm using Solr 3.6 + PHP 
(Symfony2 + NelmioSolariumBundle + Solarium) and getting excellent results. 
Solarium is a great PHP library on its own; right now it lacks Solr 4 support, 
but for Solr 3.6 it's great.

- Original Message -
From: "David Quarterman"
To: solr-user@lucene.apache.org
Sent: Tuesday, March 5, 2013 10:56:18
Subject: RE: Building a central index with Lucene + Solr

Hi Alvaro,

I agree with Otis & Alexandre (esp. Windows + PHP!). However, there are plenty 
of people using Solr & PHP out there very successfully. There's another good 
package at http://code.google.com/p/solr-php-client/ which is easy to implement 
and has some example usage.

Regards,

DQ



From: Álvaro Vargas Quezada [mailto:al...@outlook.com]
Sent: 05 March 2013 14:53
To: solr-user@lucene.apache.org
Subject: Building a central index with Lucene + Solr



Hi everyone!



I'm trying to develop a central index. I installed Solr and reached the screen 
that I attach, but the problem is that I don't know how to continue from this 
point. I want to develop an app in PHP which uses Solr, but I don't know how. 
Can anyone help me, maybe with a tutorial or something like that?



Thanks and greetz from Chile!





Custom update handler

2013-02-08 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm trying to build a custom update handler to accomplish one specific task. In 
our app we do query suggestions based on previous queries passed into our 
frontend app. Instead of getting these queries from the Solr logs, we store 
them in a separate core. So far so good, but one particular requirement is that 
not every query typed by users in the search box should appear as a 
suggestion, only the most popular ones. For this we created a field in the 
schema called count, and wrote code in our frontend to increase this value; to 
be honest, we don't like this. So we came up with the idea of writing a custom 
update handler that, before storing the query in the index, checks whether the 
query already exists and, if so, adds 1 to the counter.

The thing is that right now we have set up a dedupe component to avoid storing 
very similar queries, is there any way of accessing the dedupe component from 
the custom update handler? Is there any documentation I can check out to see 
anything similar to this?

Greetings

Re: Indexing several parts of PDF file

2013-02-05 Thread Jorge Luis Betancourt Gonzalez
Thanks for the advice. The thing with this approach is that we are using Nutch 
as our crawler for the intranet, and right now indexing one crawled document as 
several Solr documents is not possible without changing the way Nutch works. Is 
there any other workaround for this?

Thanks for the replies!

- Original Message -
From: "Upayavira"
To: solr-user@lucene.apache.org
Sent: Tuesday, February 5, 2013 9:05:58
Subject: Re: Indexing several parts of PDF file

This would involve you querying against every page in your document,
which will be too many fields and will break quickly.

The best way to do it is to index pages as documents. You can use field
collapsing to group pages from the same document together.

Upayavira
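
A minimal sketch of the "pages as documents" layout Upayavira describes, assuming the page texts have already been extracted. The field names id, doc_id, page and content are hypothetical and would have to exist in the schema; group, group.field and group.limit are Solr's result grouping (field collapsing) request parameters.

```python
from urllib.parse import urlencode


def pages_to_docs(doc_id, pages):
    """Turn the per-page text of one PDF into one Solr document per page."""
    return [
        {
            "id": f"{doc_id}_p{number}",  # unique key per page (hypothetical scheme)
            "doc_id": doc_id,             # shared by all pages of the same PDF
            "page": number,               # page number stored as a plain field
            "content": text,
        }
        for number, text in enumerate(pages, start=1)
    ]


def grouped_query(terms, rows=10):
    """Build select parameters that collapse page hits by parent document."""
    return urlencode({
        "q": f"content:({terms})",
        "group": "true",
        "group.field": "doc_id",
        "group.limit": 3,       # at most three matching pages per PDF
        "fl": "id,doc_id,page",
        "rows": rows,
        "wt": "json",
    })


docs = pages_to_docs("manual", ["Introduction ...", "Installing Solr ..."])
print([d["id"] for d in docs])
print(grouped_query("installing"))
```

Each group in the response then corresponds to one PDF, and the page field of the grouped hits gives the matching page numbers without any dynamic fields.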

On Tue, Feb 5, 2013, at 02:00 PM, Jorge Luis Betancourt Gonzalez wrote:
> Hi:
> 
> I'm working on a search engine for several PDF documents. Right now one
> of the requirements is that we provide not only the documents
> matching the search criteria but also the pages that match.
> Normally Tika only extracts the text content and does not make this
> distinction, but using some custom library this could be achieved. My
> question is how to structure the schema. From what I've seen, one approach
> could be to use dynamic fields:
> 
> 
> 
> So at query time I could extract the page number from the field's name. Is
> this the best approach? Is there any way of storing the page number in
> an attribute instead of using dynamic fields?
> 
> Thanks in advance!
> 
> Greetings


Indexing several parts of PDF file

2013-02-05 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm working on a search engine for several PDF documents. Right now one of the 
requirements is that we provide not only the documents matching the search 
criteria but also the pages that match. Normally Tika only extracts the text 
content and does not make this distinction, but using some custom library this 
could be achieved. My question is how to structure the schema. From what I've 
seen, one approach could be to use dynamic fields:



So at query time I could extract the page number from the field's name. Is this 
the best approach? Is there any way of storing the page number in an attribute 
instead of using dynamic fields?

Thanks in advance!

Greetings


Re: Migrating from Solr 3.6.1 to Solr 4

2013-01-05 Thread Jorge Luis Betancourt Gonzalez
So, from my "PHP app point of view", if I want to use SolrCloud features, 
changes will be needed, right? One more thing: are the responses generated by 
Solr 4 different in any way from the ones generated by Solr 3? I ask because 
Solarium parses the JSON response from the server to provide high-level objects 
encapsulating the response and its content.

Greetings!

- Original Message -
From: "Upayavira"
To: solr-user@lucene.apache.org
Sent: Saturday, January 5, 2013 4:49:01
Subject: Re: Migrating from Solr 3.6.1 to Solr 4

Try pointing your app at 4.0. I converted an app recently. Here's the
steps I took (as I recall):

 * get original solrconfig.xml for the release I'm using
 * diff that and my solrconfig.xml
 * apply those changes to a 4.0 solrconfig.xml
 * try to start up solr with this new solrconfig and an old schema and
   an old index
 * fix each problem you find in the schema
    - some class names have changed
    - you may want to delete some field definitions that you're not
      using
    - you'll need to copy the _version_ field from the 4.0 schema

I found my app was able to search/index without any difficulty via the
XML/HTTP interface.

Your mileage may vary, but for that particular app, that is what it
took.

Note, 4.0 can work in a 3.x way (old style replication, etc). You don't
need to use SolrCloud etc when using 4.0.
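
The "diff that and my solrconfig.xml" step above can be sketched with Python's standard difflib. This is only an illustration: the two XML snippets and the file labels are made up, and a plain `diff -u` on the command line does the same job.

```python
import difflib


def config_changes(stock_xml, local_xml):
    """Return the unified diff from the stock config to the customized one."""
    return list(difflib.unified_diff(
        stock_xml.splitlines(),
        local_xml.splitlines(),
        fromfile="solrconfig.xml (stock, hypothetical label)",
        tofile="solrconfig.xml (mine, hypothetical label)",
        lineterm="",
    ))


# Made-up stand-ins for the two files being compared.
stock = "<config>\n  <dataDir>./data</dataDir>\n</config>"
mine = "<config>\n  <dataDir>/var/solr/data</dataDir>\n</config>"

for line in config_changes(stock, mine):
    print(line)
```

The lines starting with "+" are the local customizations to carry over by hand into the 4.0 solrconfig.xml.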

Upayavira

On Sat, Jan 5, 2013, at 08:20 AM, Jorge Luis Betancourt Gonzalez wrote:
> Hi:
>
> I'm currently working with Solr 3.6.1, but Solr 4 has great features like
> the ones bundled with SolrCloud. The content in the index is really not
> a problem for the transition; the thing is that I have a large app written
> in PHP + Solarium that interacts with the index in Solr 3. As far as I
> know there is no support for Solr 4 in Solarium. So my question is: is it
> possible to use a Solr 3.6.1 frontend that gets its data from a Solr 4
> behind the scenes, or is there any other workaround for this?
>
> Greetings!
> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS
> INFORMATICAS...
> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci


Migrating from Solr 3.6.1 to Solr 4

2013-01-05 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm currently working with Solr 3.6.1, but Solr 4 has great features like the 
ones bundled with SolrCloud. The content in the index is really not a problem 
for the transition; the thing is that I have a large app written in PHP + 
Solarium that interacts with the index in Solr 3. As far as I know there is no 
support for Solr 4 in Solarium. So my question is: is it possible to use a Solr 
3.6.1 frontend that gets its data from a Solr 4 behind the scenes, or is there 
any other workaround for this?

Greetings!


Re: Dedup component

2012-12-15 Thread Jorge Luis Betancourt Gonzalez
Is there any similar approach that I could use in Solr 3.6.1, or should I add 
this logic to my application?

- Original Message -
From: "Upayavira"
To: solr-user@lucene.apache.org
Sent: Saturday, December 15, 2012 12:37:11
Subject: Re: Dedup component

Nope, it is a Solr 4.0 thing. In order for it to work, you need to store
every field, as what it does behind the scenes is retrieve the stored
fields, rebuilds the document, and then posts the whole document back.

Upayavira

On Sat, Dec 15, 2012, at 04:52 PM, Jorge Luis Betancourt Gonzalez wrote:
> Are these updatable fields available in Solr 3.6.1? That is the version I'm
> using right now.
>
> - Original Message -
> From: "Upayavira"
> To: solr-user@lucene.apache.org
> Sent: Saturday, December 15, 2012 7:56:45
> Subject: Re: Dedup component
>
> Make the ID field out of the query text so you don't have to use the
> dedup component, then use the updatable fields functionality in Solr
> 4.0:
>
> $ curl http://localhost:8983/solr/update \
>   -H 'Content-type:application/json' -d '
> [
>  {"id": "book1",
>   "copies_i"  : { "inc" : 1},
>   "cat"       : { "add" : "fantasy"},
>   "ISBN_s"    : { "set" : "0-380-97365-0"},
>   "remove_s"  : { "set" : null } }
> ]'
>
> /* example stolen from Yonik's ApacheCon talk */
>
> Upayavira
>
>
> On Sat, Dec 15, 2012, at 01:34 AM, Jorge Luis Betancourt Gonzalez wrote:
> > Hi all:
> >
> > I'm trying to build a query suggestion system using Solr (also used to
> > index all the data in the app). I have a separate core dedicated only to
> > this purpose (along with some others for images, etc.). In the main app,
> > written in Symfony2 + Solarium Bundle, we store the queries in this core.
> > To prevent the indexing of duplicated queries, I use the dedup component:
> >
> > 
> > 
> >  > class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
> > true
> > false
> > signature
> > textsuggest,textng
> > 
> > org.apache.solr.update.processor.TextProfileSignature
> > 
> > 
> > 
> > 
> > 
> >
> > This prevents very similar queries from being stored, but what I'm
> > really trying to accomplish is to increment a count (popularity) field
> > when the same query is sent to Solr.
> >
> > Any thought on this?
> >
> > Greetings!

Re: Dedup component

2012-12-15 Thread Jorge Luis Betancourt Gonzalez
Are these updatable fields available in Solr 3.6.1? That is the version I'm 
using right now.

- Original Message -
From: "Upayavira"
To: solr-user@lucene.apache.org
Sent: Saturday, December 15, 2012 7:56:45
Subject: Re: Dedup component

Make the ID field out of the query text so you don't have to use the
dedup component, then use the updatable fields functionality in Solr
4.0:

$ curl http://localhost:8983/solr/update \
  -H 'Content-type:application/json' -d '
[
 {"id": "book1",
  "copies_i"  : { "inc" : 1},
  "cat"       : { "add" : "fantasy"},
  "ISBN_s"    : { "set" : "0-380-97365-0"},
  "remove_s"  : { "set" : null } }
]'

/* example stolen from Yonik's ApacheCon talk */

Upayavira
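
Applied to the query-suggestion core, the advice above could look roughly like this sketch: the document id is derived from the normalized query text, and the request body uses the same "inc" modifier as the curl example. The normalization rule and the default field name "count" are assumptions, and how the very first update for a new query behaves should be checked against the Solr 4 version in use.

```python
import json
import re


def query_id(query_text):
    """Normalize a query so repeated queries map to the same document id."""
    return re.sub(r"\s+", " ", query_text.strip().lower())


def increment_payload(query_text, count_field="count"):
    """Build the JSON body of an atomic update that adds 1 to the counter."""
    doc = {
        "id": query_id(query_text),
        count_field: {"inc": 1},  # same modifier as "copies_i" in the curl example
    }
    return json.dumps([doc])


print(increment_payload("  Solr   Suggester "))
```

The resulting string would be POSTed to the core's /update handler with Content-type application/json, as in the curl call.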


On Sat, Dec 15, 2012, at 01:34 AM, Jorge Luis Betancourt Gonzalez wrote:
> Hi all:
>
> I'm trying to build a query suggestion system using Solr (also used to
> index all the data in the app). I have a separate core dedicated only to
> this purpose (along with some others for images, etc.). In the main app,
> written in Symfony2 + Solarium Bundle, we store the queries in this core.
> To prevent the indexing of duplicated queries, I use the dedup component:
>
> 
> 
>  class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
> true
> false
> signature
> textsuggest,textng
> 
> org.apache.solr.update.processor.TextProfileSignature
> 
> 
> 
> 
> 
>
> This prevents very similar queries from being stored, but what I'm
> really trying to accomplish is to increment a count (popularity) field
> when the same query is sent to Solr.
>
> Any thought on this?
>
> Greetings!

Dedup component

2012-12-14 Thread Jorge Luis Betancourt Gonzalez
Hi all:

I'm trying to build a query suggestion system using Solr (also used to index 
all the data in the app). I have a separate core dedicated only to this purpose 
(along with some others for images, etc.). In the main app, written in Symfony2 
+ Solarium Bundle, we store the queries in this core. To prevent the indexing 
of duplicated queries, I use the dedup component:




true
false
signature
textsuggest,textng

org.apache.solr.update.processor.TextProfileSignature






This prevents very similar queries from being stored, but what I'm really 
trying to accomplish is to increment a count (popularity) field when the same 
query is sent to Solr.

Any thought on this?

Greetings!



Re: Solr PHP client

2012-12-14 Thread Jorge Luis Betancourt Gonzalez
Hi Guillaume:

I beg to differ. It's true that Solr's native support for these formats has 
been a big aid to developers using Solr from many programming languages. But 
making all the queries "by hand" is not wise and in any case is hard to 
maintain; it's easier to use an OO library to interact with Solr. For instance, 
right now I'm using Solarium to interact with Solr 3.6.1 within a Symfony2 app; 
in this scenario Solarium handles all the interaction with the Solr server. I 
work with classes in my code, and underneath Solarium "talks" JSON with the Solr 
server. My point is that Solr's ability to "talk" a lot of standard formats is 
a huge plus, but having a library that handles the heavy lifting with the 
server keeps your code clean.

Greetings,

- Original Message -
From: "Guillaume Rossolini"
To: solr-user@lucene.apache.org
Sent: Friday, December 14, 2012 3:22:41
Subject: Re: Solr PHP client

Hi,

The various Solr PHP clients have been a great help in the past, and I do
not mean to belittle their efforts.
However, the Solr project has made many efforts to support several input
and output data formats, including JSON and even serialized PHP, which are
fairly easy to implement. Maybe I am mistaken, but I am not sure any PHP
client (as an extension or as a library) would actually help much any more.

Regards,

--
I N S T A N T  |  L U X E - 40 Rue D'Aboukir - 75002 Paris - France



On Fri, Dec 14, 2012 at 8:23 AM, Romita Saha
wrote:

> Hi,
>
> Can anyone please guide me to use SolrPhpClient? The documents available
> are not clear. As to where to place SolrPhpClient?
>
> I have downloaded SolrPhpClient and have changed the following lines,
> specifying the path (where the files are present in my computer)
>
>
> require_once('/home/solr/SolrPhpClient/Apache/Solr/Document.php./Document.php');
>
> require_once('/home/solr/SolrPhpClient/Apache/Solr/Document.php./Response.php');
>
> After this I am unable to proceed. What and how should I index my
> documents now. How should I start my solr. Where to place the conf files.
> I see there are few html documents inside the folder
> "SolrPhpClien/phpdocs".
>
> Could someone please help.
>
> Thanks and regards,
> Romita



Re: PHP client

2012-12-07 Thread Jorge Luis Betancourt Gonzalez
Any news on the Solarium project? It's the one I'm using with Solr 3.6!

- Original Message -
From: "Bill Au"
To: solr-user@lucene.apache.org, "Arkadi Colson"
Sent: Friday, December 7, 2012 13:40:20
Subject: Re: PHP client

I have not used the pecl Solr client.  I have been using SolrPhpClient.  I
came across this patch for pecl when I was researching php client for Solr
4.0.  SolrPhpClient has the same problem with 4.0 that this patch addresses.

Bill


On Fri, Dec 7, 2012 at 11:00 AM, Arkadi Colson  wrote:

> Thanks for the info!
>
> Do you know if it's possible to use file uploads to Tika with this client?
>
>
> On 12/03/2012 03:56 PM, Bill Au wrote:
>
>> https://bugs.php.net/bug.php?id=62332
>>
>> There is a fork with patches applied.
>>
>>
>> On Mon, Dec 3, 2012 at 9:38 AM, Arkadi Colson  wrote:
>>
>>  Hi
>>>
>>> Anyone tested the pecl Solr Client in combination with SolrCloud? It seems
>>> to be broken since 4.0
>>>
>>> Best regard
>>> Arkadi
>>>
>>>
>>>
>>
>>
> --
> Met vriendelijke groeten
>
> Arkadi Colson
>
> Smartbit bvba . Hoogstraat 13 . 3670 Meeuwen
> T +32 11 64 08 80 . F +32 11 64 08 81
>
>



Prevent indexing documents with some terms

2012-12-07 Thread Jorge Luis Betancourt Gonzalez
Hi:

Is there any way that I can prevent a document from being indexed? I have a 
separate core only for query suggestions. These queries are stored right from 
the frontend app, so I'm trying to prevent ill-intended queries from being 
stored in my index, while keeping the logic of what I consider "ill-intended" 
out of the frontend application. Stop words only prevent certain words from 
being stored in the index, but is there any way of preventing the whole 
document from being stored?

Greetings!


Re: News clustering

2012-12-03 Thread Jorge Luis Betancourt Gonzalez
I'm trying to use it to search through news websites, but I'm interested in 
classification at index time. Is there any available solution for this?

Greetings!

On Dec 3, 2012, at 12:37 PM, Stanislaw Osinski  wrote:

>> I mean measuring the similarity between the document in each cluster.
>> Also, difference between document on one cluster with another cluster.
>> 
>> I saw the sample code ClusteringQualityBencmark.java
>> However, I do not know how to make use of it for assessing my Solr
>> Clustering performance.
>> 
> 
> You'd need to write your own code for this, here are the most common
> clustering quality measures you mentioned:
> 
> http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results
> 
> These are meant for the general case (numeric attributes), to apply them to
> texts, you'd need to use the vector representation of the documents.
> 
> On a more general note, synthetic measures test only the document-cluster
> assignments, but none take the quality of labels into account (this is
> really hard to measure objectively).
> 
> Staszek

Re: Suggester with punctuation signs

2012-11-27 Thread Jorge Luis Betancourt Gonzalez
Hi Upayavira:

I'm using the standard tokenizer right now, and it's working fine, but I was 
wondering if there is any way to strip these punctuation marks right in the 
suggest requestHandler, so there is no need to index again. I've been doing 
some tests, and increasing the threshold has improved the accuracy of the 
suggestions. One more thing: the suggestions are mainly in Spanish, so is there 
any "best practice" configuration for this, or will any standard configuration 
do the trick?
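
If reindexing really has to be avoided, one stopgap is to clean the suggestions in the client after they come back. The sketch below is written in Python for illustration (the application in this thread is PHP), the function name is hypothetical, and it does not fix the underlying index, which is what Upayavira recommends below.

```python
import string

# ASCII punctuation plus a few Spanish marks and whitespace (an assumed set).
STRIP_CHARS = string.punctuation + "¡¿«»" + string.whitespace


def clean_suggestions(suggestions):
    """Strip leading/trailing punctuation and drop the duplicates that result."""
    seen = set()
    cleaned = []
    for term in suggestions:
        stripped = term.strip(STRIP_CHARS)
        if stripped and stripped not in seen:
            seen.add(stripped)
            cleaned.append(stripped)
    return cleaned


print(clean_suggestions(["universidad,", "universidad", "¿universo?", "..."]))
```

Only the edges of each term are touched, so punctuation inside a term is left alone.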

Thanks!

On Nov 26, 2012, at 6:18 PM, Upayavira  wrote:

> You may want to change your tokenisation anyhow, as a search for
> 'universidad' will not match your term 'universidad,'
> 
> But you are on the right track - to improve suggestions, improve what is
> in your index.
> 
> Upayavira
> 
> On Mon, Nov 26, 2012, at 07:54 PM, Jorge Luis Betancourt Gonzalez wrote:
>> Hi:
>> 
>> I've configured my Solr setup to use the suggester component and to get
>> term suggestions from a PHP application. The thing is that I'm getting
>> results like "universidad," (note the punctuation sign). Is there any way I
>> can get rid of this? Or do I need to create a separate field and strip
>> all punctuation signs?
>> 
>> Greetings

Suggester with punctuation signs

2012-11-26 Thread Jorge Luis Betancourt Gonzalez
Hi:

I've configured my Solr setup to use the suggester component and to get term 
suggestions from a PHP application. The thing is that I'm getting results like 
"universidad," (note the punctuation sign). Is there any way I can get rid of 
this? Or do I need to create a separate field and strip all punctuation signs?

Greetings



Re: php client for Solr 4.0.0

2012-11-12 Thread Jorge Luis Betancourt Gonzalez
I'm currently using solarium with solr 3.6, perhaps you can tweak solarium as 
needed? I suppose that pull requests are welcome into solarium for solr 4.

Greetings!

On Nov 12, 2012, at 2:56 PM, Bill Au  wrote:

> Anyone know of a PHP client that is compatible with Solr 4.0.0?  I am using
> an old PHP client that is trying to set the waitFlush parameter on a commit
> so it is failing.
> 
> Bill

Re: is it possible to save the search query?

2012-11-08 Thread Jorge Luis Betancourt Gonzalez
I think that Solr by itself doesn't store the queries (correct me if I'm 
wrong about this), but you can accomplish what you want by processing the Solr 
log (it's the only way, I think). From the Solr log you can get the queries, 
process them according to your needs, and change the boost parameters in your 
app or Solr config.
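
A rough sketch of that log processing, assuming request log lines that carry the parameters as params={...}; the exact log format depends on the Solr version and logging setup, so the pattern is an assumption to verify against real logs. It extracts the parameters of two requests and reports which ones changed, which is the comparison Romita describes below.

```python
import re
from urllib.parse import parse_qs

# Assumed shape of a request log line; check it against your own logs.
PARAMS = re.compile(r"params=\{(?P<params>[^}]*)\}")


def query_params(log_line):
    """Return the request parameters of one log line as a dict, or None."""
    match = PARAMS.search(log_line)
    if match is None:
        return None
    return parse_qs(match.group("params"))


def changed_params(first, second):
    """Map each parameter whose value differs to its (old, new) values."""
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


# Made-up log lines modelled on the two queries quoted below.
FIRST = ("INFO: [db] webapp=/solr path=/select "
         "params={defType=dismax&q=cashier2&qf=data^2+id&rows=11} hits=3 status=0")
SECOND = ("INFO: [db] webapp=/solr path=/select "
          "params={defType=dismax&q=cashier2&qf=data+id^2&rows=11} hits=3 status=0")

print(changed_params(query_params(FIRST), query_params(SECOND)))
```

Here only qf differs between the two requests, showing that the boost moved from data to id.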

On Nov 8, 2012, at 11:32 AM, Otis Gospodnetic  
wrote:

> Hi,
> 
> Aha, I think I understand.  Yes, you could collect all doc IDs from each
> query and find the differences.  There is nothing in Solr that can find
> those differences or that would store doc IDs of returned hits in the first
> place, so you would have to implement this yourself.  Sematext's Search
> Analytics service may be of help here in the sense that all data you
> need (queries, doc IDs, etc.) are collected, so it would be a matter of
> providing an API to get the data for off-line analysis.  But this data
> collection+diffing is also something you could implement yourself.  One
> thing to think about - what do you do when a query returns a large
> number of hits.  Do you really want/need to get IDs for all of them, or
> only a page at a time.
> 
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
> 
> 
> On Thu, Nov 8, 2012 at 1:01 AM, Romita Saha 
> wrote:
> 
>> Hi,
>> 
>> The following is the example;
>> 1st query:
>> 
>> 
>> http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data^2 id&start=0&rows=11&fl=data,id
>> 
>> Next query:
>> 
>> 
>> http://localhost:8983/solr/db/select/?defType=dismax&&debugQuery=on&q=cashier2&qf=data id^2&start=0&rows=11&fl=data,id
>> 
>> In the 1st query the field 'data' is boosted by 2. However, maybe the
>> user was not satisfied with the response, so in the next query he
>> boosted the field 'id' by 2.
>> 
>> I want to record both the queries and compare between the two, meaning,
>> what are the changes implemented on the 2nd query which are not present in
>> the previous one.
>> 
>> Thanks and regards,
>> Romita Saha
>> 
>> 
>> 
>> From:   Otis Gospodnetic 
>> To: solr-user@lucene.apache.org,
>> Date:   11/08/2012 01:35 PM
>> Subject:Re: is it possible to save the search query?
>> 
>> 
>> 
>> Hi,
>> 
>> Compare in what sense?  An example will help.
>> 
>> Otis
>> --
>> Performance Monitoring - http://sematext.com/spm
>> On Nov 7, 2012 8:45 PM, "Romita Saha" 
>> wrote:
>> 
>>> Hi All,
>>> 
>>> Is it possible to record a search query in solr and then compare it with
>>> the previous search query?
>>> 
>>> Thanks and regards,
>>> Romita Saha
>>> 
>> 
>> 

Re: Storing queries in Solr

2012-10-08 Thread Jorge Luis Betancourt Gonzalez
Thanks for the quick response. I'm trying to build a query suggester. I find it 
odd that, this being such a common need, Solr doesn't provide any built-in 
mechanism for query suggestions, but implementing the other components isn't so 
hard either.

Greetings!
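
One of those components, the periodic job that turns logged queries into suggestion candidates, could be sketched as below. It assumes the query strings have already been pulled out of the log; the normalization and the min_count threshold are arbitrary choices for illustration.

```python
from collections import Counter


def popular_queries(queries, min_count=2):
    """Count normalized queries and keep those seen at least min_count times."""
    counts = Counter(
        " ".join(query.lower().split())  # lowercase and collapse whitespace
        for query in queries
        if query.strip()
    )
    return [(query, n) for query, n in counts.most_common() if n >= min_count]


logged = ["Solr", "solr ", "lucene", "nutch", "Nutch", ""]
print(popular_queries(logged))
```

The surviving pairs are what would be pushed to the separate suggestion core on an hourly or daily schedule, as suggested in the reply quoted below.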

On Oct 8, 2012, at 3:38 AM, Upayavira wrote:

> Solr has a small query cache, but this does not hold queries for any
> length of time, so won't suit your purpose.
> 
> The LucidWorks Search product has (I believe) a click tracking feature,
> but that is about boosting documents that are clicked on, not specific
> search terms. Parsing the Solr log, or pushing query terms to a
> different core/index would really be the only way to achieve what you're
> suggesting, as far as I am aware.
> 
> Processing logs would be preferable anyhow, as you don't really want to
> be triggering an index write during each query (assuming you have more
> queries than updates to your main index), and also if this is for
> building a suggester index, then it is unlikely to need updating that
> regularly - every hour or every day should be more than sufficient. You
> could write a SearchComponent that logs queries in another format,
> should the existing log format not be sufficient for you.
> 
> Upayavira
> 
> On Mon, Oct 8, 2012, at 01:24 AM, Jorge Luis Betancourt Gonzalez wrote:
>> Hi!
>> 
>> I was wondering if there is any built-in mechanism that allows me to
>> store the queries made to a Solr server inside the index itself. I know
>> that the suggester module exists, but as far as I know it only works for
>> terms existing in the index, and not with queries. I remember reading
>> about using some external program to parse the Solr log and push the
>> queries or any other interesting data into the index. Is this the only
>> way to accomplish this?
>> 
>> Greetings!


Storing queries in Solr

2012-10-07 Thread Jorge Luis Betancourt Gonzalez
Hi!

I was wondering if there is any built-in mechanism that allows me to store the 
queries made to a Solr server inside the index itself. I know that the 
suggester module exists, but as far as I know it only works for terms existing 
in the index, and not with queries. I remember reading about using some 
external program to parse the Solr log and push the queries or any other 
interesting data into the index. Is this the only way to accomplish this?

Greetings!


Re: Question about OR operator

2012-10-05 Thread Jorge Luis Betancourt Gonzalez
Thanks a lot for all the replies. Chris, it worked with this mm value:



If this version of Solr is affected by the bug you pointed out, shouldn't it 
fail with this value as well?

Greetings!

On Oct 4, 2012, at 8:48 PM, Jorge Luis Betancourt Gonzalez wrote:

> Hi Chris:
> 
> I'm using solr 3.6.1, is the bug present in this version?
> 
> Greetings!
> 
> On Oct 4, 2012, at 6:11 PM, Chris Hostetter wrote:
> 
>> 
>> : GRAVE: java.lang.NumberFormatException: For input string: "
>> :100
>> :"
>> :at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>> :at java.lang.Integer.parseInt(Integer.java:470)
>> :at java.lang.Integer.<init>(Integer.java:636)
>> :at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(SolrPluginUtils.java:691)
>> 
>> What version of Solr are you using?
>> 
>> That looks like a simple parsing bug that seems to have been fixed a while 
>> back (it's definitely not in the 4.0 branch)
>> 
>> can you try eliminating the whitespace from your XML configured value...
>> 
>>100
>> 
>> ...that should work around the problem.
>> 
>> 
>> -Hoss
>> 


Re: Question about OR operator

2012-10-04 Thread Jorge Luis Betancourt Gonzalez
Hi Chris:

I'm using solr 3.6.1, is the bug present in this version?

Greetings!

On Oct 4, 2012, at 6:11 PM, Chris Hostetter wrote:

> 
> : GRAVE: java.lang.NumberFormatException: For input string: "
> : 100
> : "
> : at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> : at java.lang.Integer.parseInt(Integer.java:470)
> : at java.lang.Integer.<init>(Integer.java:636)
> : at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(SolrPluginUtils.java:691)
> 
> What version of Solr are you using?
> 
> That looks like a simple parsing bug that seems to have been fixed a while 
> back (it's definitely not in the 4.0 branch)
> 
> can you try eliminating the whitespace from your XML configured value...
> 
> 100
> 
> ...that should work around the problem.
> 
> 
> -Hoss
> 


Re: Question about OR operator

2012-10-04 Thread Jorge Luis Betancourt Gonzalez
Thanks for the quick response. I got the same response. What I'm trying to 
accomplish is a straight OR between all the clauses or terms in my query; 
the value I should use is 0, right?




Re: Question about OR operator

2012-10-04 Thread Jorge Luis Betancourt Gonzalez
This is the error:

GRAVE: java.lang.NumberFormatException: For input string: "
100
"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.<init>(Integer.java:636)
    at org.apache.solr.util.SolrPluginUtils.calculateMinShouldMatch(SolrPluginUtils.java:691)
    at org.apache.solr.util.SolrPluginUtils.setMinShouldMatch(SolrPluginUtils.java:656)
    at org.apache.solr.search.DisMaxQParser.getUserQuery(DisMaxQParser.java:210)
    at org.apache.solr.search.DisMaxQParser.addMainQuery(DisMaxQParser.java:166)
    at org.apache.solr.search.DisMaxQParser.parse(DisMaxQParser.java:77)
    at org.apache.solr.search.QParser.getQuery(QParser.java:143)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:165)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)

This is the parameter in my solrconfig.xml


0


On Oct 4, 2012, at 1:46 PM, Otis Gospodnetic wrote:

> What's the error Jorge?
> 
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
> 
> 
> On Thu, Oct 4, 2012 at 1:36 PM, Jorge Luis Betancourt Gonzalez
>  wrote:
>> Hi:
>> 
>> Thanks for all the replies, right now I have this in my mm parameter:
>> 
>>
>>2<-1 5<-2 6<90%
>>
>> 
>> I'm trying to get a straight OR between all the terms in my query. Should I 
>> set the mm parameter to 1? I ask because this gave an error.
>> 
>> Greetings!
>> 
>> On Oct 4, 2012, at 11:06 AM, Jorge Luis Betancourt Gonzalez wrote:
>> 
>>> Hi:
>>> 
>>> I'm having an issue with Solr 3.6.1, and I suspect it comes down to a lack 
>>> of understanding on my part. I'm building a search engine, using Solr of 
>>> course to store the inverted index; so far so good. When I search for a 
>>> term, let's say "java", I get 761 results, and querying the index for the 
>>> term "php" gives me 3194 results. So if I query for java php (without any 
>>> quotes), I suppose that Solr will interpret this as an OR between the two 
>>> terms, correct? The results should then be the union of the two result 
>>> sets. Can anyone explain why I get fewer results for that last query, 
>>> java php without any quotes?
>>> 
>>> Thanks in advance!!

Re: Question about OR operator

2012-10-04 Thread Jorge Luis Betancourt Gonzalez
Hi:

Thanks for all the replies. Right now I have this in my mm parameter:


2<-1 5<-2 6<90%


I'm trying to get a straight OR between all the terms in my query. Should I 
set the mm parameter to 1? I ask because this gave an error.
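
To see what an mm value like the one above asks for, here is a rough Python sketch of how I understand the documented minimum-should-match rules. It is a re-implementation for illustration, not Solr's own code; the function name `min_should_match` is made up, and unlike Solr it does not handle spaces around the `<` sign.

```python
def min_should_match(clause_count, spec):
    """Return how many of clause_count optional clauses must match for spec."""
    result = clause_count
    spec = spec.strip()  # tolerate surrounding whitespace in the configured value
    if "<" in spec:
        # Conditional form, e.g. "2<-1 5<-2 6<90%": each "N<value" applies
        # only when there are more than N optional clauses.
        for part in spec.split():
            bound, value = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, value)
        return result
    if spec.endswith("%"):
        # Percentage of the clauses, truncated toward zero.
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # Negative values mean "all but that many".
    result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


SPEC = "2<-1 5<-2 6<90%"
required_for_two_terms = min_should_match(2, SPEC)  # both terms are required
required_with_mm_one = min_should_match(2, "1")     # any single term is enough
```

Under these rules the spec above requires every clause when the query has one or two terms, so a two-term query behaves like AND, whereas a value of 1 only requires one matching term, which is the plain OR behaviour.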

Greetings!

On Oct 4, 2012, at 11:06 AM, Jorge Luis Betancourt Gonzalez wrote:

> Hi:
> 
> I'm having an issue with Solr 3.6.1, and I suspect it comes down to a lack 
> of understanding on my part. I'm building a search engine, using Solr of 
> course to store the inverted index; so far so good. When I search for a 
> term, let's say "java", I get 761 results, and querying the index for the 
> term "php" gives me 3194 results. So if I query for java php (without any 
> quotes), I suppose that Solr will interpret this as an OR between the two 
> terms, correct? The results should then be the union of the two result 
> sets. Can anyone explain why I get fewer results for that last query, 
> java php without any quotes?
> 
> Thanks in advance!!


Question about OR operator

2012-10-04 Thread Jorge Luis Betancourt Gonzalez
Hi:

I'm having an issue with Solr 3.6.1, and I suspect it comes down to a lack of 
understanding on my part. I'm building a search engine, using Solr of course 
to store the inverted index; so far so good. When I search for a term, let's 
say "java", I get 761 results, and querying the index for the term "php" gives 
me 3194 results. So if I query for java php (without any quotes), I suppose 
that Solr will interpret this as an OR between the two terms, correct? The 
results should then be the union of the two result sets. Can anyone explain 
why I get fewer results for that last query, java php without any quotes?
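
To make the expectation concrete, here is a tiny Python sketch with made-up document ids (not my real data): an OR query should return the union of the two result sets, so it can never have fewer hits than either term alone, while an AND query returns the intersection and can never have more.

```python
# Hypothetical document ids matched by each single-term query.
java_docs = {1, 2, 3, 4}
php_docs = {3, 4, 5, 6, 7}

# OR semantics: documents matching at least one of the terms.
or_hits = java_docs | php_docs

# AND semantics: documents matching both terms.
and_hits = java_docs & php_docs

# With OR the hit count is at least that of the larger single-term query;
# with AND it is at most that of the smaller one.
or_is_at_least_larger = len(or_hits) >= max(len(java_docs), len(php_docs))
and_is_at_most_smaller = len(and_hits) <= min(len(java_docs), len(php_docs))
```

Getting fewer hits for the two-term query than for "php" alone is therefore what AND-like matching would produce, not OR.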

Thanks in advance!!