[Parent] doc transformer

2017-10-30 Thread Aurélien MAZOYER
Hi,

Does Solr have a kind of [parent] doc transformer (like the [child] doc
transformer) that can be used to embed the parent's fields in the response to a
query that uses the block join children query parser?

Thank you,

Aurélien MAZOYER



RE: Issue with scoreNodes stream expression

2017-09-21 Thread Aurélien MAZOYER
Hi,

Thank you for your advice. It helped me notice that the exception seems to be
thrown when no data is gathered by the gatherNodes expression (the error
message is not very explicit).
I modified the expression and it works well now.

Thank you,

Aurélien 

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com]
Sent: Wednesday, September 20, 2017 04:11
To: solr-user@lucene.apache.org
Subject: Re: Issue with scoreNodes stream expression

Have you tried running a very simple expression first? For example, does this
run:

random(gettingstarted, q="*:*", fl="id", rows="200")
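
A minimal sketch of how the simple expression above could be submitted to the /stream handler from a script instead of the admin UI. The base URL (localhost:8983) is an assumption that matches the default cloud example, and stream_url is a hypothetical helper name. Nothing is sent over the network here; the code only builds the request URL.

```python
from urllib.parse import urlencode

# Assumed base URL of a local SolrCloud example node; adjust as needed.
SOLR_URL = "http://localhost:8983/solr"


def stream_url(collection, expr, base_url=SOLR_URL):
    """Build the URL that submits a streaming expression to /stream."""
    return f"{base_url}/{collection}/stream?" + urlencode({"expr": expr})


expr = 'random(gettingstarted, q="*:*", fl="id", rows="200")'
url = stream_url("gettingstarted", expr)
print(url)
```

Opening the printed URL in a browser, or with curl, should return the tuples as JSON if the collection exists and contains documents.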



Joel Bernstein
http://joelsolr.blogspot.com/




Issue with scoreNodes stream expression

2017-09-19 Thread Aurélien MAZOYER
Hi,

I wanted to try the new scoreNodes stream expression that is used to make
recommendations:

https://cwiki.apache.org/confluence/display/solr/Graph+Traversal#GraphTraversal-UsingthescoreNodesFunctiontoMakeaRecommendation

but I ran into an issue with it.

The following steps easily reproduce the problem.

I started Solr (6.6.1) in cloud mode:

solr -e cloud -noprompt

then ran the following command in exampledocs to index the sample data:

java -Dc=gettingstarted -jar post.jar *.xml

and finally pasted the following expression into the Stream tab:

scoreNodes(top(n=25,
               sort="count(*) desc",
               nodes(gettingstarted,
                     random(gettingstarted, q="*:*", fl="id", rows="200"),
                     walk="id->id",
                     gather="id",
                     count(*))))

(yes, I know that my stream expression does nothing useful :-P).

Anyway, I got the following exception when I ran the query:

"EXCEPTION": "org.apache.solr.client.solrj.SolrServerException: No
collection param specified on request and no default collection has been
set.",

Any idea what I did wrong?

Thank you,

Regards,

Aurélien



Re: [E] Re: Stemming

2016-06-16 Thread Aurélien MAZOYER

No problem :-)

Aurélien

On 16/06/2016 22:36, Jamal, Sarfaraz wrote:

Oh, is this what you meant?

   
 
   content_stemming
   
 
   

I changed it to content_stemming and now it seems to work :) - It was _text_ 
before -

Thanks! I will update if I discover anything amiss

Thanks again so much =)

Sas






Re: [E] Re: Stemming

2016-06-16 Thread Aurélien MAZOYER

Hi,

I was just wondering whether you are sure that you query only that field (or
fields that use your text_stem analyzer) and not other fields (in your
qf, for example, if you use edismax) that could give you incorrect results.


Regards,

Aurélien

On 16/06/2016 22:29, Jamal, Sarfaraz wrote:

Hello =)

Just to be safe and make sure it's happening at indexing time AS WELL as 
QUERYING time -

I modified it to be like so:

   

  
  
  
  
  
  


  
  
  
  
  
  
   
   

I am re-indexing the files
And what do you mean about only querying one field? I am not entirely sure I 
understand..

Sas






Re: Stemming

2016-06-16 Thread Aurélien MAZOYER

Hi,

Yes, you should get the same result set.

Are you sure that you reindexed all the data after changing your schema?
Are you sure that you put your analyzer at both indexing and querying time?
Are you sure you query only one field?
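
To illustrate the second check, here is a rough sketch of a field type whose stemming chain is applied at both index and query time, written as a Schema API command. The tokenizer and filter choices (standard tokenizer, lowercase, Porter stemmer) are assumptions, since the original field type is not shown in the thread; only the name text_stem comes from the discussion. The code just builds the JSON body and does not contact Solr.

```python
import json


def stemming_field_type(name="text_stem"):
    """Build an add-field-type command with identical index/query analyzers."""
    # Assumed analysis chain; the chain actually used in the thread is unknown.
    analyzer = {
        "tokenizer": {"class": "solr.StandardTokenizerFactory"},
        "filters": [
            {"class": "solr.LowerCaseFilterFactory"},
            {"class": "solr.PorterStemFilterFactory"},
        ],
    }
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            # Same chain on both sides, so "running" and "run" meet as "run".
            "indexAnalyzer": analyzer,
            "queryAnalyzer": analyzer,
        }
    }


command = stemming_field_type()
print(json.dumps(command, indent=2))
```

If the stemmer is present on only one side, the indexed terms and the query terms no longer match, which is one way to get three different result sets for run, runs and running.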

Regards,

Aurélien

On 16/06/2016 21:13, Jamal, Sarfaraz wrote:

Hi Guys,

I have enabled stemming:
   




   

In the Admin Analysis, I type in running or runs and they both break down to 
run.
However when I search for run, runs, or running with an actual query -

It brings back three different sets of results.

Is that correct?

I would imagine that all three would bring back the exact same resultset?

Sas





Re: Is it different? q=(field1:value1 OR field2:value2) and q=field1:value1 OR field2:value2

2016-02-26 Thread Aurélien MAZOYER

Hi,

I think both queries are rewritten to the same query. You can use 
the debugQuery=on parameter to see how each query is rewritten and then 
check whether you get the same result for both.
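
A small sketch of that comparison, assuming a core reachable at the hypothetical URL below (the core name collection1 is an assumption). It only builds the two request URLs; the parsed form of each query can then be read from the "parsedquery" entry in the debug section of each response.

```python
from urllib.parse import urlencode

# Assumed select handler URL; replace host and core name with your own.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def debug_url(q, select_url=SOLR_SELECT):
    """Build a select URL that asks Solr to explain how q was parsed."""
    params = {"q": q, "debugQuery": "on", "rows": 0, "wt": "json"}
    return select_url + "?" + urlencode(params)


with_brackets = debug_url("(field1:value1 OR field2:value2)")
without_brackets = debug_url("field1:value1 OR field2:value2")
print(with_brackets)
print(without_brackets)
```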


Regards,

Aurélien

On 26/02/2016 14:27, vitaly bulgakov wrote:

Is there a difference when we put query in brackets?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-different-q-field1-value1-OR-field2-value2-and-q-field1-value1-OR-field2-value2-tp4259976.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Deletion Policy in Solr Cloud

2015-06-15 Thread Aurélien MAZOYER

Thank you for your answer, Mark.

Aurélien


On 15/06/2015 23:19, Mark Miller wrote:

SolrCloud does not really support any form of rollback.




--
Aurélien MAZOYER
Search technology expert
France Labs
*** Discover Datafari v1.0 at datafari.com ***
CEEI Nice Premium
1 boulevard Maître Maurice Slama
06200 Nice
Tel : +33 (0) 683366620
www.francelabs.com



Deletion Policy in Solr Cloud

2015-06-15 Thread Aurélien MAZOYER

Hi all,

Is DeletionPolicy customization still available in Solr Cloud? Is there 
a way to roll back to a previous commit point in Solr Cloud using a 
specific deletion policy?


Thanks,

 Aurélien


Re: Order synonyms

2015-01-20 Thread Aurélien MAZOYER

Hi,

I am afraid you are not using the right component.
In your example, you will match apple, darty and boulanger 
documents, sorted by the default Solr scoring mechanism (TF-IDF), which 
does not take the order you specified in your synonyms.txt file into 
account.
If you want to override the Solr scoring mechanism for a query, you can 
have a look at the Query Elevation Component:

https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component
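
As a rough illustration, the elevation component reads an elevate.xml file that pins documents, identified by their uniqueKey, to the top of the results for a given query text. The sketch below generates such a file; the three document ids are hypothetical placeholders, and the component still has to be declared in solrconfig.xml as described on the page above.

```python
import xml.etree.ElementTree as ET


def build_elevate_xml(query_text, doc_ids):
    """Return elevate.xml content pinning doc_ids, in order, for query_text."""
    root = ET.Element("elevate")
    query = ET.SubElement(root, "query", text=query_text)
    for doc_id in doc_ids:
        ET.SubElement(query, "doc", id=doc_id)
    return ET.tostring(root, encoding="unicode")


# Hypothetical uniqueKey values for the Apple, Darty and Boulanger documents.
xml_text = build_elevate_xml("ipad", ["apple-doc", "darty-doc", "boulanger-doc"])
print(xml_text)
```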



Regards,

Aurélien



On 20/01/2015 17:28, Antoine REBOUL wrote:

Hello,

(sorry for my English, I use a translator)

I use synonyms in Solr.

My question is the following:
How can I order the results list according to the order of the synonyms?

My synonyms are written as follows in the mysynonyms.txt file:

ipad => apple, Darty, Boulanger

I want the results of a search for ipad to appear in the
following order:

1 / Apple
2 / Darty
3 / Boulanger

Unless Apple is not returned first.

Do you have an idea to offer me ?


Thank you in advance.


Antoine Reboul
Manager, Comparison Engines / Emailing Platform
Plebicom -  eBuyClub - Cashstore - Checkdeal

PLEBICOM – 29 avenue Joannes Masset – 69009 Lyon
Tel  : 04 72 85 81 49
Fax : 04 78 83 39 74






Re: How do I get index size and datasize

2014-08-25 Thread Aurélien MAZOYER

Hi,

Have a look at the 'data' directory in your solr_home.
The .fdt and .fdx files are used to store the data of stored fields. You can 
consider the size of the other files as the size Solr uses for its index.
For more information, see 
http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/codecs/lucene49/package-summary.html#file-names
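
A minimal sketch of that rule of thumb: sum the sizes of the files in the index directory, counting .fdt and .fdx as stored data and everything else as index. The directory path is whatever your core uses (typically data/index under the core directory); index_size_breakdown is a hypothetical helper name. Note that if segments use compound files (.cfs), stored fields are bundled inside them and this split is no longer visible from the file names.

```python
import os

# Extensions Lucene uses for stored field data and its index.
STORED_FIELD_EXTENSIONS = {".fdt", ".fdx"}


def index_size_breakdown(index_dir):
    """Return bytes used by stored-field files versus all other index files."""
    sizes = {"stored": 0, "index": 0}
    for entry in os.scandir(index_dir):
        if not entry.is_file():
            continue
        ext = os.path.splitext(entry.name)[1].lower()
        kind = "stored" if ext in STORED_FIELD_EXTENSIONS else "index"
        sizes[kind] += entry.stat().st_size
    return sizes
```

Usage would look like index_size_breakdown("/path/to/solr_home/collection1/data/index"), where the path is an example only.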


Regards,


On 25/08/2014 09:40, Ramprasad Padmanabhan wrote:

I have solr working for my stats pages. When I run the index I need to know
how much of the size occupied by solr is used for index and how much is
used for storing non indexed data





Re: Query regarding URL Analysers

2014-08-21 Thread Aurélien MAZOYER

Hi,

I may be wrong, but I am not sure that you can find such a tokenizer in Solr 
out of the box.
I suggest having a look at PatternTokenizer and PathHierarchyTokenizer. Note 
that you can also implement your own tokenizer and add it to Solr as a 
plugin.
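
Before writing a custom tokenizer, it may help to prototype the expected token stream outside Solr. The sketch below is plain Python, not a Solr plugin, and url_tokens is a hypothetical helper: it emits the hierarchical prefixes, the path elements, the query terms, the fragment and, if present, the port. It assumes the query terms in the example URL are separated by "&".

```python
from urllib.parse import urlsplit


def url_tokens(url):
    """Return hierarchical, path, query, fragment and port tokens for url."""
    parts = urlsplit(url)
    scheme = parts.scheme + "://"
    prefix = scheme + parts.netloc + "/"
    tokens = [scheme, prefix]

    segments = [s for s in parts.path.split("/") if s]
    for i, seg in enumerate(segments):
        # Only the final segment of a file-like path has no trailing slash.
        is_file = i == len(segments) - 1 and not parts.path.endswith("/")
        prefix += seg if is_file else seg + "/"
        tokens.append(prefix)

    tokens.extend(segments)                                  # path elements
    tokens.extend(p for p in parts.query.split("&") if p)    # query terms
    if parts.fragment:
        tokens.append(parts.fragment)
    if parts.port is not None:
        tokens.append(str(parts.port))
    return tokens


tokens = url_tokens(
    "http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz"
)
for number, token in enumerate(tokens, start=1):
    print(number, token)
```

The same splitting logic would then have to be ported to a Lucene Tokenizer subclass, or approximated with a PatternTokenizer plus filters.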


Regards,

Aurélien MAZOYER

On 21/08/2014 14:35, Sathyam wrote:

Hi,

I needed to generate tokens out of a URL such that I am able to get
hierarchical units of the URL as well as each individual entity as tokens.
For example:
*Given a URL : *

http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz

The tokens that I need are :

*Hierarchical subsets of the URL*

1 http://

2 http://www.google.com/

3 http://www.google.com/abcd/

4 http://www.google.com/abcd/efgh/

5 http://www.google.com/abcd/efgh/ijkl/

6 http://www.google.com/abcd/efgh/ijkl/mnop.php

*Individual elements in the path to the resource*

7 abcd

8 efgh

9 ijkl

10 mnop.php

*Query Terms*

11 a=10

12 b=20

13 c=30

*Fragment*
14 xyz

This comes to a total of 14 tokens for the given URL.
Basically a URL analyzer that creates tokens based on the categories
mentioned in bold. Also a separate token for port(if mentioned).

I would like to know how this can be achieved by using a single analyzer
that uses a combination of the tokenizers and filters provided by solr.
Also curious to know why there is a restriction of only *one* tokenizer to
be used in an analyzer.
Looking forward to a response from your side telling the best possible way
to achieve the closest to what I need.

Thanks.




Re: sample Cell schema question

2014-08-19 Thread Aurélien MAZOYER
"indexed" means you can search the field; "stored" means you can return its
value to the user or highlight it.

Both consume disk space.
A copyField is not a special kind of field: it is a directive that copies the
values of one field to another field. There are many use cases for copy
fields.
In the example, we use a specific field, text, as the default field on which
users will perform their searches. That is why we copy all the fields that we
want to search into that field (note that there are other ways to search
multiple fields: have a look at http://wiki.apache.org/solr/ExtendedDisMax).
For example, the content field is copied to the text field (which is indexed)
for searching. As we use the text field to perform our searches, we don't need
to index the content field too, and by not indexing it we save some disk
space.
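
The same split can be sketched as Schema API commands, for Solr versions that run with a managed schema (an assumption; the sample discussed here uses a hand-edited schema.xml). The field definitions mirror the sample schema: content is stored but not indexed, text is indexed but not stored, and a copy-field rule feeds one into the other. The code only builds the JSON body.

```python
import json


def cell_schema_commands():
    """Return Schema API commands mirroring the content/text split."""
    return {
        "add-field": [
            # Returned and highlighted, never searched directly.
            {"name": "content", "type": "text_general",
             "indexed": False, "stored": True, "multiValued": True},
            # Searched, never returned.
            {"name": "text", "type": "text_general",
             "indexed": True, "stored": False, "multiValued": True},
        ],
        "add-copy-field": {"source": "content", "dest": "text"},
    }


payload = json.dumps(cell_schema_commands(), indent=2)
print(payload)
```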


Regards,

Aurélien



On 19/08/2014 13:05, Aman Tandon wrote:

I have a question, does storing the data in copyfields save space?

With Regards
Aman Tandon


On Tue, Aug 19, 2014 at 3:02 PM, jmlucjav jmluc...@gmail.com wrote:


ok, I had not noticed text contains also the other metadata like keywords,
description etc, nevermind!


On Tue, Aug 19, 2014 at 11:28 AM, jmlucjav jmluc...@gmail.com wrote:


In the sample schema.xml I can see this:

 <!-- Main body of document extracted by SolrCell.
      NOTE: This field is not indexed by default, since it is also copied to "text"
      using copyField below. This is to save space. Use this field for returning and
      highlighting document content. Use the "text" field to search the content. -->
 <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>


I am wondering, how does having this split in two fields text/content save
space?





Re: logging in solr

2014-08-18 Thread Aurélien MAZOYER

Hi,

Are you using Tomcat or Jetty? If you use the default Jetty, have a look 
at: http://wiki.apache.org/solr/LoggingInDefaultJettySetup


Regards,

Aurélien


On 18/08/2014 22:43, M, Arjun (NSN - IN/Bangalore) wrote:

Hi,

 Currently in my component Solr is logging to catalina.out. What is the 
configuration needed to redirect those logs to some custom logfile eg: Solr.log.

 Thanks...

--Arjun






Re: logging in solr

2014-08-18 Thread Aurélien MAZOYER
Sorry, outdated link. And I suppose you use Tomcat, since you are talking 
about catalina.out. The correct link is: 
http://wiki.apache.org/solr/SolrLogging#Solr_4.3_and_above











Re: Selectively setting the number of returned SOLR rows per field based on field value

2014-08-17 Thread Aurélien MAZOYER
I am afraid you can't. I think your problem is linked to this issue, 
which is still unresolved:

https://issues.apache.org/jira/browse/SOLR-1093
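
One possible fallback, which is not from this thread and only a sketch, is to run the single query as usual and trim the rows on the client. The helper name and the document ids below are hypothetical; the row counts are the ones from the question.

```python
def limit_rows_per_title(docs, collapse_titles, field="book_title"):
    """Keep only the first row for each title in collapse_titles."""
    seen = set()
    kept = []
    for doc in docs:
        title = doc.get(field)
        if title in collapse_titles:
            if title in seen:
                continue  # later rows of a collapsed title are dropped
            seen.add(title)
        kept.append(doc)
    return kept


# Fake result list with the row counts from the question.
docs = (
    [{"id": f"kite-{i}", "book_title": "The Kite Runner"} for i in range(15)]
    + [{"id": f"stranger-{i}", "book_title": "The Stranger"} for i in range(13)]
    + [{"id": f"ruby-{i}", "book_title": "The Ruby Way"} for i in range(8)]
)
kept = limit_rows_per_title(docs, {"The Kite Runner", "The Stranger"})
print(len(kept))  # 10
```

The obvious cost is that all 36 rows still travel from Solr to the client before being trimmed.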

Aurélien


On 17/08/2014 23:16, talt wrote:

I have a field in my SOLR index, let's call it book_title.

A query returns 15 rows with book_title:The Kite Runner, 13 rows with
book_title:The Stranger, and 8 rows with book_title:The Ruby Way.

Is there a way to return only the first row of The Kite Runner and The
Stranger, but all of the The Ruby Way rows from the previous query
result? This would result in 10 rows altogether. Is this possible at all,
using a single query?







Re: Can I use multiple cores

2014-08-12 Thread Aurélien MAZOYER

Hi Paul and Ramprasad,

I am following your discussion with interest, as I will have more or less the 
same requirement.
When you say that you use on-demand core loading, are you talking about the 
LotsOfCores feature?
Erick told me that it does not work very well in a distributed 
environment.
How do you handle this problem? Do you use multiple standalone Solr 
instances? What about failover?


Thanks for your answer,

Aurelien

On 12/08/2014 14:48, Noble Paul wrote:

Hi Ramprasad,


I have used it in a cluster with millions of users (1 user per core) in
legacy cloud mode .We used the on demand core loading feature where each
Solr had 30,000 cores and at a time only 2000 cores were in memory. You are
just hitting 400 and I don't see much of a problem . What is your h/w bTW?


On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan 
ramprasad...@gmail.com wrote:


I need to store in SOLR all the data about my clients' mailing activity

The data contains meta data like From;To:Date;Time:Subject etc

I would easily have 1000 Million records every 2 months.

What I am currently doing is creating cores per client. So I have 400 cores
already.

Is this a good idea to do ?

What is the general practice for creating cores








Re: Passivate core in Solr Cloud

2014-07-24 Thread Aurélien MAZOYER
Thank you Erick and Alex for your answers. The lots-of-cores feature seems to 
meet my requirement, but it is a problem if it does not work with Solr 
Cloud. Is there an issue open for this problem?
If I understand correctly, the only solution for me is to use multiple 
standalone Solr instances with transient cores and to distribute the cores 
for my tenants manually (I assume the LRU mechanism will be less 
effective, as it will be applied per Solr instance).
When you say it "does NOT play nice" with distributed mode, does that also 
include the standard replication mechanism?
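
For reference, my understanding of the LotsOfCores page is that a transient core in a standalone instance is declared through its core.properties file, and that the number of transient cores kept loaded is capped by transientCacheSize in solr.xml. The sketch below generates such a file for a hypothetical tenant core name; it should be checked against the wiki page for your Solr version.

```python
def core_properties(core_name):
    """Return core.properties content for a lazily loaded, transient core."""
    lines = [
        f"name={core_name}",
        "transient=true",       # core may be unloaded when the cache is full
        "loadOnStartup=false",  # core is loaded on first request only
    ]
    return "\n".join(lines) + "\n"


# "tenant_0001" is a made-up core name used for illustration.
props = core_properties("tenant_0001")
print(props)
```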


Thanks,

Regards,

Aurelien



On 23/07/2014 17:21, Erick Erickson wrote:

Do note that the lots of cores stuff does NOT play nice with
distributed mode (yet).

Best,
Erick


On Wed, Jul 23, 2014 at 6:00 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:


Solr has some support for a large number of cores, including transient
cores: http://wiki.apache.org/solr/LotsOfCores

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853







Multipart documents with different update cycles

2014-07-24 Thread Aurélien MAZOYER

Hello,

I have to index a dataset containing multipart documents. The main part and 
the user metadata part have different update cycles: we want to update the 
user metadata part frequently without having to refetch the main part from 
the data source or store every field in order to use atomic updates. As there 
is no true field-level update in Solr yet, I am afraid that I have to build 
an index for each part and perform a query-time join, with all the well-known 
performance limitations. I have also heard of the sidecar index. Is it a 
solution that can meet my requirements? Is it stable enough to be used in 
production? Does the community plan to make it part of the trunk code?
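
For context, this is roughly what the atomic update mentioned above looks like. Solr rebuilds the whole document from its stored fields and then applies the change, which is why every field would have to be stored. The field name user_tags and the document id are hypothetical; the code only builds the JSON body that would be posted to the update handler.

```python
import json


def atomic_update(doc_id, changes):
    """Build an atomic-update document that sets each field in changes."""
    doc = {"id": doc_id}
    for field, value in changes.items():
        # "set" replaces the field value; other fields are left untouched.
        doc[field] = {"set": value}
    return [doc]


body = json.dumps(atomic_update("doc-42", {"user_tags": ["todo", "important"]}))
print(body)
```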


Thanks,

Aurelien



Passivate core in Solr Cloud

2014-07-23 Thread Aurélien MAZOYER

Hello,

We want to set up a Solr Cloud cluster to handle a high volume of documents 
with a multi-tenant architecture. The problem is that application-level 
isolation of tenants (using a shared index with a customer field) is not 
enough to meet our requirements. As a result, we need one collection per 
customer. There are more than a thousand customers, and it seems 
unreasonable to create thousands of collections in Solr Cloud... But as we 
know that there is less than one query per customer per day, we are looking 
for a way to passivate collections when they are not in use. Is this a good 
idea? If so, are there best practices for implementing it? What side effects 
can we expect? Do we need to put some application-level logic on top of the 
Solr Cloud cluster to choose which collection to unload (and is there maybe 
something smarter, and quicker, than simply loading/unloading the core when 
it is not in use)?



Thank you for your answer(s),

Aurelien