Using Solr to build a product matcher, with learning to rank

2018-03-28 Thread Xavier Schepler
Hello,

I'm considering using Solr with learning to rank to build a product matcher.
For example, it should match the titles:
- Apple iPhone 6 16 Gb,
- iPhone 6 16 Gb,
- Smartphone IPhone 6 16 Gb,
- iPhone 6 black 16 Gb,
to the same internal reference, a unique identifier.

With Solr, each document would then have a field for the product title and
one for its class, which is the unique identifier of the product.
Solr would then be used to perform matching as follows.

   1. A search is performed with a given product title.
   2. The first three results are considered (this requires an initial
   product title database).
   3. The most frequent identifier is returned.

This method corresponds roughly to a k-Nearest Neighbor approach with the
cosine metric, k = 3, and a TF-IDF model.
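
A minimal sketch of the matching procedure described above, in pure Python rather than Solr. The tokenizer, the smoothed IDF formula, the sample catalog, and the product identifiers are all assumptions made for illustration; Solr's own scoring would differ in detail.

```python
import math
from collections import Counter

def tokenize(title):
    # Naive whitespace tokenizer; an assumption, not Solr's analysis chain.
    return title.lower().split()

def build_idf(titles):
    # Smoothed inverse document frequency over the known product titles.
    n = len(titles)
    df = Counter()
    for title in titles:
        df.update(set(tokenize(title)))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

def tfidf(title, idf):
    # Sparse TF-IDF vector as a dict; terms unknown to the catalog are ignored.
    tf = Counter(tokenize(title))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}

def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def match(query_title, catalog, k=3):
    # catalog is a list of (title, product_id) pairs.
    idf = build_idf([title for title, _ in catalog])
    query_vec = tfidf(query_title, idf)
    ranked = sorted(
        catalog,
        key=lambda item: cosine(query_vec, tfidf(item[0], idf)),
        reverse=True,
    )
    # Majority vote over the identifiers of the k best-scoring titles.
    votes = Counter(product_id for _, product_id in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical catalog and identifiers, for illustration only.
CATALOG = [
    ("Apple iPhone 6 16 Gb", "iphone6-16"),
    ("iPhone 6 16 Gb", "iphone6-16"),
    ("Smartphone IPhone 6 16 Gb", "iphone6-16"),
    ("Samsung Galaxy S6 32 Gb", "galaxy-s6-32"),
    ("Galaxy S6 32 Gb black", "galaxy-s6-32"),
]

print(match("iPhone 6 black 16 Gb", CATALOG))
```

With Solr, steps 1 and 2 would be replaced by a search request returning the top three documents; only the vote in step 3 would stay on the client.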

I've done some preliminary tests with scikit-learn and the results are
good, but not as good as those of more sophisticated learning algorithms.

Then I noticed that Solr supports learning to rank.

First, do you think that such a use of Solr makes sense?
Second, is there a relatively simple way to build a learning model using a
sparse representation of the query TF-IDF vector?

Kind regards,

Xavier Schepler


Re: choosing placement upon RESTORE

2017-05-02 Thread xavier jmlucjav
thanks Mikhail, that sounds like it would help me as it allows you to set
createNodeSet on RESTORE calls
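
A sketch of what such a RESTORE call could look like, assuming a Solr version that includes the change tracked in SOLR-9527 and that it accepts createNodeSet on RESTORE as described above. The host, backup name, collection name, location, and node name are hypothetical placeholders.

```python
from urllib.parse import urlencode

def restore_url(base_url, backup_name, collection, location, node):
    # Collections API RESTORE request; createNodeSet is assumed to be
    # honored only by versions that include the SOLR-9527 change.
    params = {
        "action": "RESTORE",
        "name": backup_name,
        "collection": collection,
        "location": location,
        "createNodeSet": node,
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)

# Hypothetical values, for illustration only.
print(restore_url(
    "http://nodeA:8983/solr",
    "dev_backup",
    "dev_collection",
    "/backups",
    "nodeA:8983_solr",
))
```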

On Tue, May 2, 2017 at 2:50 PM, Mikhail Khludnev <m...@apache.org> wrote:

> This sounds relevant, but different to https://issues.apache.org/
> jira/browse/SOLR-9527
> You may want to follow this ticket.
>
> On Mon, May 1, 2017 at 9:15 PM, xavier jmlucjav <jmluc...@gmail.com>
> wrote:
>
>> hi,
>>
>> I am facing this situation:
>> - I have a 3 node Solr 6.1 with some 1 shard, 1 node collections (it's
>> just
>> for dev work)
>> - the collections where created with:
>>action=CREATE&...=EMPTY"
>> then
>>   action=ADDREPLICA&...=$NODEA=$DATADIR"
>> - I have taken a BACKUP of the collections
>> - Solr is upgraded to 6.5.1
>>
>> Now, I started using RESTORE to restore the collections on the node A
>> (where they lived before), but, instead of all being created in node A,
>> collections have been created in A, then B, then C nodes. Well, Solrcloud
>> tried to, as 2nd and 3rd RESTOREs failed, as the backup was in node A's
>> disk, not reachable from nodes B and C.
>>
>> How is this supposed to work? I am looking at Rule Based Placement but it
>> seems it is only available for CREATESHARD, so I can use it in RESTORE?
>> Isn't there a way to force Solrcloud to create the collection in a given
>> node?
>>
>> thanks!
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


choosing placement upon RESTORE

2017-05-01 Thread xavier jmlucjav
hi,

I am facing this situation:
- I have a 3 node Solr 6.1 with some 1 shard, 1 node collections (it's just
for dev work)
- the collections were created with:
   action=CREATE&...=EMPTY"
then
  action=ADDREPLICA&...=$NODEA=$DATADIR"
- I have taken a BACKUP of the collections
- Solr is upgraded to 6.5.1

Now I have started using RESTORE to restore the collections on node A
(where they lived before) but, instead of all being created on node A,
the collections were created on node A, then B, then C. Well, SolrCloud
tried to: the 2nd and 3rd RESTOREs failed, because the backup was on node
A's disk, which is not reachable from nodes B and C.

How is this supposed to work? I am looking at Rule Based Placement, but it
seems to be available only for CREATESHARD, so can I use it with RESTORE?
Isn't there a way to force SolrCloud to create the collection on a given
node?

thanks!


DIH: last_index_time not updated if 0 docs updated

2017-02-27 Thread xavier jmlucjav
Hi,

After making our delta-index interval shorter and shorter, I have found
that last_index_time in dataimport.properties is not updated every time
the indexing runs: the update is skipped if no docs were added.

This happens at least in the following scenario:
- running delta as full index
( /dataimport?command=full-import=false=true )
- SolrCloud setup, so dataimport.properties is in ZooKeeper
- Solr 5.5.0

I understand skipping the commit on the index if no docs were updated is a
nice optimization, but I believe the last_index_time info should be updated
in all cases, so it reflects reality. We, for instance, are looking at this
piece of information in order to do other stuff.
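
A sketch of a consumer that reads this value, assuming dataimport.properties uses the usual Java properties layout with escaped colons (the sample content below is made up). Given the behaviour described above, an old timestamp does not prove that the import did not run, only that no run added docs.

```python
from datetime import datetime

def read_last_index_time(properties_text):
    # Scan a Java-style properties file for the last_index_time entry.
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "last_index_time":
            # Java properties files escape ':' as '\:'.
            value = value.strip().replace("\\:", ":")
            return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return None

# Hypothetical file content, for illustration only.
SAMPLE = "#Mon Feb 27 10:15:00 UTC 2017\nlast_index_time=2017-02-27 10\\:15\\:00\n"
print(read_last_index_time(SAMPLE))
```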

I could not find any mention of this in Jira, so I wonder whether this is
intended or whether nobody has had an issue with it.

xavier


Re: procedure to restart solrcloud, and config/collection consistency

2017-02-09 Thread xavier jmlucjav
hi Shawn,

as I replied to Markus, of course I know (and use) the Collections API to
reload the config. I am asking what would happen in this scenario:
 - config updated (but collection not reloaded)
 - I restart one node
Does one node now have the new config and the rest the old one?

To which he has already replied:
>The restarted/reloaded node has the new config, the others have the old
config until reloaded/restarted.

I was not asking about making Solr restart itself; my English must be worse
than I thought. By the way, that kind of thing can be achieved with
http://yajsw.sourceforge.net/, a very powerful Java wrapper that I used
when Solr did not have a built-in daemon setup. It was built by someone
who was using JSW and got pissed when that one went commercial. It is very
configurable, but of course more complex. I wrote something about it some
time ago:
https://medium.com/@jmlucjav/how-to-install-solr-as-a-service-in-any-platform-including-solr-5-8e4a93cc3909

thanks

On Thu, Feb 9, 2017 at 4:53 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/9/2017 5:24 AM, xavier jmlucjav wrote:
> > I always wondered, if this was not really needed, and I could just call
> > 'restart' in every node, in a quick loop, and forget about it. Does
> anyone
> > know if this is the case?
> >
> > My doubt is in regards to changing some config, and then doing the above
> > (just restart nodes in a loop). For example, what if I change a config G
> > used in collection C, and I restart just one of the nodes (N1), leaving
> the rest alone. If all the nodes contain a shard for C, what happens, N1 is
> using the new config and the rest are not? how is this handled?
>
> If you want to change the config or schema for a collection and make it
> active across all nodes, just use the collections API to RELOAD the
> collection.  The change will be picked up everywhere.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> To answer your question: No.  Solr does not have the ability to restart
> itself.  It would require significant development effort and a
> fundamental change in how Solr is started to make it possible.  It is
> something that has been discussed, but at this time it is not possible.
>
> One idea that would make this possible is mentioned on the following
> wiki page.  It talks about turning Solr into two applications instead of
> one:
>
> https://wiki.apache.org/solr/WhyNoWar#Information_that.27s_
> not_version_specific
>
> Again -- it would not be easy, which is why it hasn't been done yet.
>
> Thanks,
> Shawn
>
>


Re: procedure to restart solrcloud, and config/collection consistency

2017-02-09 Thread xavier jmlucjav
Hi Markus,

yes, of course I know (and use) the Collections API to reload the config. I
am asking what would happen in this scenario:
- config updated (but collection not reloaded)
- I restart one node

Does one node now have the new config and the rest the old one?

Regarding restarting many hosts, my question is whether it is enough to
just 'restart' each Solr, or whether it is better to first stop all of them
and then start all of them.

thanks


On Thu, Feb 9, 2017 at 1:28 PM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello - if you just want to use updated configuration, you can use Solr's
> collection reload API call. For restarting we rely on remote provisioning
> tools such as Salt, other managing tools can probably execute commands
> remotely as well.
>
> If you operate more than just a very few machines, i'd really recommend
> using these tools.
>
> Markus
>
>
>
> -Original message-
> > From:xavier jmlucjav <jmluc...@gmail.com>
> > Sent: Thursday 9th February 2017 13:24
> > To: solr-user <solr-user@lucene.apache.org>
> > Subject: procedure to restart solrcloud, and config/collection
> consistency
> >
> > Hi,
> >
> > When I need to restart a Solrcloud cluster, I always do this:
> > - log in into host nb1, stop solr
> > - log in into host nb2, stop solr
> > -...
> > - log in into host nbX, stop solr
> > - verify all hosts did stop
> > - in host nb1, start solr
> > - in host nb12, start solr
> > -...
> >
> > I always wondered, if this was not really needed, and I could just call
> > 'restart' in every node, in a quick loop, and forget about it. Does
> anyone
> > know if this is the case?
> >
> > My doubt is in regards to changing some config, and then doing the above
> > (just restart nodes in a loop). For example, what if I change a config G
> > used in collection C, and I restart just one of the nodes (N1), leaving
> the
> > rest alone. If all the nodes contain a shard for C, what happens, N1 is
> > using the new config and the rest are not? how is this handled?
> >
> > thanks
> > xavier
> >
>


procedure to restart solrcloud, and config/collection consistency

2017-02-09 Thread xavier jmlucjav
Hi,

When I need to restart a SolrCloud cluster, I always do this:
- log into host nb1, stop solr
- log into host nb2, stop solr
-...
- log into host nbX, stop solr
- verify all hosts did stop
- in host nb1, start solr
- in host nb2, start solr
-...

I have always wondered whether this is really needed, or whether I could
just call 'restart' on every node, in a quick loop, and forget about it.
Does anyone know if this is the case?

My doubt concerns changing some config and then doing the above (just
restarting nodes in a loop). For example, what if I change a config G used
in collection C, and I restart just one of the nodes (N1), leaving the rest
alone? If all the nodes contain a shard for C, is N1 then using the new
config while the rest are not? How is this handled?
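
The two procedures being compared can be written down as ordered plans. This is only a sketch: the host names are hypothetical, and nothing is executed, so it says nothing about which plan is safe for a given cluster.

```python
def full_stop_then_start(hosts):
    # Stop every host, verify, then start every host again.
    plan = [(host, "stop") for host in hosts]
    plan.append(("all hosts", "verify stopped"))
    plan += [(host, "start") for host in hosts]
    return plan

def rolling_restart(hosts):
    # Restart each host in turn, in a quick loop.
    return [(host, "restart") for host in hosts]

# Hypothetical host names, for illustration only.
HOSTS = ["nb1", "nb2", "nb3"]

for host, action in full_stop_then_start(HOSTS):
    print(f"{host}: {action}")
```

In the rolling variant, the nodes restarted so far and the nodes still running overlap in time, which is exactly the window the config question above is about.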

thanks
xavier


reuse a org.apache.lucene.search.Query in Solrj?

2017-01-05 Thread xavier jmlucjav
Hi,

I have a Lucene Query (a Boolean query with a bunch of possibly complex
spatial queries, even polygons etc.) that I am building for some MemoryIndex
stuff.

Now I need to add that same query to a Solr query (adding it to a bunch of
other fq I am using). Is there some way to piggyback the Lucene query
this way? It would be extremely handy in my situation.

thanks
xavier


solrj: get the shard an id will be routed to

2016-12-22 Thread xavier jmlucjav
Hi

Is there a sample somewhere of some SolrJ code that, given:
- a collection
- the id (like "IBM!12345")

returns the shard to which the doc will be routed? I was hoping to get that
info from CloudSolrClient itself, but it does not expose it as far as I can
see.
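
For intuition only, here is a sketch of the routing arithmetic in Python rather than SolrJ. It assumes the default compositeId router, a two-part id, MurmurHash3 (x86, 32-bit, seed 0) over the UTF-8 bytes, the upper 16 bits taken from the prefix and the lower 16 from the rest, and hash ranges split evenly starting at 0x80000000. In a real cluster the shard ranges should be read from the cluster state instead of being recomputed.

```python
MASK32 = 0xFFFFFFFF

def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK32

def murmur3_x86_32(data, seed=0):
    # Standard MurmurHash3 x86 32-bit over a bytes object.
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    rounded = n & ~3
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[rounded:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization mix.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h

def composite_hash(doc_id, bits=16):
    # "prefix!rest": upper `bits` bits from the prefix, remaining bits from the rest.
    if "!" not in doc_id:
        return murmur3_x86_32(doc_id.encode("utf-8"))
    prefix, rest = doc_id.split("!", 1)
    upper_mask = (MASK32 << (32 - bits)) & MASK32
    lower_mask = ~upper_mask & MASK32
    upper = murmur3_x86_32(prefix.encode("utf-8")) & upper_mask
    lower = murmur3_x86_32(rest.encode("utf-8")) & lower_mask
    return upper | lower

def shard_index(hash32, num_shards):
    # Assumed even split of the signed 32-bit range, starting at 0x80000000.
    position = (hash32 + 0x80000000) & MASK32
    return position * num_shards // 2**32

print(shard_index(composite_hash("IBM!12345"), 4))
```

Under these assumptions, all ids sharing the prefix "IBM" land in the same 16-bit slice of the hash range, which is what co-locates them on one shard.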

thanks
xavier


Re: 'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
done, with simple patch https://issues.apache.org/jira/browse/SOLR-9697

On Thu, Oct 27, 2016 at 4:21 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

> sure, will do, I tried before but I could not create a Jira, now I can,
> not sure what was happening.
>
> On Thu, Oct 27, 2016 at 3:14 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> Would you mind opening a jira issue and give a patch (diff)? 6.3 is coming
>> out soon and we'd have to hurry if this fix has to go in.
>>
>> On Thu, Oct 27, 2016 at 6:32 PM, xavier jmlucjav <jmluc...@gmail.com>
>> wrote:
>>
>> > Correcting myself here, I was wrong about the cause (I had already
>> messed
>> > with the script).
>> >
>> > I made it work by commenting out line 1261 (the number might be a bit
>> off
>> > as I have modified the script, but hopefully its easy to see where):
>> >
>> > ) ELSE IF "%1"=="/?" (
>> >   goto zk_usage
>> > ) ELSE IF "%1"=="-h" (
>> >   goto zk_usage
>> > ) ELSE IF "%1"=="-help" (
>> >   goto zk_usage
>> > ) ELSE IF "!ZK_SRC!"=="" (
>> >   if not "%~1"=="" (
>> > goto set_zk_src
>> >   )
>> >  * rem goto zk_usage*
>> > ) ELSE IF "!ZK_DST!"=="" (
>> >   IF "%ZK_OP%"=="cp" (
>> > goto set_zk_dst
>> >   )
>> >   IF "%ZK_OP%"=="mv" (
>> > goto set_zk_dst
>> >   )
>> >   set ZK_DST="_"
>> > ) ELSE IF NOT "%1"=="" (
>> >   set ERROR_MSG="Unrecognized or misplaced zk argument %1%"
>> >
>> > Now upconfig works!
>> >
>> > thanks
>> > xavier
>> >
>> >
>> > On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav <jmluc...@gmail.com>
>> > wrote:
>> >
>> > > hi,
>> > >
>> > > Am I missing something or this is broken in windows? I cannot
>> upconfig,
>> > > the scripts keeps exiting immediately and showing usage, as if I use
>> some
>> > > wrong parameters.  This is on win10, jdk8. But I am pretty sure I saw
>> it
>> > > also on win7 (don't have that around anymore to try)
>> > >
>> > > I think the issue is: there is a SHIFT too much in line 1276 of
>> solr.cmd:
>> > >
>> > > :set_zk_op
>> > > set ZK_OP=%~1
>> > > SHIFT
>> > > goto parse_zk_args
>> > >
>> > > if this SHIFT is removed, then parse_zk_args works (and it does the
>> shift
>> > > itself). But the upconfig hangs, so still it does not work.
>> > >
>> > > this probably was introduced in a851d5f557aefd76c01ac23da076a1
>> 4dc7576d8e
>> > > by Erick (not sure which one :) ) on July 2nd. Master still has this
>> > issue.
>> > > Would be great if this was fixed in the incoming 6.3...
>> > >
>> > > My cmd scripting is not too strong and I did not go further. I
>> searched
>> > > Jira but found nothing. By the way is it not possible to open tickets
>> in
>> > > Jira anymore?
>> > >
>> > > xavier
>> > >
>> >
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>


Re: 'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
sure, will do, I tried before but I could not create a Jira, now I can, not
sure what was happening.

On Thu, Oct 27, 2016 at 3:14 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Would you mind opening a jira issue and give a patch (diff)? 6.3 is coming
> out soon and we'd have to hurry if this fix has to go in.
>
> On Thu, Oct 27, 2016 at 6:32 PM, xavier jmlucjav <jmluc...@gmail.com>
> wrote:
>
> > Correcting myself here, I was wrong about the cause (I had already messed
> > with the script).
> >
> > I made it work by commenting out line 1261 (the number might be a bit off
> > as I have modified the script, but hopefully its easy to see where):
> >
> > ) ELSE IF "%1"=="/?" (
> >   goto zk_usage
> > ) ELSE IF "%1"=="-h" (
> >   goto zk_usage
> > ) ELSE IF "%1"=="-help" (
> >   goto zk_usage
> > ) ELSE IF "!ZK_SRC!"=="" (
> >   if not "%~1"=="" (
> > goto set_zk_src
> >   )
> >  * rem goto zk_usage*
> > ) ELSE IF "!ZK_DST!"=="" (
> >   IF "%ZK_OP%"=="cp" (
> > goto set_zk_dst
> >   )
> >   IF "%ZK_OP%"=="mv" (
> > goto set_zk_dst
> >   )
> >   set ZK_DST="_"
> > ) ELSE IF NOT "%1"=="" (
> >   set ERROR_MSG="Unrecognized or misplaced zk argument %1%"
> >
> > Now upconfig works!
> >
> > thanks
> > xavier
> >
> >
> > On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav <jmluc...@gmail.com>
> > wrote:
> >
> > > hi,
> > >
> > > Am I missing something or this is broken in windows? I cannot upconfig,
> > > the scripts keeps exiting immediately and showing usage, as if I use
> some
> > > wrong parameters.  This is on win10, jdk8. But I am pretty sure I saw
> it
> > > also on win7 (don't have that around anymore to try)
> > >
> > > I think the issue is: there is a SHIFT too much in line 1276 of
> solr.cmd:
> > >
> > > :set_zk_op
> > > set ZK_OP=%~1
> > > SHIFT
> > > goto parse_zk_args
> > >
> > > if this SHIFT is removed, then parse_zk_args works (and it does the
> shift
> > > itself). But the upconfig hangs, so still it does not work.
> > >
> > > this probably was introduced in a851d5f557aefd76c01ac23da076a1
> 4dc7576d8e
> > > by Erick (not sure which one :) ) on July 2nd. Master still has this
> > issue.
> > > Would be great if this was fixed in the incoming 6.3...
> > >
> > > My cmd scripting is not too strong and I did not go further. I searched
> > > Jira but found nothing. By the way is it not possible to open tickets
> in
> > > Jira anymore?
> > >
> > > xavier
> > >
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: 'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
Correcting myself here, I was wrong about the cause (I had already messed
with the script).

I made it work by commenting out line 1261 (the number might be a bit off
as I have modified the script, but hopefully it's easy to see where):

) ELSE IF "%1"=="/?" (
  goto zk_usage
) ELSE IF "%1"=="-h" (
  goto zk_usage
) ELSE IF "%1"=="-help" (
  goto zk_usage
) ELSE IF "!ZK_SRC!"=="" (
  if not "%~1"=="" (
    goto set_zk_src
  )
 * rem goto zk_usage*
) ELSE IF "!ZK_DST!"=="" (
  IF "%ZK_OP%"=="cp" (
    goto set_zk_dst
  )
  IF "%ZK_OP%"=="mv" (
    goto set_zk_dst
  )
  set ZK_DST="_"
) ELSE IF NOT "%1"=="" (
  set ERROR_MSG="Unrecognized or misplaced zk argument %1%"

Now upconfig works!

thanks
xavier


On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

> hi,
>
> Am I missing something or this is broken in windows? I cannot upconfig,
> the scripts keeps exiting immediately and showing usage, as if I use some
> wrong parameters.  This is on win10, jdk8. But I am pretty sure I saw it
> also on win7 (don't have that around anymore to try)
>
> I think the issue is: there is a SHIFT too much in line 1276 of solr.cmd:
>
> :set_zk_op
> set ZK_OP=%~1
> SHIFT
> goto parse_zk_args
>
> if this SHIFT is removed, then parse_zk_args works (and it does the shift
> itself). But the upconfig hangs, so still it does not work.
>
> this probably was introduced in a851d5f557aefd76c01ac23da076a14dc7576d8e
> by Erick (not sure which one :) ) on July 2nd. Master still has this issue.
> Would be great if this was fixed in the incoming 6.3...
>
> My cmd scripting is not too strong and I did not go further. I searched
> Jira but found nothing. By the way is it not possible to open tickets in
> Jira anymore?
>
> xavier
>


'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
hi,

Am I missing something, or is this broken on Windows? I cannot upconfig: the
script keeps exiting immediately and showing usage, as if I had used some
wrong parameters. This is on win10, jdk8, but I am pretty sure I also saw it
on win7 (I don't have that around anymore to try).

I think the issue is that there is one SHIFT too many in line 1276 of solr.cmd:

:set_zk_op
set ZK_OP=%~1
SHIFT
goto parse_zk_args

If this SHIFT is removed, then parse_zk_args works (and it does the shift
itself). But then the upconfig hangs, so it still does not work.

This was probably introduced in a851d5f557aefd76c01ac23da076a14dc7576d8e by
Erick (not sure which one :) ) on July 2nd. Master still has this issue.
It would be great if this were fixed in the upcoming 6.3...

My cmd scripting is not too strong, so I did not go further. I searched
Jira but found nothing. By the way, is it no longer possible to open tickets
in Jira?

xavier


Re: JNDI settings

2016-09-26 Thread xavier jmlucjav
I did set up JNDI for DIH once, and you have to tweak the Jetty setup. Of
course, Solr now runs on its own Jetty instance; it is no longer just
a war. I don't remember where, but there should be some instructions
somewhere. It took me an afternoon to set it up properly.

xavier

On Wed, Sep 21, 2016 at 1:15 PM, Aristedes Maniatis <amania...@apache.org>
wrote:

> On 13/09/2016 1:29am, Aristedes Maniatis wrote:
> > I am using Solr 5.5 and wanting to add JNDI settings to Solr (for data
> import). I'm new to Solr Cloud setup (previously I was running Solr running
> as a custom bundled war) so I can't figure where to put the JNDI settings
> with user/pass themselves.
> >
> > I don't want to add it to jetty.xml because that's part of the packaged
> application which will be upgraded from time to time.
> >
> > Should it go into solr.xml inside the solr.home directory? If so, what's
> the right syntax there?
>
>
> Just a follow up on this question. Does anyone know of how I can add JNDI
> settings to Solr without overwriting parts of the application itself?
>
> Cheers
> Ari
>
>
>
> --
> -->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>


Re: SOLR 5.4.0?

2015-12-31 Thread Xavier Sanchez Loro

On 31/12/15 at 8:07, Ere Maijala wrote:
Well, for us SOLR-8418 is a major issue. I haven't encountered other 
issues, but that one was sort of a show-stopper.


--Ere

On 31.12.2015 at 7.27, William Bell wrote:

How is SOLR 5.4.0 ? I heard there was a quick 5.4.1 coming out?

Any major issues?



For us, SOLR-7864 (where timeAllowed is broken) is a major bug, which
prevents us from finishing our migration to Solr 5 (we are currently using
4.3). For our use case, correct operation of timeAllowed is critical.


Best regards,
Xavier

--
Trovit

*Xavier Sánchez Loro*
Pipeline
+34 93 209 2556


Re: How to use DocumentAnalysisRequestHandler in java

2015-08-22 Thread Xavier Tannier

Hi,
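Faceting is indeed the best way to do it.
Here is what it looks like in Java:


    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.FacetField.Count;
    import org.apache.solr.client.solrj.response.QueryResponse;

    SolrQuery query = new SolrQuery();
    query.setQuery("id:" + docId);
    query.setFacet(true);
    // You can add all the fields you want to inspect
    query.addFacetField("text");
    // Otherwise you'll also get tokens that are not in your document
    query.setFacetMinCount(1);

    // this.index is assumed to be the SolrJ client for the index
    QueryResponse rsp = this.index.query(query);

    // Now look at the results (for field "text")
    FacetField facetField = rsp.getFacetField("text");
    for (Count field : facetField.getValues()) {
        System.out.println(field.getName());
    }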


Xavier.



On 20/08/2015 at 22:20, Upayavira wrote:


On Thu, Aug 20, 2015, at 04:34 PM, Jean-Pierre Lauris wrote:

Hi,
I'm trying to obtain indexed tokens from a document id, in order to see
what has been indexed exactly.
It seems that DocumentAnalysisRequestHandler does that, but I couldn't
figure out how to use it in java.

The doc says I must provide a contentstream but the available init()
method
only takes a NamedList as a parameter.
https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/handler/DocumentAnalysisRequestHandler.html

Could somebody provide me with a short example of how to get index
information from a document id?

If you are talking about what I think you are, then that is used by the
Admin UI to implement the analysis tab. You pass in a document, and it
returns it analysed.

As Alexandre says, faceting may well get you there if you want to query
a document already in your index.

Upayavira




--
Xavier Tannier
Associate Professor / Maître de conférence (HDR)
Univ. Paris-Sud
LIMSI-CNRS (bât. 508, bureau 12, RdC)
B.P. 133
91403 ORSAY CEDEX
FRANCE

http://www.limsi.fr/~xtannier/
tel: 0033 (0)1 69 85 80 12
fax: 0033 (0)1 69 85 80 88
---


Can I be added to the Wiki contributors group?

2014-11-16 Thread Xavier Morera
I mean for: https://wiki.apache.org/solr/FrontPage

My username is XavierMorera

Regards,
Xavier

-- 

*Xavier Morera*

Entrepreneur | Author & Trainer | Consultant | Developer & Scrum Master

*www.xaviermorera.com http://www.xaviermorera.com/*

office:  (305) 600-4919

cel: +506 8849-8866

skype: xmorera
Twitter https://twitter.com/xmorera | LinkedIn
https://www.linkedin.com/in/xmorera | Pluralsight Author
http://www.pluralsight.com/author/xavier-morera


Re: Mongo DB Users

2014-09-16 Thread Xavier Morera
I think what some people are actually saying is burn in hell Aaron Susan
for using a solr apache dl for marketing purposes?

On Tue, Sep 16, 2014 at 8:31 AM, Suman Ghosh suman.ghos...@gmail.com
wrote:

 Remove

 On Mon, Sep 15, 2014 at 11:35 AM, Aaron Susan aaronsus...@gmail.com
 wrote:

  Hi,
 
  I am here to inform you that we are having a contact list of *Mongo DB
  Users *would you be interested in it?
 
  Data Field’s Consist Of: Name, Job Title, Verified Phone Number, Verified
  Email Address, Company Name  Address Employee Size, Revenue size, SIC
  Code, Industry Type etc.,
 
  We also provide other technology users as well depends on your
 requirement.
 
  For Example:
 
 
  *Red Hat *
 
  *Terra data *
 
  *Net-app *
 
  *NuoDB*
 
  *MongoHQ ** and many more*
 
 
  We also provide IT Decision Makers, Sales and Marketing Decision Makers,
  C-level Titles and other titles as per your requirement.
 
  Please review and let me know your interest if you are looking for above
  mentioned users list or other contacts list for your campaigns.
 
  Waiting for a positive response!
 
  Thanks
 
  *Aaron Susan*
  Data Specialist
 
  If you are not the right person, feel free to forward this email to the
  right person in your organization. To opt out response Remove
 




-- 
*Xavier Morera*
email: xav...@familiamorera.com
CR: +(506) 8849 8866
US: +1 (305) 600 4919
skype: xmorera


Getting Started with Enterprise Search using Apache Solr

2014-07-28 Thread Xavier Morera
Hi. Most of the members here are already seasoned search professionals.
However, I believe there may also be a few who joined because they want to
get started with search and, IMHO (and probably in yours too), Solr is the
best way to start.


Therefore I wanted to post a link to a course that I created on Getting
Started with Enterprise Search using Apache Solr. For some it might be a
good way to start learning. If you are already a search professional you
may not benefit greatly, but any feedback you can provide would be great,
as I want to create more trainings to help people get started with
search.

It is a Pluralsight training, so if you are not a subscriber, just create a
trial account and you have 10 days to watch. If you have questions, let me
know. You can reach me through here or @xmorera on Twitter.

Here is the course
http://pluralsight.com/training/Courses/TableOfContents/enterprise-search-using-apache-solr


PS: Pluralsight is also a great way to learn so I really recommend it.

Getting Started with Enterprise Search using Apache Solr pluralsight.com

Search is one of the most misunderstood functionalities in the IT industry.
Even further, Enterprise Search used to be neither for the faint of heart,
nor for those with a thin wallet. However, since the introduction of Apache
Solr, the name of the game has changed. Don't leave home without it!

-- 
*Xavier Morera*
email: xav...@familiamorera.com
CR: +(506) 8849 8866
US: +1 (305) 600 4919
skype: xmorera


Re: Raw query parameters

2014-04-29 Thread Xavier Morera
You saved my life Shawn! Thanks!


On Mon, Apr 28, 2014 at 11:54 PM, Shawn Heisey s...@elyograg.org wrote:

 On 4/28/2014 7:54 PM, Xavier Morera wrote:
  Would anyone be so kind to explain what are the Raw query parameters
  in Solr's admin UI. I can't find an explanation in either the reference
  guide nor wiki nor web search.

 The query API supports a lot more parameters than are shown on the admin
 UI.  For instance, If you are doing a faceted search, there are only
 boxes for facet.query, facet.field, and facet.prefix ... but faceted
 search supports a lot more parameters (like facet.method, facet.limit,
 facet.mincount, facet.sort, etc).  Raw Query Parameters gives you a way
 to use the entire query API, not just the few things that have UI input
 boxes.

 Thanks,
 Shawn
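
 A sketch of the idea: the parameters typed into the raw box are simply extra key=value pairs on the same request as the form fields. The core URL, the field name, and the function name are hypothetical, and this is not a description of how the admin UI is implemented.

```python
from urllib.parse import parse_qsl, urlencode

def build_select_url(core_url, form_params, raw_params=""):
    # Form fields first, then whatever was typed as raw "a=b&c=d" text.
    params = list(form_params.items())
    params += parse_qsl(raw_params, keep_blank_values=True)
    return core_url.rstrip("/") + "/select?" + urlencode(params)

# Hypothetical core URL and field name, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr/collection1",
    {"q": "*:*", "facet": "true", "facet.field": "category"},
    raw_params="facet.limit=5&facet.mincount=1&facet.sort=count",
)
print(url)
```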




-- 
*Xavier Morera*
email: xav...@familiamorera.com
CR: +(506) 8849 8866
US: +1 (305) 600 4919
skype: xmorera


Raw query parameters

2014-04-28 Thread Xavier Morera
Hi,

Would anyone be so kind as to explain what the Raw Query Parameters in
Solr's admin UI are? I can't find an explanation in the reference guide,
the wiki, or a web search.

[image: Inline image 1]

I am a bit confused about what it is actually for.
[image: Inline image 3]

Thanks in advance,
Xavier
-- 
*Xavier Morera*
email: xav...@familiamorera.com
CR: +(506) 8849 8866
US: +1 (305) 600 4919
skype: xmorera


[ANN] sadat: generate fake docs for your Solr index

2014-03-17 Thread xavier jmlucjav
Hi,

A couple of times I found myself in the following situation: I had to work
on a Solr schema, but had no docs to index yet (the db was not ready etc).

In order to start learning js, I needed some small project to practice, so
I thought of this small utility. It allows you to generate fake docs to
index, so you can at least advance with the schema/solrconfig design.

Currently it can generate (based on your current schema) values for the
most basic field types (int, float, boolean, text, date), and user-defined
functions can be plugged in for customized generation.

Have a look at https://github.com/jmlucjav/sadat


Re: When is/should qf different from pf?

2013-10-29 Thread xavier jmlucjav
I am confused: wouldn't a doc that matches both the phrase and the term
queries have a better score than a doc matching only the term query, even
if qf and pf are the same?


On Mon, Oct 28, 2013 at 7:54 PM, Upayavira u...@odoko.co.uk wrote:

 There'd be no point having them the same.

 You're likely to include boosts in your pf, so that docs that match the
 phrase query as well as the term query score higher than those that just
 match the term query.

 Such as:

   qf=text description&pf=text^2 description^4
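
 A sketch of how such a parameter set could be assembled for an edismax request. The helper names and the user query are hypothetical; the field names and boosts are the ones from the example above.

```python
from urllib.parse import urlencode

def weighted(fields):
    # {"text": 2, "description": 4} -> "text^2 description^4";
    # a weight of None means "no explicit boost".
    parts = []
    for name, weight in fields.items():
        parts.append(name if weight is None else f"{name}^{weight}")
    return " ".join(parts)

def edismax_params(user_query, qf, pf):
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": weighted(qf),  # fields searched term by term
        "pf": weighted(pf),  # fields where a whole-phrase match adds a boost
    }

# Hypothetical user query, for illustration only.
params = edismax_params(
    "apple iphone",
    qf={"text": None, "description": None},
    pf={"text": 2, "description": 4},
)
print(urlencode(params))
```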

 Upayavira

 On Mon, Oct 28, 2013, at 05:44 PM, Amit Nithian wrote:
  Thanks Erick. Numeric fields make sense as I guess would strictly string
  fields too since its one  term? In the normal text searching case though
  does it make sense to have qf and pf differ?
 
  Thanks
  Amit
  On Oct 28, 2013 3:36 AM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   The facetious answer is when phrases aren't important in the fields.
   If you're doing a simple boolean match, adding phrase fields will add
   expense, to no good purpose etc. Phrases on numeric
   fields seems wrong.
  
   FWIW,
   Erick
  
  
   On Mon, Oct 28, 2013 at 1:03 AM, Amit Nithian anith...@gmail.com
 wrote:
  
Hi all,
   
I have been using Solr for years but never really stopped to wonder:
   
When using the dismax/edismax handler, when do you have the qf
 different
from the pf?
   
I have always set them to be the same (maybe different weights) but
 I was
wondering if there is a situation where you would have a field in
 the qf
not in the pf or vice versa.
   
My understanding from the docs is that qf is a term-wise hard filter
   while
pf is a phrase-wise boost of documents who made it past the qf
 filter.
   
Thanks!
Amit
   
  



Re: do SearchComponents have access to response contents

2013-04-05 Thread xavier jmlucjav
I knew I could do that at jetty level with a servlet for instance, but the
user wants to do this stuff inside solr code itself. Now that you mention
the logs...that could be a solution without modifying the webapp...

thanks for the input!
xavier


On Fri, Apr 5, 2013 at 7:55 AM, Amit Nithian anith...@gmail.com wrote:

 We need to also track the size of the response (as the size in bytes of
 the
 whole xml response tat is streamed, with stored fields and all). I was a
 bit worried cause I am wondering if a searchcomponent will actually have
 access to the response bytes...

 == Can't you get this from your container access logs after the fact? I
 may be misunderstanding something but why wouldn't mining the Jetty/Tomcat
 logs for the response size here suffice?

 Thanks!
 Amit


 On Thu, Apr 4, 2013 at 1:34 AM, xavier jmlucjav jmluc...@gmail.com
 wrote:

  A custom QueryResponseWriter...this makes sense, thanks Jack
 
 
  On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky j...@basetechnology.com
  wrote:
 
   The search components can see the response as a NamedList, but it is
   only when SolrDispatchFilter calls the QueryResponseWriter that XML or
   JSON or whatever other format (Javabin as well) is generated from the
   named list for final output in an HTTP response.
  
   You probably want a custom query response writer that wraps the XML
   response writer. Then you can generate the XML and then do whatever you
   want with it.
  
   See the QueryResponseWriter class and the <queryResponseWriter> element
   in solrconfig.xml.
  
   -- Jack Krupansky
  
   -Original Message- From: xavier jmlucjav
   Sent: Wednesday, April 03, 2013 4:22 PM
   To: solr-user@lucene.apache.org
   Subject: do SearchComponents have access to response contents
  
  
   I need to implement some SearchComponent that will deal with metrics on
   the response. Some things I see will be easy to get, like number of hits
   for instance, but I am more worried about this:
  
   We need to also track the size of the response (as the size in bytes of
   the whole XML response that is streamed, with stored fields and all). I
   was a bit worried because I am wondering if a SearchComponent will
   actually have access to the response bytes...
  
   Can someone confirm one way or the other? We are targeting Solr 4.0
  
   thanks
   xavier
  
 



Re: do SearchComponents have access to response contents

2013-04-04 Thread xavier jmlucjav
A custom QueryResponseWriter...this makes sense, thanks Jack


On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky j...@basetechnology.com wrote:

 The search components can see the response as a NamedList, but it is
 only when SolrDispatchFilter calls the QueryResponseWriter that XML or JSON
 or whatever other format (Javabin as well) is generated from the named list
 for final output in an HTTP response.

 You probably want a custom query response writer that wraps the XML
 response writer. Then you can generate the XML and then do whatever you
 want with it.

 See the QueryResponseWriter class and the <queryResponseWriter> element in
 solrconfig.xml.

 -- Jack Krupansky

 -Original Message- From: xavier jmlucjav
 Sent: Wednesday, April 03, 2013 4:22 PM
 To: solr-user@lucene.apache.org
 Subject: do SearchComponents have access to response contents


 I need to implement some SearchComponent that will deal with metrics on the
 response. Some things I see will be easy to get, like number of hits for
 instance, but I am more worried about this:

 We need to also track the size of the response (as the size in bytes of the
 whole XML response that is streamed, with stored fields and all). I was a
 bit worried because I am wondering if a SearchComponent will actually have
 access to the response bytes...

 Can someone confirm one way or the other? We are targeting Solr 4.0

 thanks
 xavier
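
The suggestion in this thread, wrapping the response writer so the serialized output can be measured, can be sketched in plain Java as a byte-counting stream. This is only an illustration under assumptions: the class CountingOutputStream and the helper utf8Size are made up for the example, and the wiring into a custom QueryResponseWriter (delegating to the XML writer through such a wrapper) is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class CountingOutputStream extends FilterOutputStream {
    private long count;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Write straight to the wrapped stream so bytes are counted once.
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }

    // Serialize a response body as UTF-8 and report how many bytes were written.
    static long utf8Size(String body) {
        CountingOutputStream counter =
                new CountingOutputStream(new ByteArrayOutputStream());
        try (Writer w = new OutputStreamWriter(counter, StandardCharsets.UTF_8)) {
            w.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.getCount();
    }

    public static void main(String[] args) {
        System.out.println(utf8Size("<response/>"));
    }
}
```

Counting at the byte level rather than the character level matters because multi-byte characters in stored fields make the two numbers differ.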



do SearchComponents have access to response contents

2013-04-03 Thread xavier jmlucjav
I need to implement some SearchComponent that will deal with metrics on the
response. Some things I see will be easy to get, like number of hits for
instance, but I am more worried about this:

We need to also track the size of the response (as the size in bytes of the
whole XML response that is streamed, with stored fields and all). I was a
bit worried because I am wondering if a SearchComponent will actually have
access to the response bytes...

Can someone confirm one way or the other? We are targeting Solr 4.0

thanks
xavier


custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
I have the following setup:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="description" type="text" indexed="true"
       stored="true" multiValued="false" omitNorms="true" />

I index my corpus, and I can see tf is computed as usual; in this doc the
term occurs 14 times in this field:
4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440) [DefaultSimilarity], result of:
  4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
    0.14165252 = queryWeight, product of:
      10.0 = boost
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      0.0016648936 = queryNorm
    31.834784 = fieldWeight in 440, product of:
      3.7416575 = tf(freq=14.0), with freq of:
        14.0 = termFreq=14.0
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      1.0 = fieldNorm(doc=440)


Then I modify my schema:

<similarity class="solr.SchemaSimilarityFactory"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="com.customsolr.NoTfSimilarityFactory"/>
</fieldType>

I just want to disable term freq > 1, so a term is either present or not.

public class NoTfSimilarity extends DefaultSimilarity {
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

But I still see tf=14 in my query??
723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
  723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
    85.08203 = queryWeight, product of:
      10.0 = boost
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      1.0 = queryNorm
    8.5082035 = fieldWeight in 440, product of:
      1.0 = tf(freq=14.0), with freq of:
        14.0 = termFreq=14.0
      8.5082035 = idf(docFreq=30, maxDocs=56511)
      1.0 = fieldNorm(doc=440)

Does anyone see what I am missing?
I am on Solr 4.0

thanks
xavier


Re: custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
Hi Felipe,

I need to keep positions, that is why I cannot just use
omitTermFreqAndPositions


On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti fla...@thoughtworks.com wrote:

 Do you really need a custom similarity?
 Did you try to put the attribute omitTermFreqAndPositions in your field?

 It could be:

 <field name="description" omitTermFreqAndPositions="true" type="text"
 indexed="true" stored="true" multiValued="false" omitNorms="true" />

 http://wiki.apache.org/solr/SchemaXml


 On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav jmluc...@gmail.com
 wrote:

  I have the following setup:
 
  <fieldType name="text" class="solr.TextField"
  positionIncrementGap="100">
  <analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  </fieldType>
  <field name="description" type="text" indexed="true"
  stored="true" multiValued="false" omitNorms="true" />
 
  I index my corpus, and I can see tf is as usual, in this doc is 14 times
 in
  this field:
  4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440)
  [DefaultSimilarity], result of:
4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
  0.14165252 = queryWeight, product of:
10.0 = boost
8.5082035 = idf(docFreq=30, maxDocs=56511)
0.0016648936 = queryNorm
  31.834784 = fieldWeight in 440, product of:
3.7416575 = tf(freq=14.0), with freq of:
  14.0 = termFreq=14.0
8.5082035 = idf(docFreq=30, maxDocs=56511)
1.0 = fieldNorm(doc=440)
 
 
  Then I modify my schema:
 
  <similarity class="solr.SchemaSimilarityFactory"/>
  <fieldType name="text" class="solr.TextField"
  positionIncrementGap="100">
  <analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="com.customsolr.NoTfSimilarityFactory"/>
  </fieldType>
 
  I just want to disable term freq > 1, so a term is either present or
  not.
 
  public class NoTfSimilarity extends DefaultSimilarity {
  public float tf(float freq) {
  return freq > 0 ? 1.0f : 0.0f;
  }
  }
 
  But I still see tf=14 in my query??
  723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
  723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
85.08203 = queryWeight, product of:
  10.0 = boost
  8.5082035 = idf(docFreq=30, maxDocs=56511)
  1.0 = queryNorm
8.5082035 = fieldWeight in 440, product of:
  1.0 = tf(freq=14.0), with freq of:
14.0 = termFreq=14.0
  8.5082035 = idf(docFreq=30, maxDocs=56511)
  1.0 = fieldNorm(doc=440)
 
  anyone sees what I am missing?
  I am on solr4.0
 
  thanks
  xavier
 



 --
 Felipe Lahti
 Consultant Developer - ThoughtWorks Porto Alegre



Re: custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
Steve,

yes, as I already included (though maybe it is not very visible), I have this
before the types element:
<similarity class="solr.SchemaSimilarityFactory"/>

I can see the explain info is indeed different; for example, I have [] instead
of [DefaultSimilarity]

thanks



On Thu, Mar 21, 2013 at 3:08 PM, Steve Rowe sar...@gmail.com wrote:

 Hi xavier,

 Have you set the global similarity to solr.SchemaSimilarityFactory?

 See http://wiki.apache.org/solr/SchemaXml#Similarity.

 Steve

 On Mar 21, 2013, at 9:44 AM, xavier jmlucjav jmluc...@gmail.com wrote:

  Hi Felipe,
 
  I need to keep positions, that is why I cannot just use
  omitTermFreqAndPositions
 
 
  On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti fla...@thoughtworks.com
 wrote:
 
  Do you really need a custom similarity?
  Did you try to put the attribute omitTermFreqAndPositions in your
 field?
 
  It could be:
 
  <field name="description" omitTermFreqAndPositions="true" type="text"
  indexed="true" stored="true" multiValued="false" omitNorms="true" />
 
  http://wiki.apache.org/solr/SchemaXml
 
 
  On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav jmluc...@gmail.com
  wrote:
 
  I have the following setup:
 
 <fieldType name="text" class="solr.TextField"
  positionIncrementGap="100">
 <analyzer>
 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 </fieldType>
 <field name="description" type="text" indexed="true"
  stored="true" multiValued="false" omitNorms="true" />
 
  I index my corpus, and I can see tf is as usual, in this doc is 14
 times
  in
  this field:
  4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440)
  [DefaultSimilarity], result of:
   4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
 0.14165252 = queryWeight, product of:
   10.0 = boost
   8.5082035 = idf(docFreq=30, maxDocs=56511)
   0.0016648936 = queryNorm
 31.834784 = fieldWeight in 440, product of:
   3.7416575 = tf(freq=14.0), with freq of:
 14.0 = termFreq=14.0
   8.5082035 = idf(docFreq=30, maxDocs=56511)
   1.0 = fieldNorm(doc=440)
 
 
  Then I modify my schema:
 
 <similarity class="solr.SchemaSimilarityFactory"/>
 <fieldType name="text" class="solr.TextField"
  positionIncrementGap="100">
 <analyzer>
 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 <similarity class="com.customsolr.NoTfSimilarityFactory"/>
 </fieldType>
 
  I just want to disable term freq > 1, so a term is either present or
  not.
 
  public class NoTfSimilarity extends DefaultSimilarity {
 public float tf(float freq) {
 return freq > 0 ? 1.0f : 0.0f;
 }
  }
 
  But I still see tf=14 in my query??
  723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result
 of:
 723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product
 of:
   85.08203 = queryWeight, product of:
 10.0 = boost
 8.5082035 = idf(docFreq=30, maxDocs=56511)
 1.0 = queryNorm
   8.5082035 = fieldWeight in 440, product of:
 1.0 = tf(freq=14.0), with freq of:
   14.0 = termFreq=14.0
 8.5082035 = idf(docFreq=30, maxDocs=56511)
 1.0 = fieldNorm(doc=440)
 
  anyone sees what I am missing?
  I am on solr4.0
 
  thanks
  xavier
 
 
 
 
  --
  Felipe Lahti
  Consultant Developer - ThoughtWorks Porto Alegre
 




Re: custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
Damn...I was confused by seeing the 14 there...I had naively thought that
the real term freq would not be stored in the doc, that 1 would be stored
instead, but I guess it still stores the real value and then applies the
custom similarity at query time.

That means changing to a custom similarity does not need reindexing, right?

thanks for the help!
xavier


On Thu, Mar 21, 2013 at 5:26 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 :  public class NoTfSimilarity extends DefaultSimilarity {
 :  public float tf(float freq) {
 :  return freq > 0 ? 1.0f : 0.0f;
 :  }
 :  }
 ...

 :  But I still see tf=14 in my query??
 ...
 :  1.0 = tf(freq=14.0), with freq of:
 :14.0 = termFreq=14.0

 pretty sure you are looking at the explanation of the *input* to your tf()
 function; note that the *output* is 1.0, just like in your function.

 Did you compare this to what you see using the DefaultSimilarity?



 -Hoss
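
The two explain outputs in this thread can be checked by hand. DefaultSimilarity computes tf as the square root of the raw frequency, and fieldWeight is the product tf * idf * fieldNorm, so the 14 shown in the explain output is the input frequency while the value to its left is what tf() returned. The sketch below redoes that arithmetic without any Lucene classes; the class and method names are made up for the example, and the idf and fieldNorm values are copied from the explain output.

```java
class TfExplainCheck {
    // DefaultSimilarity-style tf: square root of the raw term frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // tf as defined by the NoTfSimilarity posted in this thread.
    static float binaryTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // fieldWeight as shown in the explain output: tf * idf * fieldNorm.
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        float idf = 8.5082035f;   // idf(docFreq=30, maxDocs=56511)
        float fieldNorm = 1.0f;   // omitNorms=true

        // Before the change: tf is sqrt(14), about 3.74.
        System.out.println(fieldWeight(defaultTf(14.0f), idf, fieldNorm));
        // After the change: tf is 1.0 although termFreq is still 14.
        System.out.println(fieldWeight(binaryTf(14.0f), idf, fieldNorm));
    }
}
```

Because the raw frequency stays in the index and tf() is applied when scoring, the second fieldWeight collapses to the idf value, which matches the 8.5082035 in the second explain output.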



Re: 4.0 hanging on startup on Windows after Control-C

2013-03-18 Thread xavier jmlucjav
Hi Shawn,

I am using DIH with commit at the end...I'll investigate further to see if
this is what is happening and will report back, also will check 4.2 (that I
had to do anyway...).
thanks for your input
xavier


On Mon, Mar 18, 2013 at 6:12 PM, Shawn Heisey s...@elyograg.org wrote:

 On 3/17/2013 11:51 AM, xavier jmlucjav wrote:

 Hi,

 I have an index where, if I kill solr via Control-C, it consistently hangs
 next time I start it. Admin does not show cores, and searches never
 return.
 If I delete the index contents and I restart again all is ok. I am on
 windows 7, jdk1.7 and Solr4.0.
 Is this a known issue? I looked in jira but found nothing.


 I scanned your thread dump.  Nothing jumped out at me, but given my
 inexperience with such things, I'm not surprised by that.

 Have you tried 4.1 or 4.2 yet to see if the problem persists?  4.0 is no
 longer the new hotness.

 Below I will discuss the culprit that springs to mind, though I don't know
 whether it's what you are actually hitting.

 One thing that can make Solr take a really long time to start up is huge
 transaction logs.  Transaction logs must be replayed when Solr starts, and
 if they are huge, it can take a really long time.

 Do you have tlog directories in your cores (in the data dir, next to the
 index directory), and if you do, how much disk space do they use?  The
 example config in 4.x has updateLog turned on.

 There are two common situations that can lead to huge transaction logs.
  One is exclusively using soft commits when indexing, the other is running
 a very large import with the dataimport handler and not committing until
 the very end.

 AutoCommit with openSearcher=false is a good solution to both of these
 situations.  As long as you use openSearcher=false, it will not change what
 documents are visible.  AutoCommit does a regular hard commit every X new
 documents or every Y milliseconds.  A hard commit flushes index data to
 disk and starts a new transaction log.  Solr will only keep a few
 transaction logs around, so frequently building new ones keeps their size
 down.  When you restart Solr, you don't need to wait for a long time while
 it replays them.

 Thanks,
 Shawn




Re: Is there an EdgeSingleFilter already?

2013-03-17 Thread xavier jmlucjav
Steve, worked like a charm.
thanks!


On Sun, Mar 17, 2013 at 7:37 AM, Steve Rowe sar...@gmail.com wrote:

 See https://issues.apache.org/jira/browse/LUCENE-4843

 Let me know if it works for you.

 Steve

 On Mar 16, 2013, at 5:35 PM, xavier jmlucjav jmluc...@gmail.com wrote:

  I read your reply too fast, so I thought you meant configuring the
  LimitTokenPositionFilter. I see you mean I have to write one, ok...
 
 
 
  On Sat, Mar 16, 2013 at 10:33 PM, xavier jmlucjav jmluc...@gmail.com
 wrote:
 
  Steve,
 
  Yes, I want only "one", "one two", and "one two three", but nothing
  else. Cool, if this can be achieved without Java code even better, I'll
  check that filter.
 
  I need this for building a field used for suggestions, the user
  specifically wants to match only from the edge.
 
  thanks!
 
  On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe sar...@gmail.com wrote:
 
  Hi xavier,
 
  It's not clear to me what you want.  Is the "edge" you're referring to
  the beginning of a field? E.g. raw text "one two three four" with
  EdgeShingleFilter configured to produce unigrams, bigrams and trigrams
  would produce "one", "one two", and "one two three", but nothing else?
 
  If so, I suspect writing a LimitTokenPositionFilter (which would stop
  emitting tokens after the token position exceeds a specified limit)
  would be better, rather than subclassing ShingleFilter.  You could use
  LimitTokenCountFilter as a model, especially its consumeAllTokens
  option. I think this would make a nice addition to Lucene.
 
  Also, what do you plan to use this for?
 
  Steve
 
  On Mar 16, 2013, at 5:02 PM, xavier jmlucjav jmluc...@gmail.com
 wrote:
  Hi,
 
  I need to use shingles but only keep the ones that start from the
  edge.
 
  I want to confirm there is no way to get this feature without
  subclassing ShingleFilter, cause I thought someone would have already
  encountered this use case
 
  thanks
  xavier
 
 
 




4.0 hanging on startup on Windows after Control-C

2013-03-17 Thread xavier jmlucjav
Hi,

I have an index where, if I kill solr via Control-C, it consistently hangs
next time I start it. Admin does not show cores, and searches never return.
If I delete the index contents and I restart again all is ok. I am on
windows 7, jdk1.7 and Solr4.0.
Is this a known issue? I looked in jira but found nothing.
xavier

Here is a thread dump:

2013-03-17 17:58:33
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.7-b01 mixed mode):

JMX server connection timeout 30 daemon prio=6 tid=0x0bbf9000
nid=0x3b4c in Object.wait() [0x1df3e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0xe7054338 (a [I)
at
com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:168)
- locked 0xe7054338 (a [I)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- None

RMI Scheduler(0) daemon prio=6 tid=0x0bbf8000 nid=0x39d8 waiting
on condition [0x1db9f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0xb9e1e6d8 (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- None

RMI TCP Connection(1)-192.168.1.128 daemon prio=6 tid=0x0bbf7800
nid=0x111c runnable [0x1dd3e000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
- locked 0xe70003c8 (a java.io.BufferedInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- 0xb959bc68 (a
java.util.concurrent.ThreadPoolExecutor$Worker)

RMI TCP Accept-0 daemon prio=6 tid=0x0bbf5000 nid=0x1fe0 runnable
[0x1da4e000]
   java.lang.Thread.State: RUNNABLE
at java.net.DualStackPlainSocketImpl.accept0(Native Method)
at
java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:121)
at
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:183)
- locked 0xb9531a78 (a java.net.SocksSocketImpl)
at java.net.ServerSocket.implAccept(ServerSocket.java:522)
at java.net.ServerSocket.accept(ServerSocket.java:490)
at
sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
at
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
at
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- None

DestroyJavaVM prio=6 tid=0x0bbf6800 nid=0x60c waiting on
condition [0x]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

searcherExecutor-6-thread-1 prio=6 tid=0x0bbf6000 nid=0x3480 in
Object.wait() [0x1441e000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0xb9e6a4a0 (a java.lang.Object)
at java.lang.Object.wait(Object.java:503)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1379)
- locked 0xb9e6a4a0 (a java.lang.Object)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1200)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1135

Re: Is there an EdgeSingleFilter already?

2013-03-16 Thread xavier jmlucjav
Steve,

Yes, I want only "one", "one two", and "one two three", but nothing else.
Cool, if this can be achieved without Java code even better, I'll check that
filter.

I need this for building a field used for suggestions, the user
specifically wants to match only from the edge.

thanks!

On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe sar...@gmail.com wrote:

 Hi xavier,

 It's not clear to me what you want.  Is the "edge" you're referring to the
 beginning of a field? E.g. raw text "one two three four" with
 EdgeShingleFilter configured to produce unigrams, bigrams and trigrams would
 produce "one", "one two", and "one two three", but nothing else?

 If so, I suspect writing a LimitTokenPositionFilter (which would stop
 emitting tokens after the token position exceeds a specified limit) would
 be better, rather than subclassing ShingleFilter.  You could use
 LimitTokenCountFilter as a model, especially its consumeAllTokens option.
 I think this would make a nice addition to Lucene.

 Also, what do you plan to use this for?

 Steve

 On Mar 16, 2013, at 5:02 PM, xavier jmlucjav jmluc...@gmail.com wrote:
  Hi,
 
  I need to use shingles but only keep the ones that start from the edge.
 
  I want to confirm there is no way to get this feature without subclassing
  ShingleFilter, cause I thought someone would have already encountered
  this use case
 
  thanks
  xavier
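
The behaviour asked for in this thread (for the text "one two three four", keep only "one", "one two" and "one two three") can be illustrated in plain Java. This is a sketch of the expected output only, not a Lucene TokenFilter: the class EdgeShingles and its method are assumptions for the example, and a real analysis chain would work on a TokenStream with position attributes instead of a list of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EdgeShingles {
    // Return the shingles that start at the first token, up to maxShingleSize
    // tokens long: "one", "one two", "one two three", ...
    static List<String> edgeShingles(List<String> tokens, int maxShingleSize) {
        List<String> shingles = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int limit = Math.min(maxShingleSize, tokens.size());
        for (int i = 0; i < limit; i++) {
            if (i > 0) {
                current.append(' ');
            }
            current.append(tokens.get(i));
            shingles.add(current.toString());
        }
        return shingles;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("one", "two", "three", "four");
        System.out.println(edgeShingles(tokens, 3));
    }
}
```

Shingles that start at later positions ("two three", "three four", and so on) are never produced, which is the difference from a regular shingle filter.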




Re: Is there an EdgeSingleFilter already?

2013-03-16 Thread xavier jmlucjav
I read your reply too fast, so I thought you meant configuring the
LimitTokenPositionFilter. I see you mean I have to write one, ok...



On Sat, Mar 16, 2013 at 10:33 PM, xavier jmlucjav jmluc...@gmail.com wrote:

 Steve,

 Yes, I want only "one", "one two", and "one two three", but nothing else.
 Cool, if this can be achieved without Java code even better, I'll check that
 filter.

 I need this for building a field used for suggestions, the user
 specifically wants to match only from the edge.

 thanks!

 On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe sar...@gmail.com wrote:

 Hi xavier,

 It's not clear to me what you want.  Is the "edge" you're referring to
 the beginning of a field? E.g. raw text "one two three four" with
 EdgeShingleFilter configured to produce unigrams, bigrams and trigrams would
 produce "one", "one two", and "one two three", but nothing else?

 If so, I suspect writing a LimitTokenPositionFilter (which would stop
 emitting tokens after the token position exceeds a specified limit) would
 be better, rather than subclassing ShingleFilter.  You could use
 LimitTokenCountFilter as a model, especially its consumeAllTokens option.
 I think this would make a nice addition to Lucene.

 Also, what do you plan to use this for?

 Steve

 On Mar 16, 2013, at 5:02 PM, xavier jmlucjav jmluc...@gmail.com wrote:
  Hi,
 
  I need to use shingles but only keep the ones that start from the edge.
 
  I want to confirm there is no way to get this feature without
  subclassing ShingleFilter, cause I thought someone would have already
  encountered this use case
 
  thanks
  xavier





RE: Need help with delta import

2013-03-11 Thread Xavier Pell
This is absolutely a syntax error. I had the same problem, and using
dih.delta.id solved all my problems. Thanks to God and to the special
person who posted the answer on this page.

You have to revise the syntax of your delta-import queries and watch the
catalina log file (I use Tomcat) for any errors.

Regards,


[ANN] vifun: a GUI to help visually tweak Solr scoring, release 0.6

2013-03-10 Thread xavier jmlucjav
Hi,

I am releasing a new version (0.6) of vifun, a GUI to help visually tweak
Solr scoring. The most relevant changes are:
- support float values
- add support for tie
- sync both Current/Baseline scrollbars (if some checkbox is selected)
- double-click on a doc: show side-by-side comparison of debug score info
- upgrade to Griffon 1.2.0
- allow using another handler (besides /select)

You can check it out here: https://github.com/jmlucjav/vifun
Binary distribution:
http://code.google.com/p/vifun/downloads/detail?name=vifun-0.6.zip

xavier


Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-03-04 Thread xavier jmlucjav
Hi Mark,

Thanks for trying it out.

Let me see if I can explain it better: the number you have to select (in
order to be able to tweak it later with the slider) is any number that appears
in one of the parameters in the Scoring section.

The issue you have is that you are using the /select handler from the example
distribution, and that handler does not have any of these parameters (qf,
pf, pf2, pf3, ps, ps2, ps3, bf, bq, boost, mm, tie), so it's normal they
don't show up; there is nothing to tweak...

In the example configuration from 4.1, you can select the /browse handler, as
it uses qf and mm, and you should be able to tweak them. Of course, if you
were using a real Solr installation with a sizable number of documents and
some complex usage of edismax, you would be able to see much better what
the tool can do.

xavier


On Mon, Mar 4, 2013 at 10:52 PM, Mark Bennett
mark.benn...@lucidworks.com wrote:

 Hello Xavier,

 Thanks for uploading this and sharing.  I also read the other messages in
 the thread.

 I'm able to get part way through your Getting Started section, I get
 results, but I get stuck on the editing values.  I've tried with Java 6 and
 7, with both the 0.5 binary and from the source distribution.

 What's working:
 * Default Solr 4.1 install  (plus a couple extra fields in schema)
 * Able to connect to Solr (/collection1)
 * Able to select handler (/select)
 * Able to run a search:
   q=bandwidth
   rows=10
   fl=title
   rest: pt=45.15,-93.85 (per your example)
 * Get 2 search results with titles
 * Able to select a result, mouse over, highlight score, etc.

 However, what I'm stuck on:
 * Below the Run Query button, I only see the grayed out Scoring slider.
 * The instructions say to highlight some numbers
   - I tried highlighting the 10 in the rows parameter
   - I also tried the 45.15 in rest, and some of the scores in the
 results list

 I never see the extra parameters you show in this screen shot:

 https://raw.github.com/jmlucjav/vifun/master/img/screenshot-selecttarget.jpg
 I see the word Scoring:
 I don't see the blue text Select a number as a target to tweak
 I don't see the parameters qf, bf_0, 1, 2, bq_0, etc.

 I'm not sure how to get those extra fields to appear in the UI.

 I also tried adding defType=edismax, no luck

 The Handlers it sees:
 /select, /query, /browse, /spell, /tvrh, /clustering, /terms,
 /elevate
 (from default Solr 4.1 solrconfig.xml)
 I'm using /select


 --
 Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
 Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513







 On Feb 23, 2013, at 6:12 AM, jmlucjav jmluc...@gmail.com wrote:

  Hi,
 
  I have built a small tool to help me tweak some params in Solr (typically
  qf, bf in edismax). As maybe others find it useful, I am open sourcing it
  on github: https://github.com/jmlucjav/vifun
 
  Check github for some more info and screenshots. I include part of the
  github page below.
  regards
 
  Description
 
  Did you ever spend lots of time trying to tweak all numbers in a *edismax*
  handler *qf*, *bf*, etc params so docs get scored to your liking? Imagine
  you have the params below, is 20 the right boosting for *name* or is it
  too much? Is *population* being boosted too much versus distance? What
  about new documents?
 
 <!-- fields, boost some -->
 <str name="qf">name^20 textsuggest^10 edge^5 ngram^2 phonetic^1</str>
 <str name="mm">33%</str>
 <!-- boost closest hits -->
 <str name="bf">recip(geodist(),1,500,0)</str>
 <!-- boost by population -->
 <str name="bf">product(log(sum(population,1)),100)</str>
 <!-- boost newest docs -->
 <str name="bf">recip(rord(moddate),1,1000,1000)</str>
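
To get a feel for the numbers these boost functions produce, the arithmetic can be reproduced outside Solr. In Solr's function queries recip(x,m,a,b) evaluates to a / (m*x + b) and log() is base 10; the sketch below applies those two definitions to a few sample inputs. The class name, the method names and the sample distance, rank and population values are assumptions chosen for illustration.

```java
class BoostFunctions {
    // Solr's recip(x, m, a, b) is a / (m * x + b).
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    // product(log(sum(population,1)),100): Solr's log() is base 10.
    static double populationBoost(double population) {
        return 100.0 * Math.log10(population + 1.0);
    }

    public static void main(String[] args) {
        // recip(geodist(),1,500,0) for a hit 10 distance units away.
        System.out.println(recip(10.0, 1, 500, 0));
        // recip(rord(moddate),1,1000,1000) for the 1000th newest value.
        System.out.println(recip(1000.0, 1, 1000, 1000));
        // Population boost for a place with 999 inhabitants.
        System.out.println(populationBoost(999.0));
    }
}
```

With b set to 0, the distance boost grows without bound as the distance approaches zero, which is one reason such values need tuning against real queries.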
 
  This tool was developed in order to help me tweak the values of boosting
  functions etc in Solr, typically when using edismax handler. If you are
  fed up of: change a number a bit, restart Solr, run the same query to see
  how documents are scored now...then this tool is for you.

  Features
 
- Can tweak numeric values in the following params: *qf, pf, bf, bq,
boost, mm* (others can be easily added) even in *appends or
invariants*
- View side by side a Baseline query result and how it changes when you
gradually change each value in the params
- Colorized values, color depends on how the document does related to
baseline query
- Tooltips give you Explain info
- Works on remote Solr installations
- Tested with Solr 3.6, 4.0 and 4.1 (other versions would work too, as
long as wt=javabin format is compatible)
- Developed using Groovy/Griffon
 
  Requirements
 
- */select* handler should be available, and not have any *appends or
invariants*, as it could interfere with how vifun works.
- Java6 is needed (maybe it runs on Java5 too). A JRE should be enough.
 
  https://github.com

Index all possible facets values even if there is no document in relation

2012-03-07 Thread Xavier
Hi everyone,

My question is a little weird, but I need to have all my facet values in the
Solr index:

I have a database with all possible values of my facets for my Solr
documents.

Not all of my facet values are used by my documents, but I would like to
index these facet values even if they return 0 documents.

I need this for SEO management, and because I want to test these facet
values (with 0 documents) without querying my database.


Best Regards,
Xavier

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-all-possible-facets-values-even-if-there-is-no-document-in-relation-tp3806461p3806461.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: 'location' fieldType indexation impossible

2012-02-23 Thread Xavier
You totally get it :)

I've deleted those dynamicFields (I thought it was just an example); why
didn't I read the comment above the line!

Thanks a lot ;)

Best regards,
Xavier.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/location-fieldType-indexation-impossible-tp3766136p3769065.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to merge an autofacet with a predefined facet

2012-02-23 Thread Xavier
Thank you for this information, I'll keep that in mind.

But I'm sorry, I don't understand the process for doing it.


Em wrote
 
 Well, you could create a keyword-file out of your database and join it
 with your self-maintained keywordslist. 
 


By that you mean:
- the 'self-maintained keywordslist' is my 'predefined_facet', already filled in
the database, which I'll still import with DIH?
- Isn't the keyword-file the same thing that I created with the
synonyms/keepwords combination?

And I still don't get how to 'merge' those two ways of getting facet values
into a single facet!

Thanks in advance,
Xavier


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3769121.html
Sent from the Solr - User mailing list archive at Nabble.com.


'location' fieldType indexation impossible

2012-02-22 Thread Xavier
Hi,

When I try to index my location field I get this error for each document:
*ATTENTION: Error creating document  Error adding field
'emploi_city_geoloc'='48.85,2.5525' *
(so I have 0 files indexed)

Here is my schema.xml:
<field name="emploi_city_geoloc" type="location" indexed="true"
stored="false"/>

I really don't understand why it isn't working, because it was working on my
local server with the same configuration (Solr 3.5.0) and the same database!

If I use geohash instead of location, indexing works,
but my geodist query on the front end no longer works...

Any ideas?

Best regards,
Xavier

--
View this message in context: 
http://lucene.472066.n3.nabble.com/location-fieldType-indexation-impossible-tp3766136p3766136.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to merge an autofacet with a predefined facet

2012-02-22 Thread Xavier
I'm not sure I understand your solution.

When (and how) would the 'word' detection in the full text happen? Before (on my
own) or during Solr indexing?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3767059.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Xavier
That's it! Thanks :)

This is the first time I've seen that documentation page (which is really helpful):
http://lucidworks.lucidimagination.com/display/solr/Filter+Descriptions#FilterDescriptions-KeepWordsFilter

Now I want to associate a word list with a value of an existing facet.

So I tried to combine synonyms and keepwords like this:

<fieldType name="text_tag" class="solr.TextField" sortMissingLast="true"
omitNorms="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonymswords.txt"/>

    <filter class="solr.KeepWordFilterFactory"
            words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

It works very well, but my problem now is that I want the synonym to
return whitespace and match it against my keepwords (because I have
whitespace in the values of my facet).

For example, if I see the term 'php', I get 'web langage' from my synonyms file,
and I want to keep the whole phrase 'web langage'.

So my files are:
synonymswords.txt : php=>web langage
keepwords.txt : web langage

The problem is that each word is analyzed separately and I don't know how to
handle whitespace...
(the synonym filter returns 'web' and 'langage', so it doesn't match 'web langage')

I tried to use 'solr.PatternReplaceFilter' (as you can see in my
configuration above) with '_' as a stand-in for the space character, but I
get an error, so if you have another tip for me it would be great :p



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763247.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Xavier
It seems there's an error in the documentation: 'Factory' is missing from
the class name!

I found:

<filter class="solr.PatternReplaceFilterFactory" pattern="_" replacement=" "
/>

That works fine!

In conclusion, I have these files:
*synonymswords.txt :*
php,mysql,html,css=>web_langage

And

*keepwords.txt :*
web langage

With this fieldType:

<fieldType name="text_tag" class="solr.TextField" sortMissingLast="true"
omitNorms="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonymswords.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="_"
            replacement=" "/>
    <filter class="solr.KeepWordFilterFactory"
            words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>


And it's working fine ;)
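
To make the order of the filters easier to follow, here is a rough Python sketch of what the chain above does to a piece of text. It is only a simulation, not Solr code: the regex tokenizer is a crude stand-in for StandardTokenizer, and the SYNONYMS and KEEP_WORDS tables are hypothetical in-memory copies of synonymswords.txt and keepwords.txt.

```python
import re

# Hypothetical in-memory copies of synonymswords.txt and keepwords.txt.
SYNONYMS = {
    "php": "web_langage",
    "mysql": "web_langage",
    "html": "web_langage",
    "css": "web_langage",
}
KEEP_WORDS = {"web langage"}


def analyze(text):
    """Simulate: tokenizer -> lowercase -> synonyms -> '_' to ' ' -> keep words."""
    tokens = re.findall(r"\w+", text)               # crude stand-in for StandardTokenizer
    tokens = [t.lower() for t in tokens]            # LowerCaseFilter
    tokens = [SYNONYMS.get(t, t) for t in tokens]   # SynonymFilter with "a => b" rules
    tokens = [t.replace("_", " ") for t in tokens]  # PatternReplaceFilter: "_" -> " "
    return [t for t in tokens if t in KEEP_WORDS]   # KeepWordFilter


print(analyze("We need a PHP and MySQL developer"))
```

The underscore is what lets the synonym survive as a single token until the last step; only then does it become 'web langage' and match the keep list.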


But I have another question. My fields are configured like this:

<copyField source="mytext" dest="text_tag_facet" />
<field name="text_tag_facet" type="text_tag" indexed="true" stored="false"
multiValued="true"/>

But if I set stored to true, the stored value of text_tag_facet in
my documents is always the full original text, not the generated facet values
(like 'web langage').

How can I get the facet values into the stored field of the document?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763551.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to merge an autofacet with a predefined facet

2012-02-21 Thread Xavier
Hi everyone,

As explained in this post:
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-td3761201.html

I have created a dynamic facet at index time by searching for terms in a
full-text field.

But I don't know whether it's possible to merge this auto-created facet with a
predefined facet. I tried using copyField (adding this to the
configuration from my previous post):
<copyField source="text_tag_facet" dest="predefined_facet" />

but it doesn't seem to work (my text_tag_facet still works, but
it isn't merged into my predefined_facet).

Maybe that's because (as I understand it) the real (stored) value of this dynamic
facet is still the initial full text? (Or maybe I'm wrong...)

I'm a little confused about this and I'm certainly doing it wrong, but I'm
beginning to feel that this kind of manipulation isn't feasible in
schema.xml.

Best regards.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3763988.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Xavier
Thanks for this answer.

I have posted my new question (related to this post) in a new topic ;)

(
http://lucene.472066.n3.nabble.com/How-to-merge-an-quot-autofacet-quot-with-a-predefined-facet-td3763988.html
)


Best regards

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3763993.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to merge an autofacet with a predefined facet

2012-02-21 Thread Xavier
Sure, the differences between my 2 facets are:

- 'predefined_facets' contains values already filled in my database, like:
'web langage', 'cooking', 'fishing'

- 'text_tag_facets' will contain the same possible values, but determined
automatically from a given word list by searching the document text, as
shown in my previous post


Why do I want to do that? Because sometimes my 'predefined_facets' is not
set, and even when it is, I want it to be as complete as possible.

Best regards,
Xavier

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3764116.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to merge an autofacet with a predefined facet

2012-02-21 Thread Xavier
In a way I agree that it would be easier to do that, but I really want to
avoid this solution because I prefer to work harder on preparing my index
than to add field requests to my front-end query :)

So the only solution I see right now is to do it on my own, so that
my database is fully prepared for indexing... but I had hoped that Solr
could handle it. So if anyone sees a way to handle it directly in
Solr, you are welcome :p

Anyway, thanks for your help, Em ;)

Best regards,
Xavier

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3764506.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to index a facetfield by searching words matching from another Textfield

2012-02-20 Thread Xavier
Hi everyone,

I'm a new Solr user, but I used to work with Endeca.

Endeca has a module called TextTagger that automatically indexes
values in a (multivalued) facet field when it finds words (from a given
word list) in another text field of that document.

I haven't seen any discussion of this or any way to do it with Solr.

Thanks in advance ;)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-tp3761201p3761201.html
Sent from the Solr - User mailing list archive at Nabble.com.


Tomcat6 and Log4j

2011-02-10 Thread Xavier Schepler

Hi,

I added “slf4j-log4j12-1.5.5.jar” and “log4j-1.2.15.jar” to 
$CATALINA_HOME/webapps/solr/WEB-INF/lib ,
then deleted the library “slf4j-jdk14-1.5.5.jar” from 
$CATALINA_HOME/webapps/solr/WEB-INF/lib,

then created a directory $CATALINA_HOME/webapps/solr/WEB-INF/classes.
and created $CATALINA_HOME/webapps/solr/WEB-INF/classes/log4j.properties 
with the following contents :


log4j.rootLogger=INFO
log4j.appender.SOLR.logfile=org.apache.log4j.DailyRollingFileAppender
log4j.appender.SOLR.logfile.file=/home/quetelet_bdq/logs/bdq.log
log4j.appender.SOLR.logfile.DatePattern='.'-MM-dd
log4j.appender.SOLR.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.SOLR.logfile.layout.conversionPattern=%d %p [%c{3}] - 
[%t] - %X{ip}: %m%n

log4j.appender.SOLR.logfile = true

I restarted solr and I got the following message in the catalina.out log :

log4j:WARN No appenders could be found for logger 
(org.apache.solr.core.SolrResourceLoader).

log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for 
more info.


According to that page, this error occurs when the 
log4j.properties file isn't found.


Could someone help me get it working?

Thanks in advance,

Xavier


Re: Tomcat6 and Log4j

2011-02-10 Thread Xavier SCHEPLER
Thanks for your response.
How could I do that ?




 
 From: Jan Høydahl jan@cominvent.com
 Sent: Thu Feb 10 11:01:15 CET 2011
 To: solr-user@lucene.apache.org
 Subject: Re: Tomcat6 and Log4j
 
 
 Have you tried to start Tomcat with 
 -Dlog4j.configuration=$CATALINA_HOME/webapps/solr/WEB-INF/classes/log4j.properties
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 


--
All e-mails sent from the Sciences Po mail system
must comply with its terms of use.
To read them, go to
http://www.ressources-numeriques.sciences-po.fr/confidentialite_courriel.htm


Re: Tomcat6 and Log4j

2011-02-10 Thread Xavier SCHEPLER
I added it to /etc/default/tomcat6.
What happened is that the same error message appeared twice in 
/var/log/tomcat6/catalina.out,
as if the same file were loaded twice.




Re: Tomcat6 and Log4j

2011-02-10 Thread Xavier SCHEPLER
Yes thanks. This works fine :

log4j.rootLogger=INFO, SOLR
log4j.appender.SOLR=org.apache.log4j.DailyRollingFileAppender
log4j.appender.SOLR.file=/home/quetelet_bdq/logs/bdq.log
log4j.appender.SOLR.datePattern='.'-MM-dd
log4j.appender.SOLR.layout=org.apache.log4j.PatternLayout
log4j.appender.SOLR.layout.conversionPattern=%d %p [%c{3}] - [%t] - %X{ip}: %m%n
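
For anyone comparing the two files: the visible differences are that the working one names the SOLR appender on the rootLogger line and defines it directly as log4j.appender.SOLR. Below is a small Python sketch of a check for exactly that. It is not part of log4j; the function name and the two shortened sample strings are only illustrative.

```python
def root_appenders(properties_text):
    """Return (referenced, undefined): the appender names listed on
    log4j.rootLogger, and those among them that have no
    'log4j.appender.<name>=<class>' line in the same text."""
    props = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()

    # The root logger value has the form "LEVEL, appender1, appender2, ...".
    parts = [p.strip() for p in props.get("log4j.rootLogger", "").split(",")]
    referenced = [p for p in parts[1:] if p]
    undefined = [n for n in referenced if f"log4j.appender.{n}" not in props]
    return referenced, undefined


# Shortened copies of the first lines of the two files from this thread.
BROKEN = """\
log4j.rootLogger=INFO
log4j.appender.SOLR.logfile=org.apache.log4j.DailyRollingFileAppender
"""

WORKING = """\
log4j.rootLogger=INFO, SOLR
log4j.appender.SOLR=org.apache.log4j.DailyRollingFileAppender
"""

print(root_appenders(BROKEN))   # nothing is attached to the root logger
print(root_appenders(WORKING))  # SOLR is attached and defined
```

An empty first list corresponds to the "No appenders could be found" warning: the file can be found and read, yet no appender is attached to the root logger.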




Re: Local param tag voodoo ?

2011-01-20 Thread Xavier SCHEPLER
OK,
I tried to use nested queries this way:
wt=json&indent=true&fl=qFR&q=sarkozy 
_query_:{!tag=test}chirac&facet=true&facet.field={!ex=test}studyDescriptionId
It resulted in this error:
facet_counts:{
  facet_queries:{},
  exception:java.lang.NullPointerException
    at org.apache.solr.request.SimpleFacets.parseParams(SimpleFacets.java:132)
    at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:278)
    at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
    at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:636)
}}

Then I tried a simpler version:
q={!tag=test}chirac&facet=true&facet.field={!ex=test}studyDescriptionId

It resulted in the same error.


 
 From: Jonathan Rochkind rochk...@jhu.edu
 Sent: Wed Jan 19 17:38:53 CET 2011
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Subject: Re: Local param tag voodoo ?
 
 
 What query are you actually trying to do?  There's probably a way to do 
 it, possibly using nested queries -- but not using illegal syntax like 
 some of your examples!  If you explain what you want to do, someone may 
 be able to tell you how.  From the hints in your last message, I suspect 
 nested queries _might_ be helpful to you.
 


Re: Local param tag voodoo ?

2011-01-20 Thread Xavier SCHEPLER
Since there seems to be no voodoo available, I did it on the client side.
I send a first request to get the facets and a second to get the documents and 
their highlighting.
It works well but requires more processing.
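
A minimal sketch of how the two query strings could be built on the client, in Python. The exact parameters are assumptions, not the ones actually used: the sketch supposes the facet request leaves out the terms the facets should ignore, while the document request keeps every term in q so that they count toward the score. The function name is hypothetical; the field name and terms come from the earlier example.

```python
from urllib.parse import urlencode


def build_requests(main_terms, ignored_terms, facet_field):
    """Return (facet_query_string, document_query_string)."""
    # Request 1: facets only, computed without the terms the facets must ignore.
    facet_params = [
        ("q", main_terms),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("wt", "json"),
    ]
    # Request 2: documents and highlighting, with every term in the main query.
    doc_params = [
        ("q", f"{main_terms} {ignored_terms}"),
        ("hl", "on"),
        ("wt", "json"),
    ]
    return urlencode(facet_params), urlencode(doc_params)


facets_qs, docs_qs = build_requests("sarkozy", "chirac", "studyDescriptionId")
print(facets_qs)
print(docs_qs)
```

The cost is the one mentioned above: two round trips per search instead of one.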

 

Re: Local param tag voodoo ?

2011-01-19 Thread Xavier SCHEPLER
You're right, the second query didn't result in an error, but it didn't give the 
expected result either.
I'm going to have a look at the link you gave me.
Thanks!

 
 From: Markus Jelsma markus.jel...@openindex.io
 Sent: Tue Jan 18 21:31:52 CET 2011
 To: solr-user@lucene.apache.org
 Subject: Re: Local param tag voodoo ?
 
 
 Hi,
 
 You get an error because LocalParams need to be in the beginning of a 
 parameter's value. So no parenthesis first. The second query should not give 
 an 
 error because it's a valid query.
 
 Anyway, i assume you're looking for :
 http://wiki.apache.org/solr/SimpleFacetParameters#Multi-
 Select_Faceting_and_LocalParams
 
 Cheers,
 
  Hey,
  
  here are my needs :
  
  - a query that has tagged and untagged contents
  - facets that ignore the tagged contents
  
  I tryed :
  
  q=({!tag=toExclude} ignored)  taken into account
  q={tag=toExclude v='ignored'} take into account
  
  Both resulted in a error.
  
  Is this possible or do I have to try another way ?




Re: Local param tag voodoo ?

2011-01-19 Thread Xavier SCHEPLER
OK, I was already at this point.
My faceting system uses exactly what is described on this page. I read about it in 
the Solr 1.4 book. Otherwise I wouldn't ask.
The problem is that filter queries don't affect the relevance score of 
the results, so I want the terms in the main query.
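
For readers who have not seen the wiki page: the pattern it documents puts the tag on a filter query (fq) and the exclusion on the facet field. Here is a small Python sketch that builds such a request. The filter qFR:chirac is a hypothetical example; only the {!tag=...} and {!ex=...} local params are the documented syntax.

```python
from urllib.parse import urlencode, parse_qsl

params = [
    ("q", "sarkozy"),
    ("fq", "{!tag=test}qFR:chirac"),                   # tagged filter: restricts results, no effect on score
    ("facet", "true"),
    ("facet.field", "{!ex=test}studyDescriptionId"),   # facet counts ignore the tagged filter
]

query_string = urlencode(params)
print(query_string)
```

This is exactly why it does not solve the scoring problem described above: the tagged clause lives in fq, and filter queries do not contribute to relevance.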


 
 From: Markus Jelsma markus.jel...@openindex.io
 Sent: Tue Jan 18 21:31:52 CET 2011
 To: solr-user@lucene.apache.org
 Subject: Re: Local param tag voodoo ?
 
 
 Hi,
 
 You get an error because LocalParams need to be in the beginning of a 
 parameter's value. So no parenthesis first. The second query should not give 
 an 
 error because it's a valid query.
 
 Anyway, i assume you're looking for :
 http://wiki.apache.org/solr/SimpleFacetParameters#Multi-
 Select_Faceting_and_LocalParams
 
 Cheers,
 
  Hey,
  
  here are my needs :
  
  - a query that has tagged and untagged contents
  - facets that ignore the tagged contents
  
  I tryed :
  
  q=({!tag=toExclude} ignored)  taken into account
  q={tag=toExclude v='ignored'} take into account
  
  Both resulted in a error.
  
  Is this possible or do I have to try another way ?




Local param tag voodoo ?

2011-01-18 Thread Xavier Schepler

Hey,

here are my needs :

- a query that has tagged and untagged contents
- facets that ignore the tagged contents

I tried:

q=({!tag=toExclude} ignored)  taken into account
q={tag=toExclude v='ignored'} take into account

Both resulted in an error.

Is this possible or do I have to try another way ?


Solr boolean operators

2011-01-13 Thread Xavier Schepler

Hi,

with the Lucene query syntax, is :

a AND (a OR b)

equivalent to :

a

(absorption)

?


Re: Solr boolean operators

2011-01-13 Thread Xavier SCHEPLER
Ok, thanks.
That's what I expected :D

 
 From: dante stroe dante.st...@gmail.com
 Sent: Thu Jan 13 15:56:33 CET 2011
 To: solr-user@lucene.apache.org
 Subject: Re: Solr boolean operators
 
 
 To my understanding: in terms of the results that will be matched by your
 query, it's the same. In terms of the score of the results, no:
 with the first query, the documents that match both
 the a and the b terms will score higher than the ones matching just the
 a term.
 
 On Thu, Jan 13, 2011 at 3:29 PM, Xavier Schepler 
 xavier.schep...@sciences-po.fr wrote:
 
  Hi,
 
  with the Lucene query syntax, is :
 
  a AND (a OR b)
 
  equivalent to :
 
  a
 
  (absorption)
 
  ?
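
The distinction made in the reply (same matches, different scores) can be illustrated with a toy model in Python. The scoring function here is a deliberately naive assumption, one point per query clause that a document satisfies; it is not Lucene's actual similarity formula, and the three documents are made up.

```python
# Three made-up documents, each represented by the set of terms it contains.
DOCS = {1: {"a"}, 2: {"a", "b"}, 3: {"b"}}


def matches_a(terms):
    return "a" in terms


def matches_a_and_a_or_b(terms):
    return "a" in terms and ("a" in terms or "b" in terms)


def toy_score(terms, clauses):
    """Naive score: one point per query clause found in the document."""
    return sum(1 for clause in clauses if clause in terms)


same_matches = all(matches_a(t) == matches_a_and_a_or_b(t) for t in DOCS.values())
print(same_matches)

# Query "a" has one clause; "a AND (a OR b)" has three term clauses.
scores_simple = {d: toy_score(t, ["a"]) for d, t in DOCS.items() if matches_a(t)}
scores_nested = {d: toy_score(t, ["a", "a", "b"]) for d, t in DOCS.items()
                 if matches_a_and_a_or_b(t)}
print(scores_simple)
print(scores_nested)
```

Under this toy model both queries return documents 1 and 2, but only the longer query ranks document 2 (which also contains b) above document 1.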
 




Re: No response from Solr on complex request after several days

2010-10-29 Thread Xavier Schepler

On 29/10/2010 12:08, Lance Norskog wrote:

There are a few problems that can happen. This is usually a sign of
garbage collection problems.
You can monitor the Tomcat instance with JConsole or one of the other
java monitoring tools and see if there is a memory leak.

Also, most people don't need to do it, but you can automatically
restart it once a day.

Thanks for your response.
Today, I've increased the Tomcat JVM heap size from 128-256 to 
1024-2048. I will see if it helps.





No response from Solr on complex request after several days

2010-10-28 Thread Xavier Schepler

Hi,

We are in a beta testing phase, with several users a day.

After several days of waiting, the solr server didn't respond to 
requests that require a lot of processing time.


I'm using Solr inside Tomcat.

This is the request that had no response from the server :

wt=json&omitHeader=true&q=qiAndMSwFR%3A%28transport%29&q.op=AND&start=0&rows=5&fl=id,domainId,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN,&sort=score%20desc&fq=solrLangCode%3AFR&facet=true&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DdomainId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyDecade&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudySerieId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyYearAndDescriptionId&facet.sort=count&f.studyDecade.facet.sort=lex&spellcheck=true&spellcheck.count=10&spellcheck.dictionary=qiAndMFR&spellcheck.q=transport&hl=on&hl.fl=qSwFR,iHLSwFR,mHLSwFR&hl.fragsize=0&hl.snippets=1&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&hl.mergeContiguous=false



It involves highlighting on a multivalued field with more than 600 short 
values inside. It takes 200 or 300 ms because of highlighting.


After restarting tomcat all went fine again.

I'm trying to understand why I had to restart Tomcat and Solr, and what 
I should do to keep it running 24/7.


Xavier




No response from Solr on complex request (real issue explained)

2010-10-28 Thread Xavier Schepler

Hi,

We are in a beta testing phase, with several users a day.

After several days of running well, the solr server stopped responding 
to requests that require a lot of processing time, like this one :


wt=json&omitHeader=true&q=qiAndMSwFR%3A%28transport%29&q.op=AND&start=0&rows=5&fl=id,domainId,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN,&sort=score%20desc&fq=solrLangCode%3AFR&facet=true&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DdomainId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyDecade&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudySerieId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%2CdomainIds%7DstudyYearAndDescriptionId&facet.sort=count&f.studyDecade.facet.sort=lex&spellcheck=true&spellcheck.count=10&spellcheck.dictionary=qiAndMFR&spellcheck.q=transport&hl=on&hl.fl=qSwFR,iHLSwFR,mHLSwFR&hl.fragsize=0&hl.snippets=1&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&hl.mergeContiguous=false



It involves highlighting on a multivalued field with more than 600 short 
values inside. Usually, it takes 200 or 300 ms.


I'm using Solr within Tomcat.
After restarting Tomcat all went fine again.

I'm trying to understand why I had to restart Tomcat, and what I should 
do to keep it running 24/7.



Xavier



More like this and terms positions

2010-10-04 Thread Xavier Schepler

Hi,

Does the MoreLikeThis search use term position information in the 
score formula ?


Re: More like this and terms positions

2010-10-04 Thread Xavier Schepler

On 04/10/2010 16:40, Robert Muir wrote:

On Mon, Oct 4, 2010 at 10:16 AM, Xavier Schepler
xavier.schep...@sciences-po.fr  wrote:

   

Hi,

does the more like this search uses terms positions information in the
score formula ?

 

no, it would be nice if it did use them though (based upon query terms),
seems like it would yield improvements.

http://sifaka.cs.uiuc.edu/~ylv2/pub/sigir10-prm.pdf

   

maybe in a next solr version ?


stopwords in AND clauses

2010-09-13 Thread Xavier Noria
Let's suppose we have a regular search field body_t, and an internal
boolean flag flag_t not exposed to the user.

I'd like

body_t:foo AND flag_t:true

to be an intersection, but if foo is a stopword I get all documents
for which flag_t is true, as if the first clause were dropped, or as if
technically all documents matched an empty string.

Is there a way to get 0 results instead?
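
One way to picture the behaviour, as a toy model rather than Solr's actual query parser: if analysis removes every token of a clause, the clause contributes nothing and the AND is left with the flag clause alone. The stopword set and the helper names (`analyze`, `build_clauses`, `should_short_circuit`) below are hypothetical. A client could use the same kind of check to return zero results itself when the user clause analyzes to nothing.

```python
# Toy model (not Solr code): a clause whose terms are all stopwords
# disappears from the boolean query after analysis.
STOPWORDS = {"the", "a", "of", "foo"}  # hypothetical; "foo" stands in for a stopword


def analyze(text):
    """Lowercase, split on whitespace, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def build_clauses(user_text, flag_value):
    """Return the required clauses that survive analysis."""
    clauses = []
    terms = analyze(user_text)
    if terms:  # an empty clause is silently omitted
        clauses.append(("body_t", terms))
    clauses.append(("flag_t", [flag_value]))
    return clauses


def should_short_circuit(user_text):
    """Client-side guard: treat an all-stopword query as 'no results'."""
    return not analyze(user_text)


print(build_clauses("foo", "true"))   # only the flag clause is left
print(should_short_circuit("foo"))
```

With this guard the application never sends the query when the user text is made of stopwords only, which gives the "0 results" behaviour without touching the index.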


Re: stopwords in AND clauses

2010-09-13 Thread Xavier Noria
On Mon, Sep 13, 2010 at 4:29 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:

> On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria f...@hashref.com wrote:
>> Let's suppose we have a regular search field body_t, and an internal
>> boolean flag flag_t not exposed to the user.
>>
>> I'd like
>>
>>     body_t:foo AND flag_t:true
>
> This is Solr, right? Why don't you use a filter query for your unexposed
> flag_t field: q=body_t:foo&fq=flag_t:true
> This might help too: http://wiki.apache.org/solr/CommonQueryParameters#fq

Sounds good.
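
For reference, a minimal sketch of how a client might build that request, keeping the user-facing clause in q and the internal flag in fq. The function name `build_query_string` is hypothetical, and the sketch does not escape query syntax characters in the user input.

```python
from urllib.parse import urlencode


def build_query_string(user_query, flag_value="true"):
    """Put the user clause in q and the internal flag in a filter query."""
    params = [
        ("q", f"body_t:{user_query}"),
        ("fq", f"flag_t:{flag_value}"),
    ]
    return urlencode(params)


print(build_query_string("foo"))
```

Keeping the flag in fq also means it never has to be merged into the user's query string by hand.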


Phrase search + multi-word index time expanded synonym

2010-09-08 Thread Xavier Schepler

Hello,

well, first, here's the field type that is searched :

<fieldtype name="SyFR" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <!-- Synonyms -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms-fr.txt" 
            ignoreCase="true" expand="true"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" 
                mapping="mapping-ISOLatin1Accent.txt"/>

  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" 
                mapping="mapping-ISOLatin1Accent.txt"/>

  </analyzer>
</fieldtype>

here's the synonym from the synonyms-fr.txt file :

...
PS,Parti socialiste
...

and here's the query :

"PS et"

It returns no results, whereas "Parti socialiste et" does.

How can I have both queries working ? I'm thinking about different 
configurations but I haven't found any solution so far.

Thx for reading,

Xavier Schepler


Re: Phrase search + multi-word index time expanded synonym

2010-09-08 Thread Xavier Schepler

On 08/09/2010 12:21, Grijesh.singh wrote:

see the analysis.jsp with debug verbose and see what happens at index time
and search time during analysis with your data

Also u can use debugQuery=on for seeing what actually parsed query is.

-
Grijesh
   
I've found a first solution by myself, using the query analyzer, that 
works for pairs of synonyms. I still have to test it with rows of 3 or 4 
equivalent synonyms.

I used analysis.jsp.

The query time analyzer became :

<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms2-fr.txt" 
          ignoreCase="true" expand="true"/>

  <filter class="solr.LowerCaseFilterFactory"/>
  <charFilter class="solr.MappingCharFilterFactory" 
              mapping="mapping-ISOLatin1Accent.txt"/>

</analyzer>

And the synonyms2-fr.txt contains :

PS => Parti socialiste

Thanks for your reply.
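
To make the idea concrete, here is a toy illustration of an explicit "left => right" mapping applied to query tokens. It is not how SynonymFilterFactory is implemented; the `MAPPINGS` table and the `rewrite` helper are hypothetical and only show why the query "PS et" ends up looking for the same token sequence as "Parti socialiste et".

```python
# Toy illustration (not Solr code) of an explicit mapping such as
# "PS => Parti socialiste" applied to the tokens of a query.
MAPPINGS = {("ps",): ["parti", "socialiste"]}


def rewrite(tokens, mappings=MAPPINGS):
    """Replace every left-hand side found in tokens by its right-hand side."""
    tokens = [t.lower() for t in tokens]  # mimic ignoreCase=true
    out = []
    i = 0
    while i < len(tokens):
        for lhs, rhs in mappings.items():
            if tuple(tokens[i:i + len(lhs)]) == lhs:
                out.extend(rhs)
                i += len(lhs)
                break
        else:  # no mapping matched at this position
            out.append(tokens[i])
            i += 1
    return out


print(rewrite("PS et".split()))
print(rewrite("Parti socialiste et".split()))
```

Both calls produce the same token list, so both phrase queries target the same indexed sequence.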


spellcheck distance measure algorithms error ?

2010-09-03 Thread Xavier Schepler

Hi,

When I swap the two letters in the middle of a word, e.g. jospin => jopsin, 
I don't get any suggestion from the spellchecker component.


I tried the default algorithm and the Jaro-Winkler distance, with a 
coefficient of 0.5.


Errors like :
jospni instead of jospin
josipn instead of jospin
ojspin instead of jospin
...

 are successfully corrected.

But jopsin instead of jospin returns no suggestion and I wonder why.

Has anyone else encountered this error ?
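
A possible explanation, offered as an assumption and not as a confirmed diagnosis: as far as I understand it, Lucene's SpellChecker first selects candidates through shared character n-grams (3- and 4-grams for a six-letter word) and only then applies the distance measure. The sketch below only compares n-gram sets and plain Levenshtein distances for the examples above; the helper names are mine.

```python
def ngrams(word, n):
    """Set of all character n-grams of word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def shared_grams(a, b, sizes=(3, 4)):
    """n-grams (of the given sizes) that both words have in common."""
    shared = set()
    for n in sizes:
        shared |= ngrams(a, n) & ngrams(b, n)
    return shared


def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


for typo in ("jopsin", "jospni", "josipn", "ojspin"):
    print(typo, levenshtein("jospin", typo), sorted(shared_grams("jospin", typo)))
```

All four misspellings are at the same edit distance from jospin, but jopsin is the only one that shares no 3-gram or 4-gram with it. If candidate lookup does work on n-grams, that would be enough for jospin never to be considered, whatever the accuracy setting.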


Re: spellcheck distance measure algorithms error ?

2010-09-03 Thread Xavier Schepler

On 03/09/2010 15:31, Grant Ingersoll wrote:

On Sep 3, 2010, at 9:14 AM, Xavier Schepler wrote:

   

On 03/09/2010 14:47, Grant Ingersoll wrote:
 

On Sep 3, 2010, at 6:02 AM, Xavier Schepler wrote:
   

no, jopsin isn't in the index.
I tried this with other words and I had the same error.
Thanks for your reply.
 


And what happens if you drop the accuracy to 0?  Also, please share your 
relevant configuration (spell checker config) and the URL command you are using.
   


I lowered the accuracy to 0 and restarted the server but I had no extra 
suggestion.

Here are extracts from my configuration :


- solrconfig :

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

<str name="queryAnalyzerFieldType">SC</str>

<!-- French spellcheckers : Start -->

<lst name="spellchecker">
<str name="name">qiAndMAndVlFR</str>
<str name="field">qiAndMAndVlSCFR</str>
<str name="distanceMeasure">
  org.apache.lucene.search.spell.JaroWinklerDistance
</str>
<str name="spellcheckIndexDir">./spellchecker_qiAndMAndVlFR</str>
<str name="buildOnOptimize">true</str>
<str name="accuracy">0.6</str>
</lst>


- schema.xml SC type :

<fieldtype name="SC" class="solr.TextField" positionIncrementGap="100" 
           stored="false" multiValued="true">

  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" 
                mapping="mapping-ISOLatin1Accent.txt"/>

  </analyzer>
</fieldtype>


- schema.xml qiAndMAndVlFR field :

<field name="qiAndMAndVlSCFR" type="SC"/>

- url command :

wt=json&omitHeader=true&q=qiAndMAndVlSyFR%3A%28jopsin%29&q.op=AND&start=0&rows=5&fl=id,solrLangCode,ddiFileId,studyDescriptionId,studyYearAndDescriptionId,nesstarServerId,studyNesstarId,variableId,questionId,variableNesstarId,concept,studyTitle,studyQuestionCount,hasMultipleItems,variableName,hasQuestionnaire,questionnaireUrl,studyDescriptionUrl,universe,notes,preQuestionText,postQuestionText,interviewerInstructions,questionPosition,vlFR,qFR,iFR,mFR,vlEN,qEN,iEN,mEN,&sort=score%20desc&fq=solrLangCode%3AFR&fq=solrLangCode%3AFR&fq=solrLangCode%3AFR&fq=solrLangCode%3AFR&facet=true&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%7DstudyDecade&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%7DstudySerieId&facet.field=%7B%21ex%3DstudySerieIds%2Cdecades%2CstudyIds%2CqueryFilters%2CconceptIds%7DstudyYearAndDescriptionId&facet.sort=lex

&spellcheck=true&spellcheck.count=10&spellcheck.dictionary=qiAndMAndVlFR&spellcheck.q=jopsin

&hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.fragsize=1&hl.snippets=100&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=%3Cb%3E&hl.simple.post=%3C%2Fb%3E&hl.mergeContiguous=false

Regards,

Xavier








Proximity search + Highlighting

2010-09-01 Thread Xavier Schepler

Hi,

can the highlighting component highlight terms only if the distance 
between them matches the query ?

I use those parameters :

hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=<%2Fb>&hl.mergeContiguous=false


Re: Proximity search + Highlighting

2010-09-01 Thread Xavier Schepler

On 01/09/2010 12:38, Markus Jelsma wrote:

I think you need to enable usePhraseHighlighter in order to use the
highlightMultiTerm parameter.

  On Wednesday 01 September 2010 12:12:11 Xavier Schepler wrote:
   

Hi,

can the highlighting component highlight terms only if the distance
between them matches the query ?
I use those parameters :

hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=<%2Fb>&hl.mergeContiguous=false

 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


   

yes, you're right.


Re: Proximity search + Highlighting

2010-09-01 Thread Xavier Schepler

On 01/09/2010 13:54, Xavier Schepler wrote:

On 01/09/2010 12:38, Markus Jelsma wrote:

I think you need to enable usePhraseHighlighter in order to use the
highlightMultiTerm parameter.

  On Wednesday 01 September 2010 12:12:11 Xavier Schepler wrote:

Hi,

can the highlighting component highlight terms only if the distance
between them matches the query ?
I use those parameters :

hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=<%2Fb>&hl.mergeContiguous=false




Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



yes, you're right.


but it doesn't help for the other problem


Re: Highlighting, return the matched terms only

2010-08-31 Thread Xavier Schepler

Chris Hostetter wrote:

: how could I have the highlighting component return only the terms that were
: matched, without any surrounding text ?

I'm not a Highlighter expert, but this is something that certainly 
*sounds* like it should be easy.


I took a shot at it and this is the best I could come up with...

http://localhost:8983/solr/select/?q=solr&hl.simple.pre=%20&hl.simple.post=%20&fl=id&hl=true&hl.snippets=1000&hl.fragmenter=regex&hl.regex.pattern=^\S%2B%24&hl.fragsize=1&hl.regex.slop=1000.0

...however the fragments still wind up wider than it seems like they 
should based on the regex & slop.  I have no idea why.


I've seen enough people with this request that it seems like there should 
be a built in fragmenter/formatter option for it in Solr, so I opened a 
feature request...


https://issues.apache.org/jira/browse/SOLR-2095

-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!


  

Hi,

your solution is a little better than the one I'm using ATM.
Thanks.




Expanded Synonyms + phrase search

2010-08-30 Thread Xavier Schepler

Hi,

several documents from my index contain the phrase "PS et".
However, PS is expanded to "parti socialiste" and a phrase search for 
"PS et" fails.

A phrase search for "parti socialiste et" succeeds.

Can I have both queries working ?


Here's the field type :

<fieldtype name="SyFR" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <!-- Synonyms -->
    <filter class="solr.SynonymFilterFactory" 
            synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" 
                mapping="mapping-ISOLatin1Accent.txt"/>

  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" 
                mapping="mapping-ISOLatin1Accent.txt"/>

  </analyzer>
</fieldtype>


Highlighting, return the matched terms only

2010-08-03 Thread Xavier Schepler

Hi,

how could I have the highlighting component return only the terms that 
were matched, without any surrounding text ?
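
In the meantime, one client-side workaround is to pull the marked terms out of the returned snippets. This is only a sketch under assumptions: it presumes the snippets come back with hl.simple.pre and hl.simple.post set to <b> and </b>, and the helper name `matched_terms` is hypothetical.

```python
import re


def matched_terms(snippets, pre="<b>", post="</b>"):
    """Return the distinct highlighted terms, in order of first appearance."""
    pattern = re.compile(re.escape(pre) + r"(.*?)" + re.escape(post), re.DOTALL)
    seen = []
    for snippet in snippets:
        for term in pattern.findall(snippet):
            if term not in seen:
                seen.append(term)
    return seen


print(matched_terms([
    "le <b>transport</b> public",
    "<b>Transports</b> et <b>transport</b>",
]))
```

The surrounding text is still transferred from Solr, so this saves nothing on the server side; it only cleans up what the application displays.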


Re: Adding new elements to index

2010-07-07 Thread Xavier Rodriguez
Thanks for the quick reply!

In fact it was a typo, the 200 rows I got were from postgres. I tried to say
that the full-import was omitting the 100 oracle rows.

When I run the full import, I run it as a single job, using the url
command=full-import. I've tried to clear the index both using the clean
command and manually deleting it, but when I run the full-import, the number
of indexed documents are the documents coming from postgres.

To be sure that the id field is unique, I build the id by putting a letter
before the id value. When indexed, the id looks like s_123, and that's the
id 123 for an entity identified as s. Other entities use different
prefixes, but never s.
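
As a quick sanity check of that prefixing scheme, here is a sketch (with made-up row ids and hypothetical helper names) that lists unique-key values appearing more than once across sources. If it reports nothing for the real ids, overwritten documents are unlikely to explain the missing rows.

```python
from collections import Counter


def prefixed_ids(prefix, ids):
    """Build unique-key values such as 's_123' from raw database ids."""
    return [f"{prefix}_{i}" for i in ids]


def duplicate_keys(*id_lists):
    """Keys that occur more than once across all sources."""
    counts = Counter(key for ids in id_lists for key in ids)
    return sorted(key for key, count in counts.items() if count > 1)


carrers = prefixed_ids("s", [1, 2, 3])    # made-up Oracle ids
hidrants = prefixed_ids("h", [1, 2, 3])   # made-up postgres ids

print(duplicate_keys(carrers, hidrants))               # prefixed: no collision
print(duplicate_keys(["1", "2", "3"], ["1", "2", "3"]))  # unprefixed: all collide
```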

I used DIH to index the data. My configuration is the following:

File db-data-config.xml

<dataSource
    type="JdbcDataSource"
    name="ds_ora"
    driver="oracle.jdbc.OracleDriver"
    url="jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521:SID"
    user="user"
    password="password"
/>

<dataSource
    type="JdbcDataSource"
    name="ds_pg"
    driver="org.postgresql.Driver"
    url="jdbc:postgresql://xxx.xxx.xxx.yyy:5432/sid"
    user="user"
    password="password"
/>

<entity name="carrers" dataSource="ds_ora" query="select 's_'||id as
id_carrer,'a' as tooltip from imi_carrers">
    <field column="id_carrer" name="identificador" />
    <field column="tooltip" name="Nom" />
</entity>


<entity name="hidrants" dataSource="ds_pg" query="select 'h_'||id as
id_hidrant, parc as tooltip from hidrants">
    <field column="id_hidrant" name="identificador" />
    <field column="tooltip" name="Nom" />
</entity>

--

In that configuration, all the fields coming from ds_pg are indexed, and the
fields coming from ds_ora are not indexed. As I've said, the strange
behaviour for me is that no error is logged in tomcat, the number of
documents created is the number of rows returned by hidrants, while the
number of rows returned is the sum of the rows from hidrants and
carrers.

Thanks in advance.

Xavi.







On 7 July 2010 02:46, Erick Erickson erickerick...@gmail.com wrote:

 first do you have a unique key defined in your schema.xml? If you
 do, some of those 300 rows could be replacing earlier rows.

 You say:  if I have 200
 rows indexed from postgres and 100 rows from Oracle, the full-import
 process
 only indexes 200 documents from oracle, although it shows clearly that the
 query retruned 300 rows.

 Which really looks like a typo, if you have 100 rows from Oracle how
 did you get 200 rows from Oracle?

 Are you perhaps doing this in two different jobs and deleting the
 first import before running the second?

 And if this is irrelevant, could you provide more details like how you're
 indexing things (I'm assuming DIH, but you don't state that anywhere).
 If it *is* DIH, providing that configuration would help.

 Best
 Erick

 On Tue, Jul 6, 2010 at 11:19 AM, Xavier Rodriguez xee...@gmail.com
 wrote:

  Hi,
 
  I have a SOLR installed on a Tomcat application server. This solr
 instance
  has some data indexed from a postgres database. Now I need to add some
  entities from an Oracle database. When I run the full-import command, the
  documents indexed are only documents from postgres. In fact, if I have
 200
  rows indexed from postgres and 100 rows from Oracle, the full-import
  process
  only indexes 200 documents from oracle, although it shows clearly that
 the
  query retruned 300 rows.
 
  I'm not doing a delta-import, simply a full import. I've tried to clean
 the
  index, reload the configuration, and manually remove
 dataimport.properties
  because it's the only metadata i found.  Is there any other file to check
  or
  modify just to get all 300 rows indexed?
 
  Of course, I tried to find one of that oracle fields, with no results.
 
  Thanks a lot,
 
  Xavier Rodriguez.
 



Adding new elements to index

2010-07-06 Thread Xavier Rodriguez
Hi,

I have a Solr instance installed on a Tomcat application server. This instance
has some data indexed from a postgres database. Now I need to add some
entities from an Oracle database. When I run the full-import command, the
documents indexed are only documents from postgres. In fact, if I have 200
rows indexed from postgres and 100 rows from Oracle, the full-import process
only indexes 200 documents from oracle, although it shows clearly that the
query returned 300 rows.

I'm not doing a delta-import, simply a full import. I've tried to clean the
index, reload the configuration, and manually remove dataimport.properties
because it's the only metadata I found.  Is there any other file to check or
modify just to get all 300 rows indexed?

Of course, I tried to find one of those Oracle fields, with no results.

Thanks a lot,

Xavier Rodriguez.


Multi word synonyms + highlighting

2010-06-04 Thread Xavier Schepler

Hi,

Here's a field type using synonyms :

<fieldtype name="SFR" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" 
            synonyms="french-synonyms.txt" ignoreCase="true" expand="true"/>

    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" 
                mapping="mapping-ISOLatin1Accent.txt"/>

  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" 
                mapping="mapping-ISOLatin1Accent.txt"/>

  </analyzer>
</fieldtype>

Here are the contents of 'french-synonyms.txt' that I used for testing :

PC,parti communiste
PS,parti socialiste

When I query a field for the words : parti communiste, those things are 
highlighted :

parti communiste
parti socialiste
parti
PC
PS
communiste

Having "parti socialiste" highlighted is a problem.
I expected only "parti communiste", "parti", "communiste" and "PC" to be 
highlighted.


Is there a way to have things working like I expected ?

Here is the query I use :

wt=json
q=qAndMSFR%3A%28parti%20communiste%29
q.op=AND
start=0
rows=5
fl=id,studyId,questionFR,modalitiesFR,variableLabelFR,variableName,nesstarVariableId,lang,studyTitle,nesstarStudyId,CevipofConcept,studyQuestionCount,questionPosition,preQuestionText,
sort=score%20desc
facet=true
facet.field=CevipofConceptCode
facet.field=studyDateAndId
facet.sort=lex
spellcheck=true
spellcheck.collate=on
spellcheck.count=10
hl=on
hl.fl=questionSMFR,modalitiesSMFR,variableLabelSMFR
hl.fragsize=1
hl.snippets=100
hl.usePhraseHighlighter=true
hl.highlightMultiTerm=true
hl.simple.pre=%3Cb%3E
hl.simple.post=%3C%2Fb%3E



Targeting two fields with the same query or one field gathering contents from both ?

2010-05-17 Thread Xavier Schepler

Hey,

let's say  I have :

- a field named A with specific contents

- a field named B with specific contents

- a field named C which contains only the contents of A and B, added with copyField.

Are those queries equivalent in terms of performance :

- A: (the lazy fox) AND B: (the lazy fox)
- C: (the lazy fox)

??

Thanks,

Xavier





Re: Targeting two fields with the same query or one field gathering contents from both ?

2010-05-17 Thread Xavier Schepler

On 17/05/2010 16:57, Xavier Schepler wrote:

Hey,

let's say  I have :

- a field named A with specific contents

- a field named B with specific contents

- a field named C witch contents only from A and B added with copyField.

Are those queries equivalents in terms of performance :

- A: (the lazy fox) AND B: (the lazy fox)
- C: (the lazy fox)

??

Thanks,

Xavier



I ran some tests and it appears that the second query is much faster 
than the first ...




Re: Targeting two fields with the same query or one field gathering contents from both ?

2010-05-17 Thread Xavier Schepler

On 17/05/2010 17:49, Marco Martinez wrote:

No, the equivalent for this will be:

- A: (the lazy fox) *OR* B: (the lazy fox)
- C: (the lazy fox)


Imagine the situation where you don't have 'the lazy fox' in B: with the AND
you get 0 results although you have 'the lazy fox' in A and C.
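
A small in-memory sketch of the point above, with made-up documents and a naive "all terms present" match instead of real Lucene scoring. Field C is simulated by concatenating A and B, which is only a stand-in for copyField.

```python
docs = [
    {"A": "the lazy fox", "B": "a quick dog"},
    {"A": "the lazy fox", "B": "the lazy fox"},
    {"A": "nothing here", "B": "still nothing"},
    {"A": "the lazy cat", "B": "red fox"},
]
for d in docs:
    d["C"] = d["A"] + " " + d["B"]  # stand-in for copyField A,B -> C


def has_all(text, terms):
    """True when every term occurs as a word of text."""
    words = set(text.split())
    return all(t in words for t in terms)


terms = ["the", "lazy", "fox"]
and_hits = [i for i, d in enumerate(docs)
            if has_all(d["A"], terms) and has_all(d["B"], terms)]
or_hits = [i for i, d in enumerate(docs)
           if has_all(d["A"], terms) or has_all(d["B"], terms)]
c_hits = [i for i, d in enumerate(docs) if has_all(d["C"], terms)]

print("A AND B:", and_hits)
print("A OR B: ", or_hits)
print("C:      ", c_hits)
```

The last document shows one more difference: the query on C also matches when the terms are spread over A and B, which neither the AND nor the OR form does.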

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/5/17 Xavier Scheplerxavier.schep...@sciences-po.fr

   

Hey,

let's say  I have :

- a field named A with specific contents

- a field named B with specific contents

- a field named C witch contents only from A and B added with copyField.

Are those queries equivalents in terms of performance :

- A: (the lazy fox) AND B: (the lazy fox)
- C: (the lazy fox)

??

Thanks,

Xavier




 
   

yes you're right I figured it after posting.



What hardware do I need ?

2010-04-23 Thread Xavier Schepler

Hi,

I'm working with Solr 1.4.
My schema has about 50 fields.
I'm using full text search in short strings (~ 30-100 terms) and 
faceted search.

My index will have 100 000 documents.

The number of requests per second will be low. Let's say between 0 and 
1000 because of auto-complete.


Is a standard server (3ghz proc, 4gb ram) with the client application 
(apache + php5 + ZF + apc) and Tomcat + Solr enough ???

Do I need more hardware ?

Thanks in advance,

Xavier S.




Re: What hardware do I need ?

2010-04-23 Thread Xavier Schepler

On 23/04/2010 17:08, Otis Gospodnetic wrote:

Xavier,

0-1000 QPS is a pretty wide range.  Plus, it depends on how good your 
auto-complete is, which depends on types of queries it issues, among other 
things.
100K short docs is small, so that will all fit in RAM nicely, assuming those 
other processes leave enough RAM for the OS to cache the index.

  That said, you do need more than 1 box if you want your auto-complete more 
fault tolerant.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Xavier Schepler xavier.schep...@sciences-po.fr
> To: solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 11:01:24 AM
> Subject: What hardware do I need ?
>
> Hi,
>
> I'm working with Solr 1.4.
> My schema has about 50 fields.
> I'm using full text search in short strings (~ 30-100 terms) and facetted
> search.
>
> My index will have 100 000 documents.
>
> The number of requests per second will be low. Let's say between 0 and 1000
> because of auto-complete.
>
> Is a standard server (3ghz proc, 4gb ram) with the client application
> (apache + php5 + ZF + apc) and Tomcat + Solr enough ???
>
> Do I need more hardware ?
>
> Thanks in advance,
>
> Xavier S.

Well, my auto-complete is built on the facet prefix search component.
I think that 100-700 requests per second is maybe a better approximation.



More like this - setting a minimum number of terms used to build queries

2010-03-29 Thread Xavier Schepler

Hey,

Is there a way to make the  more like this feature build its queries 
from a minimum number of interesting terms ?

It looks like this component fires queries with only 1 term in them.
I get a lot of results that aren't similar at all to the parsed 
document fields.


My parameters :
mlt.fl=question,mlt.mintf=1mlt.mindf=mlt.minwl=4

The question field contains between 15 and 50 terms.

Xavier S.


Highlighting inside a field with HTML contents

2010-02-22 Thread Xavier Schepler

Hello,

this field would not be searched, but it would be used to display results.

A query could be :

q=table&hl=true&hl.fl=htmlfield&hl.fragsize=0

It would be tokenized with the HTMLStripStandardTokenizerFactory, then 
analyzed the same way as the searchable fields.


Could this result in highlighting inside HTML tags (I mean things like 
<<em>table</em>>...</<em>table</em>>) ?


Re: Need feedback on solr security

2010-02-17 Thread Xavier Schepler

Vijayant Kumar wrote:

Hi Group,

I need some feedback on Solr security.

To make my Solr admin password protected,
I used the path-based authentication from
http://wiki.apache.org/solr/SolrSecurity.

This way my admin area, search, delete and add to index are protected. But now 
that Solr requires authentication, every update/delete from the front

end is blocked without authentication.

I do not need this authentication from the front end, so I simply pass the
username and password to Solr in my front-end scripts and it is
working fine. I did it in the following way:

http://username:passw...@localhost:8983/solr/admin/update
I need your suggestions and feedback on the above method. Is it a feasible
and secure method? Is there any alternative method to overcome this
issue?
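
For what it's worth, credentials placed in the URL are sent as an HTTP Basic Authorization header, which is base64-encoded and not encrypted, so over plain HTTP anyone on the network path can read them. The sketch below only builds such a request to show what goes over the wire; the URL, user name and password are hypothetical placeholders, and nothing is actually sent.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Value of the Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_update_request(url, username, password, body):
    """Prepare (but do not send) an authenticated POST request."""
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", basic_auth_header(username, password))
    req.add_header("Content-Type", "text/xml; charset=utf-8")
    return req


# Hypothetical endpoint and credentials, for illustration only.
req = build_update_request(
    "http://localhost:8983/solr/update", "solr_user", "secret", b"<commit/>"
)
header = req.get_header("Authorization")
print(header)
print(base64.b64decode(header.split()[1]))  # the password is trivially recoverable
```

Keeping the credentials in server-side scripts, as described above, at least keeps them out of the pages served to the public.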

Hey,

there is at least one other solution. You can set a firewall rule that 
allows connections to Solr's port only from trusted IPs.




Re: Need feedback on solr security

2010-02-17 Thread Xavier Schepler

Vijayant Kumar wrote:

Hi Xavier,

Thanks for your feedback
the firewall rule for trusted IPs is not feasible for us because the
application is open to the public, so we cannot work through IP banning.
  

Vijayant Kumar wrote:


Hi Group,

I need some feedback on  solr security.

For Making by solr admin password protected,
 I had used the Path Based Authentication form
http://wiki.apache.org/solr/SolrSecurity.

In this way my admin area,search,delete,add to index is protected.But
Now
when I make solr authenticated then for every update/delete from the
fornt
end is blocked without authentication.

I do not need this authentication from the front end so I simply pass
the
username and password to the solr in my fornt end scripts and it is
working fine. I had done it in the below way.

http://username:passw...@localhost:8983/solr/admin/update
I need your suggestion and feed back on the above method.Is it fessiable
method and secure? TO over come from this issue is there any alternate
method?
  

Hey,

there is at least another solution. You can set a firewall rule that
allow  connections to the Solr's port only from trusted IPs.





  

Do your users connect directly to Solr ?
I mean, the firewall rule is for the Solr client, i.e. the computer that 
hosts the application that connects to Solr.


Re: Need feedback on solr security

2010-02-17 Thread Xavier Schepler

Xavier Schepler wrote:

Vijayant Kumar wrote:

Hi Xavier,

Thanks for your feedback
the firewall rule for the trusted IP is not fessiable for us because the
application is open for public so we can not work through IP banning.
 

Vijayant Kumar wrote:
   

Hi Group,

I need some feedback on  solr security.

For Making by solr admin password protected,
 I had used the Path Based Authentication form
http://wiki.apache.org/solr/SolrSecurity.

In this way my admin area,search,delete,add to index is protected.But
Now
when I make solr authenticated then for every update/delete from the
fornt
end is blocked without authentication.

I do not need this authentication from the front end so I simply pass
the
username and password to the solr in my fornt end scripts and it is
working fine. I had done it in the below way.

http://username:passw...@localhost:8983/solr/admin/update
I need your suggestion and feed back on the above method.Is it 
fessiable

method and secure? TO over come from this issue is there any alternate
method?
  

Hey,

there is at least another solution. You can set a firewall rule that
allow  connections to the Solr's port only from trusted IPs.





  

Do your users connect directly to Solr ?
I mean, the firewall rule is for the solr client, i.e. the computer 
that host the application that connect to Solr.





You could set up a firewall that forbids any connection to your Solr 
server's port for everyone except the computer hosting the application 
that connects to Solr.

So, only your application will be able to connect to Solr.

This idea comes from the book Solr 1.4 Enterprise Search Server.


Re: Dynamic fields with more than 100 fields inside

2010-02-09 Thread Xavier Schepler

Shalin Shekhar Mangar wrote:

On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler 
xavier.schep...@sciences-po.fr wrote:

  

Hey,

I'm thinking about using dynamic fields.

I need one or more user specific field in my schema, for example,
concept_user_*, and I will have maybe more than 200 users using this
feature.
One user will send and retrieve values from its field. It will then be used
to filter result.

How would it impact query performance ?




Can you give an example of such a query?

  

Hi,

it could be queries such as :

allFr: état-unis AND concept_researcher_99 = 303

modalitiesFr: exactement AND questionFr: correspond AND 
concept_researcher_2 = 101


and faceting like this :

q=%2A%3A%2A&fl=variableXMLFr,lang&start=0&rows=10&facet=true&facet.field=concept_researcher_2&facet.field=studyDateAndStudyTitle&facet.sort=lex

Thanks in advance,

Xavier S.
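
To make the intended requests more concrete, here is a sketch of how the per-researcher field name could be derived and used both as a filter and as a facet. It assumes the usual field:value syntax in place of the "field = value" shorthand above, and the helper name `researcher_filter_params` is hypothetical.

```python
from urllib.parse import urlencode


def researcher_filter_params(main_query, researcher_id, concept_id):
    """Query string filtering and faceting on one researcher's dynamic field."""
    field = f"concept_researcher_{researcher_id}"
    return urlencode([
        ("q", main_query),
        ("fq", f"{field}:{concept_id}"),
        ("facet", "true"),
        ("facet.field", field),
    ])


print(researcher_filter_params("modalitiesFr:exactement", 2, 101))
```

Each request touches only the one dynamic field that belongs to the current researcher.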


Re: Dynamic fields with more than 100 fields inside

2010-02-09 Thread Xavier Schepler

Shalin Shekhar Mangar wrote:

On Tue, Feb 9, 2010 at 2:43 PM, Xavier Schepler 
xavier.schep...@sciences-po.fr wrote:

  

Shalin Shekhar Mangar wrote:

 On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler 


xavier.schep...@sciences-po.fr wrote:



  

Hey,

I'm thinking about using dynamic fields.

I need one or more user specific field in my schema, for example,
concept_user_*, and I will have maybe more than 200 users using this
feature.
One user will send and retrieve values from its field. It will then be
used
to filter result.

How would it impact query performance ?






Can you give an example of such a query?



  

Hi,

it could be queries such as :

allFr: état-unis AND concept_researcher_99 = 303

modalitiesFr: exactement AND questionFr: correspond AND
concept_researcher_2 = 101

and facetting like this :


q=%2A%3A%2A&fl=variableXMLFr,lang&start=0&rows=10&facet=true&facet.field=concept_researcher_2&facet.field=studyDateAndStudyTitle&facet.sort=lex




It doesn't impact query performance any more than filtering on other fields.
Is there a performance problem or were you just asking generally?

  

I was asking generally, thanks for your response.




Dynamic fields with more than 100 fields inside

2010-02-08 Thread Xavier Schepler

Hey,

I'm thinking about using dynamic fields.

I need one or more user-specific fields in my schema, 
for example, concept_user_*, and I will have maybe more than 200 users 
using this feature.
Each user will send and retrieve values from their own field. It will then be 
used to filter results.


How would it impact query performance ?

Thanks,

Xavier S.


Field highlighting

2010-01-07 Thread Xavier Schepler

Hi,

I'm trying to highlight short text values. The field they come from has 
a type shared with other fields. I have highlighting working on the other 
fields but not on this one.

Why ?


Re: Field highlighting

2010-01-07 Thread Xavier Schepler

Erick Erickson wrote:

> It's really hard to provide any response with so little information.
> Could you show us the difference between a field that works
> and one that doesn't? Especially the relevant schema.xml entries
> and the query that fails to highlight.
>
> Erick
>
> On Thu, Jan 7, 2010 at 7:47 AM, Xavier Schepler
> <xavier.schep...@sciences-po.fr> wrote:
>
>> Hi,
>>
>> I'm trying to highlight short text values. The field they come from
>> shares its type with other fields. I have highlighting working on those
>> other fields but not on this one.
>> Why?




  

Thanks for your response.
Here are some extracts from my schema.xml:

<fieldtype name="textFr" class="solr.TextField">
  <analyzer>
    <!-- remove stop words -->
    <filter class="solr.StopFilterFactory"
            words="french-stopwords.txt" ignoreCase="true"/>

    <!-- tokenization -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- remove accents -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <!-- remove dots at the end of acronyms -->
    <filter class="solr.StandardFilterFactory"/>
    <!-- lowercase -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemming with the Porter filter -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <!-- synonyms -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="test-synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldtype>

Here's a field on which highlighting works:

<field name="questionsLabelsFr"
   required="false"
   type="textFr"
   multiValued="true"
   indexed="true"
   stored="true"
   compressed="false"
   omitNorms="false"
   termVectors="true"
   termPositions="true"
   termOffsets="true"
   />

Here's the field on which it doesn't:

<field name="modalitiesLabelsFr"
   required="false"
   type="textFr"
   multiValued="true"
   indexed="true"
   stored="true"
   compressed="false"
   omitNorms="false"
   termVectors="true"
   termPositions="true"
   termOffsets="true"
   />

The two definitions are identical apart from the field name.

But modalitiesLabelsFr contains mostly short strings like:

Côtes-d Armor
Creuse
Dordogne
Doubs
Drôme
Eure
Eure-et-Loir
Finistère

When matches are found in them, I get a list like this, with no text:

<lst name="highlighting">
  <lst name="dbbd3642-db1d-4b35-9280-11582523903d"/>
  <lst name="f1d8be2d-1070-4111-b16e-94d16c8c0bc6"/>
</lst>

The name attribute is the uid of the document.

I tried several values for hl.fragsize (0, 1, 2, ...) with no success at
all.
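
To make the symptom easier to check while varying parameters such as hl.fragsize, here is a rough Python sketch that lists the documents whose highlighting entry came back empty. It does not explain why the snippets are missing. The function name docs_without_snippets is hypothetical, and the sample is the response fragment shown above. It assumes that a document with snippets would carry child elements under its lst entry.

```python
import xml.etree.ElementTree as ET


def docs_without_snippets(response_xml):
    """Return the uids under 'highlighting' that carry no snippets
    (hypothetical helper for inspecting a Solr XML response)."""
    root = ET.fromstring(response_xml.strip())
    # Accept either the bare highlighting element or a full response.
    if root.tag == "lst" and root.get("name") == "highlighting":
        highlighting = root
    else:
        highlighting = root.find(".//lst[@name='highlighting']")
    if highlighting is None:
        return []
    # An entry with no child elements has no highlighted text.
    return [doc.get("name") for doc in highlighting.findall("lst")
            if len(doc) == 0]


sample = """
<lst name="highlighting">
  <lst name="dbbd3642-db1d-4b35-9280-11582523903d"/>
  <lst name="f1d8be2d-1070-4111-b16e-94d16c8c0bc6"/>
</lst>
"""

for uid in docs_without_snippets(sample):
    print("no snippet for", uid)
```

Running the same check against a response for questionsLabelsFr, where highlighting works, gives a quick side-by-side comparison of the two fields.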