Getting hits in RequestHandler

2009-06-29 Thread pof

Hi, I am writing my own request handler and I was wondering how I go about
getting a list of hits back. Thanks.
-- 
View this message in context: 
http://www.nabble.com/Getting-hits-in-RequestHandler-tp24248810p24248810.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: nested dismax queries

2009-06-29 Thread Michael Ludwig

Ensdorf Ken wrote:


For example, a user might enter Alabama Biotechnology in the main
search box, triggering a dismax request which returns lots of
different types of results.  They may then want to refine their search
by selecting a specific industry from a drop-down box.  We handle this
by adding a filter query (fq=) to the original query.  We have dozens
of additional fields like this - some with a finite set of discrete
values, some with arbitrary text values.  The combinations are
infinite, and I'm worried we will overwhelm the filterCache by
supporting all of these cases as filter queries.


Filter queries with arbitrary text values may swamp the cache in 1.3.

Otherwise, the combinations aren't infinite. Keep the filters separate
in order to limit their number. Specify two simple filters instead of
one composite filter: fq=x:bla and fq=y:blub instead of fq=x:bla
AND y:blub. See:

filterCache/@size, queryResultCache/@size, documentCache/@size
http://markmail.org/thread/tb6aanicpt43okcm

Michael Ludwig
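
To illustrate the point about limiting the number of filters, here is a
minimal sketch in plain Java (no Solr classes involved). It assumes that
each distinct fq string occupies one filterCache entry; the class name and
the extra field values are made up for illustration. With separate filters
the number of distinct entries grows as the sum of the value counts, with
composite filters as their product.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FilterCacheKeys {
    /** Distinct filter strings when each field gets its own fq parameter. */
    static Set<String> separateFilters(List<String> xs, List<String> ys) {
        Set<String> keys = new LinkedHashSet<>();
        for (String x : xs) keys.add("x:" + x);
        for (String y : ys) keys.add("y:" + y);
        return keys;
    }

    /** Distinct filter strings when both clauses are combined into one fq parameter. */
    static Set<String> compositeFilters(List<String> xs, List<String> ys) {
        Set<String> keys = new LinkedHashSet<>();
        for (String x : xs) {
            for (String y : ys) {
                keys.add("x:" + x + " AND y:" + y);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Hypothetical value sets: 3 values for x, 2 values for y.
        List<String> xs = Arrays.asList("bla", "foo", "bar");
        List<String> ys = Arrays.asList("blub", "baz");
        System.out.println("separate:  " + separateFilters(xs, ys).size());  // 3 + 2 entries
        System.out.println("composite: " + compositeFilters(xs, ys).size()); // 3 * 2 entries
    }
}
```

With dozens of fields the gap between sum and product widens quickly, which
is the reason for keeping the filters separate.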


Re: plans for switching to maven2 (after 1.4 release)?

2009-06-29 Thread Grant Ingersoll
I'm not particularly opposed to it, but I'm not exactly for it
either.  I very much have a love-hate relationship with Maven.  The
simple things work fine w/ Maven, and the power of pointing Eclipse or
IntelliJ at a POM file and having the whole project imported and ready
to work on w/o one iota of setup is something that the proponents of
Ant just don't get, especially when it comes to multiple-module builds
like Solr and Lucene have.

That being said, there are a lot of headaches with Maven: number one
being releases, number two being anything custom, and number three
being the constant instability of the magic happening behind the
scenes with it upgrading dependencies, etc. automatically.  Finally,
I've always had a hard time getting help in Maven land.  It always
seemed to me the number of incoming questions outweighed the number of
answers about 10 to 1.


I converted Mahout to Maven and it was a pain.  I also use Maven for
personal development.  It is much easier to start fresh on Maven than
it is to add it in later.  And there is something to be said for the
Maven Ant plugin, but even that is clunky.


In the end, I think I'd be +0 on it.  It's also come up in the past on  
the lists and there never is a clear consensus.


-Grant

On Jun 28, 2009, at 12:33 PM, aldana wrote:



Hi,

Are there plans to migrate from Ant to Maven 2? Maybe not for the
current trunk (mainline for 1.4), but maybe for the trunk after
releasing Solr 1.4.

It would make the build more standard and easier to import into IDEs.

-
manuel aldana
aldana((at))gmx.de
software-engineering blog: http://www.aldana-online.de
--
View this message in context: 
http://www.nabble.com/plans-for-switching-to-maven2-%28after-1.4-release%29--tp24243036p24243036.html






Re: plans for switching to maven2 (after 1.4 release)?

2009-06-29 Thread Erik Hatcher
I'll weigh in and throw a -1 to a Maven-only build system for Solr.   
If there is still a functioning Ant build, but Mavenites have a  
parallel setup, that's fine by me and I'd be -0 on that.


These days, Buildr has my attention as a way to get the best of all  
worlds: access to Ant's powerful task library, POM/repo handling, AND  
Ruby :)


Erik








Re: facets: case and accent insensitive sort

2009-06-29 Thread Sébastien Lamy

Thanks for your reply. I will have a look at this.

Peter Wolanin wrote:

Seems like this might be approached using a Lucene payload?  For
example where the original string is stored as the payload and
available in the returned facets for display purposes?

Payloads are byte arrays stored with Terms on Fields. See
https://issues.apache.org/jira/browse/LUCENE-755

Solr seems to have support for a few example payloads already like
NumericPayloadTokenFilter

Almost any way you approach this it seems like there are potentially
problems since you might have multiple combinations of case and accent
mapping to the same case-less accent-less value that you want to use
for sorting (and I assume for counting) your facets?

-Peter

On Fri, Jun 26, 2009 at 9:02 AM, Sébastien Lamy lamys...@free.fr wrote:
  

Shalin Shekhar Mangar wrote:


On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy lamys...@free.fr wrote:


  

If I use a copyField to store into a string type, and facet on that, my
problem remains:
the facets are sorted case- and accent-sensitively, and I want an
*insensitive* sort.
If I use a copyField to store into a type with no accents and case (e.g.
alphaOnlySort), then Solr returns facet values with no accents and no
case. And I want the facet values returned by Solr to *have accents and
case*.



Ah, of course you are right. There is no way to do this right now except
on the client side.

  

Thank you for your response.
Would it be easy to modify Solr to behave the way I want? Where should I
start investigating?
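
For the client-side route mentioned in this exchange, a minimal sketch in
plain Java could look as follows. It is not Solr code: the class and method
names are made up, and it assumes the facet values have already been read
from the response into a list. The values are ordered by a folded key
(accents stripped, lower-cased) while the strings shown to the user keep
their original accents and case.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class FacetValueSort {
    /** Sort key: decompose accented characters, drop the combining marks, lower-case. */
    static String sortKey(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /** Returns a new list ordered by the folded key; the displayed values are untouched. */
    static List<String> sortForDisplay(List<String> facetValues) {
        List<String> sorted = new ArrayList<>(facetValues);
        // List.sort is stable, so values with equal keys keep their input order.
        sorted.sort(Comparator.comparing(FacetValueSort::sortKey));
        return sorted;
    }

    public static void main(String[] args) {
        // Unicode escapes: \u00e8 = e-grave, \u00e9 = e-acute, \u00c9 = E-acute.
        List<String> facets = Arrays.asList("Z\u00e8bre", "\u00e9cole", "abricot", "\u00c9clair");
        System.out.println(sortForDisplay(facets));
    }
}
```

This only addresses ordering. If several spellings fold to the same key and
should be counted as one facet, the counts would have to be merged
separately.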






  




Excluding Characters and SubStrings in a Faceted Wildcard Query

2009-06-29 Thread Ben

Hello,

I've been using Solr for a while now, but am stuck for information on
two issues:


1) Is it possible to exclude characters in a Solr facet wildcard query?
e.g.
[^,]* to match any character except a comma?

2) Can one set up the facet wildcard query to return the exact
substrings it matched of the queried facet, rather than the whole string?


I hope somebody can help :)

Thanks,

Ben



Re: Excluding Characters and SubStrings in a Faceted Wildcard Query

2009-06-29 Thread Erik Hatcher

Ben,

Could you post an example of the type of data you're dealing with and  
how you want it handled?   I suspect there is a way to accomplish what  
you want using an analyzed field, or by preprocessing the data you're  
indexing.


Erik





Re: plans for switching to maven2 (after 1.4 release)?

2009-06-29 Thread Grant Ingersoll


On Jun 29, 2009, at 9:01 AM, Grant Ingersoll wrote:



I converted Mahout to Maven and it was a pain.


I'd add, however, that now that it is done, it is fine, except of  
course, that Maven 2.1.0 doesn't work with it apparently because of  
upgrades.


RE: nested dismax queries

2009-06-29 Thread Ensdorf Ken
> Filter queries with arbitrary text values may swamp the cache in 1.3.

Are you implying this won't happen in 1.4?  Can you point me to the feature 
that would mitigate this?


> Otherwise, the combinations aren't infinite. Keep the filters separate
> in order to limit their number. Specify two simple filters instead of
> one composite filter: fq=x:bla and fq=y:blub instead of fq=x:bla
> AND y:blub. See:
>
> filterCache/@size, queryResultCache/@size, documentCache/@size
> http://markmail.org/thread/tb6aanicpt43okcm
>
> Michael Ludwig

That's what I was thinking would make the most sense, assuming the intersection 
of the cached bitmaps is efficient enough.  Thanks for the reply.

-Ken


Re: Excluding Characters and SubStrings in a Faceted Wildcard Query

2009-06-29 Thread Ben

Hi Erik,

I'm not sure exactly how much context you need here, so I'll try to keep 
it short and expand as needed.


The column I am faceting contains a comma-delimited set of vectors.
Each vector is made up of {Make,Year,Model}, e.g.
_ford_1996_focus,mercedes_1996_clk,ford_2000_focus


I have a custom request handler, where if I want to find all the cars 
from 1996 I pass in a facet query for the Year (1996) which is 
transformed to a wildcard facet query :


_*_1996_*

In other words, it'll match any record whose vector column contains a
string that somewhere has a car from 1996.


Why not put the Make, Year and Model in separate columns and do a facet 
query of multiple columns?... because once we've selected 1996, we 
should (in the above example) then be offering ford and mercedes as 
further facet choices, and nothing more. If the parts were in their own 
columns, there would be no way to tie the Makes and Models to specific 
years, for example.


At any rate, the wildcard search returns the entire match
(_ford_1996_focus,mercedes_1996_clk,ford_2000_focus). I then have to run
another RegExp over it to extract only the two parts (the first ford and
mercedes) that were from 1996. This isn't using Solr's cache very
effectively.


It would be excellent if Solr could break up that comma-separated list
into three different parts and run the RegExp over each, returning
only those which match. Is that what you're implying with analysis? If
that were the case, I'd not need to worry about character exclusion.
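
The split-then-match step described here can be sketched in plain Java,
outside Solr. The class and method names are hypothetical. The optional
leading underscore is an assumption based on the sample value, where only
the first entry carries one. Each comma-separated vector is examined on its
own, so no character exclusion is needed.

```java
import java.util.ArrayList;
import java.util.List;

final class VectorFacetFilter {
    /** Returns the distinct makes, in order of appearance, whose vector has the given year. */
    static List<String> makesForYear(String column, String year) {
        List<String> makes = new ArrayList<>();
        for (String vector : column.split(",")) {
            // Tolerate an optional leading underscore (assumption, see lead-in).
            String v = vector.startsWith("_") ? vector.substring(1) : vector;
            // Limit 3 keeps any further underscores inside the model part.
            String[] parts = v.split("_", 3); // make, year, model
            if (parts.length == 3 && parts[1].equals(year) && !makes.contains(parts[0])) {
                makes.add(parts[0]);
            }
        }
        return makes;
    }

    public static void main(String[] args) {
        String column = "_ford_1996_focus,mercedes_1996_clk,ford_2000_focus";
        System.out.println(makesForYear(column, "1996"));
    }
}
```

The same splitting could be done before indexing, so that each vector
becomes its own value instead of one long string.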


Sorry if that's a bit fuzzy... it's hard trying to explain enough to be 
useful, but not too much that it turns into an essay!!!


Thanks,
Ben

The solution I'm using is to form a vector







Re: nested dismax queries

2009-06-29 Thread Michael Ludwig

Ensdorf Ken wrote:

>> Filter queries with arbitrary text values may swamp the cache in 1.3.


> Are you implying this won't happen in 1.4?


I intended to say just this, but I was on the wrong track.


> Can you point me to the feature that would mitigate this?


What I was thinking of is the following:

[#SOLR-475] multi-valued faceting via un-inverted field
https://issues.apache.org/jira/browse/SOLR-475

But as you can see, this refers to faceting on multi-valued fields, not
to filter queries with arbitrary text. I was off on a tangent. Sorry.

To get back to your initial mail, I tend to think that drop-down boxes
(the values of which you control) are a nice match for the filter query,
whereas user-entered text is more likely to be a candidate for the main
query.

Michael Ludwig
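
As a rough illustration of that split, the following plain-Java sketch
builds a request query string with the user-entered text as the main dismax
query and each drop-down selection as its own fq parameter. The class name,
the field names (industry, state) and their values are invented for the
example, and it assumes the dismax parser is selected with defType=dismax.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

final class DismaxRequest {
    /** URL-encodes one parameter value as UTF-8. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** User text goes into q; every controlled value becomes a separate fq. */
    static String queryString(String userText, List<String> filters) {
        StringBuilder sb = new StringBuilder("defType=dismax&q=").append(encode(userText));
        for (String fq : filters) {
            sb.append("&fq=").append(encode(fq));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical drop-down selections.
        List<String> filters = Arrays.asList("industry:biotech", "state:AL");
        System.out.println(queryString("Alabama Biotechnology", filters));
    }
}
```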


RE: nested dismax queries

2009-06-29 Thread Ensdorf Ken



I agree, which brings me back to the issue of combining dismax with standard
queries.  It looks like we may need to create a custom query parser to get
optimal performance.  Thanks again.




Entire heap consumed to answer initial ping()

2009-06-29 Thread Phillip Farber


Jconsole shows the entire 2.1g heap consumed on the first request (a 
simple ping) to Solr after a Tomcat restart.


After a Tomcat restart:
13140 tomcat virtual=2255m resident=183m ... jsvc

After the ping():
13140 tomcat virtual=2255m resident=2.0g ... jsvc

Jconsole says my Tenured Gen heap is at 100%.

08:06:02
Lucene Implementation Version: 2.9-dev 719313 - 2008-11-20 23:51:24
Java:
JAVA_OPTS=-Xmx2048M -Xms2048M -XX:MaxPermSize=128M -Xshare:off
i.e. about 2.1g
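
The per-pool figures that Jconsole displays can also be read from inside a
JVM through the standard java.lang.management API. The sketch below is
generic Java, not Solr or Tomcat code; the class name is made up, and pool
names (e.g. a tenured generation pool) depend on the JVM and garbage
collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class HeapPoolReport {
    /** One line per heap memory pool: name, used and max in MB, and percent used. */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long usedMb = usage.getUsed() >> 20;
            long max = usage.getMax(); // -1 if the maximum is undefined
            String maxText = max > 0 ? (max >> 20) + "m" : "undefined";
            String pct = max > 0
                    ? String.format("%.1f%%", 100.0 * usage.getUsed() / max)
                    : "n/a";
            lines.add(pool.getName() + ": used=" + usedMb + "m max=" + maxText + " (" + pct + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeHeapPools()) {
            System.out.println(line);
        }
    }
}
```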

 I have several solr test instances under this tomcat.  When one gets 
all the heap after restart, I can't even open the admin interface to the 
others.


Can someone advise me as to whether this is a Tomcat issue or a Solr 
issue?  And an approach to fixing this?


Thanks!

Phil Farber




Re: plans for switching to maven2 (after 1.4 release)?

2009-06-29 Thread manuel aldana
I know migrating to Maven 2 has its pain points, but in my view it is
worth it if one sees it as a long-run investment. It follows
standards/conventions, and importing projects into IDEs like Eclipse or
IntelliJ is much more straightforward. Getting used to a new project
that uses Maven is also much quicker than grasping proprietary builds
that reinvent the wheel.


After having used Maven 2 for three years now, I really couldn't live
without it (though in the beginning, when migrating builds, I was
swearing at its evil details). Support (documentation + mailing list)
has also greatly improved since then.


Because a smooth migration is not that easy, one should maybe take the
cut after release 1.4 or 1.5? Though I am not very familiar with the
codebase history, I would like to help out.









--
manuel aldana
ald...@gmx.de
software-engineering blog: http://www.aldana-online.de



RE: plans for switching to maven2 (after 1.4 release)?

2009-06-29 Thread Smiley, David W.
FWIW
I strongly agree with your sentiments, Manuel.
One of the neat Maven features that isn't well known is just being able to do
mvn jetty:run and have Jetty load up right away (no creating of a web-app
directory or packaging of a war or anything like that).
What I hate about Ant-based projects is that each Ant file is yet another build
script to figure out.  That and dealing with .jar's of course.

Yeah, Maven can be annoying at times.

~ David Smiley


Re: Reverse querying

2009-06-29 Thread AlexElba

Any other suggestions? This suggestion doesn't seem to work.



AlexElba wrote:
 
 
 Otis Gospodnetic wrote:
 
 
 Alex  Oleg,
 
 Look at MemoryIndex in Lucene's contrib.  It's the closest thing to what
 you are looking for.  What you are describing is sometimes referred to as
 prospective search, sometimes saved searches, and a few other names.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: AlexElba ramal...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, June 24, 2009 7:47:20 PM
 Subject: Reverse querying
 
 
 Hello,
 
 I have a problem which I am trying to solve using Solr.
 
 I have a search text (term) and I have an index full of words which are
 mapped to IDs.
 
 Is there any query that I can run to do this?
 
 Example:
 
 Term
 3) A recommendation to use VAR=value in the configure command line will
  not work with some 'configure' scripts that comply to GNU standards
  but are not generated by autoconf. 
 
 Index docs
 
 id:1 name:recommendation 
 ...
 id:3 name:GNU
 id:4 name:food
 
 After running the query, I want to get 1 and 3 as results.
 
 Thanks
 
 -- 
 View this message in context: 
 http://www.nabble.com/Reverse-querying-tp24194777p24194777.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 Hello,
 I looked into MemoryIndex; its search returns only a score,
 which indicates whether there is a match or not.
 
 I built a test method based on the example:
 
 Term:
 
 On my last night in the Silicon Valley area, I decided to head up the east
 side of San Francisco Bay to 
 visit Vito’s Pizzeria located in Newark, California.  I have to say it was
 excellent!  
 I met the owner (Vito!) and after eating a couple slices I introduced
 myself.  
 I was happy to know he was familiar with the New York Pizza Blog and the
 New York Pizza Finder directory.   
 Once we got to talking he decided I NEEDED to try some bread sticks and
 home-made marinara 
 sauce and they were muy delicioso.  I finished off my late night snack
 with a meatball dipped in the same marinara.
 
 
 Data {Silicon Valley, New York, Chicago}
 
 
   public static void find(String term, Set<String> data) throws Exception {
 
   Analyzer analyzer = PatternAnalyzer.EXTENDED_ANALYZER;
   MemoryIndex index = new MemoryIndex();
   int i = 0;
   for (String str : data) {
     index.addField("bn" + i, str, analyzer);
     i++;
   }
   QueryParser parser = new QueryParser("bn*", analyzer);
   Query query = parser.parse(URLEncoder.encode(term, "UTF-8"));
   float score = index.search(query);
   if (score > 0.0f) {
     System.out.println("it's a match");
   } else {
     System.out.println("no match found");
   }
   // System.out.println("indexData=" + index.toString());
 
   }
 
 no match found 
 
 
 What am I doing wrong?
 
 Thanks,
 Alex
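
 The lookup being asked for (which of the stored names occur in a long
 text) can be sketched in plain Java without Lucene. This is a stand-in
 for illustration only, not the MemoryIndex approach; the class and
 method names are made up. It treats the long text as the single
 "document" and checks each stored name against it, which is the reverse
 of the test method above, where the names are indexed and the whole text
 is used as the query. That reversal is one likely reason for the
 "no match found" result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

final class ReverseLookup {
    /** Lower-cases and replaces every run of non-letter, non-digit characters with one space. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
    }

    /** Returns the names that occur in the text as whole words or whole phrases. */
    static List<String> namesFoundIn(String text, Collection<String> names) {
        String haystack = " " + normalize(text) + " ";
        List<String> found = new ArrayList<>();
        for (String name : names) {
            String needle = normalize(name);
            // Padding with spaces prevents matches inside longer words.
            if (!needle.isEmpty() && haystack.contains(" " + needle + " ")) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "On my last night in the Silicon Valley area, I was happy to know "
                + "he was familiar with the New York Pizza Blog.";
        List<String> data = Arrays.asList("Silicon Valley", "New York", "Chicago");
        System.out.println(namesFoundIn(text, data));
    }
}
```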
 

-- 
View this message in context: 
http://www.nabble.com/Reverse-querying-tp24194777p24261892.html



Re: Entire heap consumed to answer initial ping()

2009-06-29 Thread Otis Gospodnetic

Hello,

If Solr is your only webapp in that container, then this is probably a Solr
issue. Note that a Solr issue could also mean an issue with your ping query.
Perhaps you can provide some more information: the size of your index, the
number of docs, your ping query, the relevant piece of the config, and
such.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


