returning only certain fields from the docs - parsing on the server side

2013-07-25 Thread Matt Lieber
Hi,

I only want to return one field in the documents being returned from my query.
I know there is the 'fl' parameter, which is described in the documentation 
http://wiki.apache.org/solr/CommonQueryParameters as:

This parameter can be used to specify a set of fields to return, limiting the 
amount of information in the response. When returning the results to the 
client, only fields in this list will be included.

But it seems like 'fl' is applied on the client side, after the results have 
been constructed on the server side, so the whole docs are still passed back 
over the wire. Is my assumption wrong?
Is there a way to filter things out directly on the Solr side, and return only 
the field that I desire to the client?
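
For illustration, 'fl' travels with the request as an ordinary query parameter, so the field list is part of what Solr receives before it writes the response. A minimal sketch of building such a request URL is below; the host, the core name `core`, and the field name `title` are placeholders assumed for the example, not values from this thread.

```python
from urllib.parse import urlencode


def select_url(base_url, query, fields):
    """Build a Solr select URL that asks for only the given fields via fl."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # comma-separated list of fields to return
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical host, core, and field name.
print(select_url("http://localhost:8983/solr/core/select", "*:*", ["title"]))
```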

Thanks,
Matt










NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.


Processing a lot of results in Solr

2013-07-23 Thread Matt Lieber
Hello Solr users,

I have a question about processing a lot of docs returned from a query; I
potentially have millions of documents returned from a query. What is
the common design to deal with this?

Two ideas I have are:
- Create a multithreaded client service to handle this
- Use Solr pagination to retrieve a batch of rows at a time (start,
rows in the Solr Admin console)
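
The pagination idea can be sketched roughly as follows: walk the result set in fixed-size windows using the start and rows parameters. This is only an illustration; the base URL, the batch size of 1000, and the function name are assumptions, and the total hit count is taken as already known (for example from numFound of a first request).

```python
from urllib.parse import urlencode


def page_urls(base_url, query, num_found, rows=1000):
    """Yield one select URL per batch of `rows` documents."""
    for start in range(0, num_found, rows):
        params = {"q": query, "start": start, "rows": rows, "wt": "json"}
        yield f"{base_url}?{urlencode(params)}"


# Hypothetical host and core; 2500 hits fetched in batches of 1000.
for url in page_urls("http://localhost:8983/solr/core/select", "*:*", 2500):
    print(url)
```

Each URL can then be fetched in turn, or handed to a pool of workers. Very large start offsets tend to get slower, so this is probably best suited to moderate result sizes.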

Any other ideas that I may be missing ?

Thanks,
Matt











Re: Processing a lot of results in Solr

2013-07-23 Thread Matt Lieber
That sounds like a satisfactory solution for the time being -
I am assuming you dump the data from Solr in CSV format?
How did you implement the streaming processor? (What tool did you use for
this? I am not familiar with that.)
You say it takes only a few minutes to dump the data - how long does it take
to stream it back in? Is the performance acceptable (~ within minutes)?

Thanks,
Matt

On 7/23/13 6:57 PM, Roman Chyla roman.ch...@gmail.com wrote:

Hello Matt,

You can consider writing a batch processing handler, which receives a query
and, instead of sending results back, writes them into a file which is
then available for streaming (it has its own UUID). I am dumping many GBs
of data from Solr in a few minutes - your query + streaming writer can go
a very long way :)
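
As a rough illustration of the dump-then-stream idea (not Roman's actual handler, which is not shown in this thread), the sketch below writes results to a file named by a fresh UUID and later reads them back one document at a time. The JSON-lines file format, the function names, and the output directory are all assumptions made for the example.

```python
import json
import os
import uuid


def dump_results(docs, out_dir):
    """Write docs, one JSON object per line, to a file named by a new UUID."""
    dump_id = str(uuid.uuid4())
    path = os.path.join(out_dir, dump_id + ".jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        for doc in docs:
            fh.write(json.dumps(doc) + "\n")
    return dump_id


def stream_results(dump_id, out_dir):
    """Yield the dumped docs back one at a time, without loading the whole file."""
    path = os.path.join(out_dir, dump_id + ".jsonl")
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)
```

The caller keeps only the returned UUID and can stream the file back whenever it is ready to process the documents.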

roman


On Tue, Jul 23, 2013 at 5:04 PM, Matt Lieber mlie...@impetus.com wrote:

 Hello Solr users,

 Question regarding processing a lot of docs returned from a query; I
 potentially have millions of documents returned back from a query. What is
 the common design to deal with this ?

 2 ideas I have are:
 - create a client service that is multithreaded to handled this
 - Use the Solr pagination to retrieve a batch of rows at a time (start,
 rows in Solr Admin console )

 Any other ideas that I may be missing ?

 Thanks,
 Matt


 


















Re: Solr with Hadoop

2013-07-18 Thread Matt Lieber
Rajesh,

If you need integration between Solr and Hadoop or NoSQL, I
would recommend using a commercial distribution. I think most are free to
use as long as you don't require support.
I inquired about the Cloudera Search capability, but it seems that so
far it is just preliminary: there is no tight integration yet between
HBase and Solr, for example, other than full text search on the HDFS data
(I believe enabled in Hue). I am not too familiar with what MapR's M7 has
to offer.
However, DataStax does a good job of tightly integrating Solr with
Cassandra, and lets you query over the data ingested from Solr in Hive, for
example, which is pretty nice. Solr would not trigger Hadoop jobs, though.

Cheers,
Matt


On 7/17/13 7:37 PM, Rajesh Jain rjai...@gmail.com wrote:

I have a newbie question on integrating Solr with Hadoop.

There are some vendors like Cloudera/MapR who have announced Solr Search
for Hadoop.

If I use the Apache distro, how can I use Solr Search on docs in
HDFS/Hadoop?

Is there a tutorial on how to use it or on getting started?

I am using Flume to sink CSV docs into Hadoop/HDFS and I would like to use
Solr to provide Search.

Does Solr Search trigger MapReduce jobs (like Splunk Hunk does)?

Thanks,
Rajesh












Re: How to set a condition on the number of docs found

2013-07-12 Thread Matt Lieber
Thanks William, I'll do that.

Matt


On 7/12/13 7:38 AM, William Bell billnb...@gmail.com wrote:

Hmmm. One way is:

http://localhost:8983/solr/core/select/?q=*%3A*&facet=true&facet.field=id&facet.offset=10&rows=0&facet.limit=1

If you get a result back, you have more than 10 results.

Another way is to just look at it with a facet.query and have your app deal
with it.

http://localhost:8983/solr/core/select/?q=*%3A*&facet=true&facet.query={!lucene%20key=numberofresults}state:CO&rows=0
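
The facet.query approach above can be sketched from the client side roughly as follows. This is only an illustration, assuming Solr's JSON response format (wt=json); the function names, the base URL, and the sample response are made up for the example.

```python
import json
from urllib.parse import urlencode


def count_query_url(base_url, facet_query, key="numberofresults"):
    """Build a rows=0 request whose facet.query count is reported under `key`."""
    params = {
        "q": "*:*",
        "rows": 0,
        "wt": "json",
        "facet": "true",
        "facet.query": f"{{!lucene key={key}}}{facet_query}",
    }
    return f"{base_url}?{urlencode(params)}"


def exceeds_threshold(response_text, threshold, key="numberofresults"):
    """Return True if the facet.query count in the response is above threshold."""
    body = json.loads(response_text)
    return body["facet_counts"]["facet_queries"][key] > threshold


# Hand-written sample response, not real server output.
sample = '{"facet_counts": {"facet_queries": {"numberofresults": 42}}}'
print(count_query_url("http://localhost:8983/solr/core/select", "state:CO"))
print(exceeds_threshold(sample, 10))
```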




On Thu, Jul 11, 2013 at 11:45 PM, Matt Lieber mlie...@impetus.com wrote:

 Hello there,

 I would like to be able to know whether I got over a certain threshold of
 doc results.

 I.e. Test (Result.numFound > 10) -> true.

 Is there a way to do this ? I can't seem to find how to do this; (other
 than have to do this test on the client app, which is not great).

 Thanks,
 Matt


 










--
Bill Bell
billnb...@gmail.com
cell 720-256-8076











How to set a condition over stats result

2013-07-11 Thread Matt Lieber

Hello,

I am trying to see how I can test the sum of values of an attribute across
docs.
I.e. whether sum(myfieldvalue) > 100.

I know I can use the stats module, which computes the sum of my attribute
on a certain facet, but how can I perform a test on this result (i.e. is
sum > 100) within my stats query? From what I read, it's not supported yet
to perform a function on the stats module.
Is there any other way to do this?
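
For reference, a client-side version of this test might look like the sketch below. It assumes the stats component is enabled with stats=true and stats.field, and that the response is requested as JSON; the function names, the base URL, and the sample response are illustrative only.

```python
import json
from urllib.parse import urlencode


def stats_url(base_url, field, query="*:*"):
    """Build a rows=0 request that asks the stats component about `field`."""
    params = {
        "q": query,
        "rows": 0,
        "wt": "json",
        "stats": "true",
        "stats.field": field,
    }
    return f"{base_url}?{urlencode(params)}"


def sum_exceeds(response_text, field, threshold):
    """Return True if the reported sum for `field` is above threshold."""
    body = json.loads(response_text)
    return body["stats"]["stats_fields"][field]["sum"] > threshold


# Hand-written sample response, not real server output.
sample = '{"stats": {"stats_fields": {"myfieldvalue": {"sum": 250.0}}}}'
print(stats_url("http://localhost:8983/solr/core/select", "myfieldvalue"))
print(sum_exceeds(sample, "myfieldvalue", 100))
```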

Cheers,
Matt














How to set a condition on the number of docs found

2013-07-11 Thread Matt Lieber
Hello there,

I would like to be able to know whether I got over a certain threshold of
doc results.

I.e. Test (Result.numFound > 10) -> true.

Is there a way to do this? I can't seem to find out how (other
than doing this test in the client app, which is not great).

Thanks,
Matt








