How to read bag of words from Lucene index

2016-05-28 Thread vitaly bulgakov
I use Lucene 3.

How can I read back, as a bag of words or similar, the content that was
indexed from a text file?

The indexing is done in the following way:
addFiles(new File(fileName));
int originalNumDocs = writer.numDocs();
for (File f : queue) {
    FileReader fr = null;
    try {
        Document doc = new Document();
        fr = new FileReader(f);
        doc.add(new Field("content", fr));






Re: SolrCloud Shard console shows roughly same number of documents?

2016-05-28 Thread Siddhartha Singh Sandhu
Still struggling with this. Bump. :)

On Thu, May 26, 2016 at 3:53 PM, Siddhartha Singh Sandhu <
sandhus...@gmail.com> wrote:

> Hi Erick,
>
> Thank you for the reply. What I meant was suppose I have the config:
>
> 2 shards each with 1 replica.
>
> Hence, on both servers I have
> 1.  shard1_replica1
> 2 . shard2_replica1
>
> Suppose I have 50 documents then,
> shard1_replica1 + shard2_replica1 = 50 ?
>
> or shard2_replica1 = 50 && shard1_replica1 = 50 ?
>
> Regards,
>
> Sid.
>
> On Thu, May 26, 2016 at 2:30 PM, Erick Erickson 
> wrote:
>
>> Q1: Not quite sure what you mean. Let's say I have 2 shards, 3
>> replicas each, 16 docs on each. I _think_ you're
>> talking about the "core selector", which shows the docs on that
>> particular core, 16 in our case, not 48.
>>
>> Q2: Yes, that's how SolrCloud is designed. It has to be for HA/DR.
>> Every replica in a shard has all the docs, 16 as above. Otherwise if
>> one of your machines went down there could be no guarantee even
>> attempted about there not being data loss.
>>
>> Q3: Yes, indexing will be slower when there is more than one replica
>> per shard since the raw document is forwarded from the leader to all
>> followers before acking back. In distributed situations, you will have
>> a bunch (potentially) more machines doing indexing so total throughput
>> can be faster.
>>
>> Why do you care? Is there a problem or is this just general background
>> info? There are a number of techniques for speeding up indexing, the
>> first is to use SolrJ and CloudSolrClient and send batches of docs at
>> once rather than one-at-a-time.
>>
>> Best,
>> Erick
>>
>> On Wed, May 25, 2016 at 1:54 PM, Siddhartha Singh Sandhu
>>  wrote:
>> > Hi,
>> >
>> > I recently moved to a SolrCloud config. I had a few questions:
>> >
>> > Q1. Does a shard show the cumulative number of documents, or only the
>> > documents present in that particular shard, on the admin console of the
>> > respective shard?
>> >
>> > Q2. If the answer to Q1 is non-cumulative, then my shards (on different
>> > servers) are indexing all the documents on each instance of the shard. Is
>> > this natural? I created the shards with compositeId.
>> >
>> > Q3. If the answer to Q1 is cumulative, then my indexing was slower than
>> > on a single-core instance on the same machine, of which I now have 2
>> > (my shards). What could I be missing while configuring Solr?
>> >
>> >
>> > I am using Solr 6.0.0 on Ubuntu 14.04 with external zookeeper.
>> >
>> > Regards,
>> >
>> > Sid.
>>
>
>
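
The replica arithmetic discussed in this thread can be checked with a small model. This is only a sketch using plain Java collections, not Solr code: the class name ShardCountSketch and its helpers are hypothetical, and String.hashCode() stands in for Solr's real compositeId hash routing. It illustrates that each document lands on exactly one shard, so the per-shard counts add up to the collection total, while every replica of a shard holds that shard's full set.

```java
import java.util.ArrayList;
import java.util.List;

class ShardCountSketch {

    /** Counts how many of the given ids land on each shard. */
    static int[] docsPerShard(List<String> docIds, int numShards) {
        int[] counts = new int[numShards];
        for (String id : docIds) {
            // Stand-in for compositeId routing: every id maps to exactly one shard.
            counts[Math.floorMod(id.hashCode(), numShards)]++;
        }
        return counts;
    }

    /** Unique documents in the collection: the sum over shards. */
    static int uniqueDocs(int[] perShard) {
        int total = 0;
        for (int n : perShard) {
            total += n;
        }
        return total;
    }

    /** Physical copies across all cores: each replica holds its shard's full set. */
    static int totalCopies(int[] perShard, int replicasPerShard) {
        return uniqueDocs(perShard) * replicasPerShard;
    }

    public static void main(String[] args) {
        // Sid's case: 50 documents, 2 shards, 1 replica each.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            ids.add("doc-" + i);
        }
        int[] perShard = docsPerShard(ids, 2);
        for (int s = 0; s < perShard.length; s++) {
            System.out.println("shard" + (s + 1) + "_replica1 shows " + perShard[s] + " docs");
        }
        System.out.println("sum over shards = " + uniqueDocs(perShard));

        // Erick's case: 2 shards of 16 docs, 3 replicas each.
        System.out.println("copies on disk = " + totalCopies(new int[] {16, 16}, 3));
    }
}
```

Under this model, shard1_replica1 + shard2_replica1 = 50, and the admin console's core selector would report only the count for the one core it is pointed at.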


Re: Solr vs JDBC driver

2016-05-28 Thread Joel Bernstein
The driver is included in /META-INF/services/java.sql.Driver. So if you're
using JDBC 4.0, the driver should be autoloaded.

What version of java are you running?

Joel Bernstein
http://joelsolr.blogspot.com/
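
To see whether autoloading actually happened, one option is to ask DriverManager directly. The following is a minimal sketch that uses only the JDK; the class JdbcDriverCheck and its helper names are hypothetical, and the URL is a placeholder modelled on the one in the error message quoted below. It lists the drivers DriverManager has registered, checks whether any of them accepts a given JDBC URL, and falls back to loading the driver class by name, which is the workaround reported in this thread.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class JdbcDriverCheck {

    /** Class names of the drivers DriverManager has registered so far. */
    static List<String> registeredDrivers() {
        List<String> names = new ArrayList<>();
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            names.add(d.getClass().getName());
        }
        return names;
    }

    /** True if some registered driver accepts the URL. */
    static boolean hasDriverFor(String url) {
        try {
            DriverManager.getDriver(url);
            return true;
        } catch (SQLException e) {
            // DriverManager reports "No suitable driver" this way.
            return false;
        }
    }

    /** Loads a driver class by name; returns false if it is not on the classpath. */
    static boolean tryExplicitLoad(String className) {
        try {
            // JDBC drivers conventionally register themselves when the class initializes.
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder URL; substitute your own ZooKeeper host and collection.
        String url = "jdbc:solr://localhost:2181/solr?collection=Current";
        System.out.println("Registered drivers: " + registeredDrivers());
        if (!hasDriverFor(url)) {
            boolean loaded = tryExplicitLoad("org.apache.solr.client.solrj.io.sql.DriverImpl");
            System.out.println("Explicit load succeeded: " + loaded);
            System.out.println("Driver available now: " + hasDriverFor(url));
        }
    }
}
```

If the driver shows up only after the explicit load, the service file is probably not visible to the class loader that DriverManager consults, which can happen in some container or shaded-jar setups.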

On Fri, May 27, 2016 at 8:16 PM, Vachon, Jean-Sébastien <
jvac...@cebglobal.com> wrote:

> Never mind... I had to load the class just like any database driver:
>
> Class.forName("org.apache.solr.client.solrj.io.sql.DriverImpl").newInstance();
>
>
>
>
> On 2016-05-27, 2:59 PM, "Vachon, Jean-Sébastien" 
> wrote:
>
> >Hi All,
> >
> >
> >
> >I am trying to use Solr's JDBC driver in Java and I'm stuck with the
> >following error message:
> >
> >
> >
> >
> >
> >14:52:37,802 ERROR [consoleLogger] java.sql.SQLException: No suitable
> >driver found for jdbc:solr://10.28.213.133:2181/solr?collection=Current
> >
> >
> >
> >My pom.xml contains:
> >
> ><dependency>
> >  <groupId>org.apache.solr</groupId>
> >  <artifactId>solr-solrj</artifactId>
> >  <version>6.0.0</version>
> ></dependency>
> >
> >
> >I looked at different posts:
> >
> >
> >
> >Yonik's:
> >
> >http://yonik.com/solr-6/
> >
> >Sematext:
> >
> >https://sematext.com/blog/2016/04/26/solr-6-as-jdbc-data-source/
> >
> >
> >
> >And I seem to meet all the requirements
> >
> >
> >
> >Any idea on what I'm doing wrong?
> >
> >
> >
> >Thanks
> >
> >
> >
> >
> >
> >
> >
> >CEB Canada Inc. Registration No: 1781071. Registered office: 199 Bay
> >Street Commerce Court West, # 2800, Toronto, Ontario, Canada, M5L 1AP.
> >
> >
> >
> >
> >
> >
> >
> >This e-mail and/or its attachments are intended only for the use of the
> >addressee(s) and may contain confidential and legally privileged
> >information belonging to CEB and/or its subsidiaries, including SHL. If
> >you have received this e-mail in error, please notify the sender and
> >immediately, destroy all copies of this email and its attachments. The
> >publication, copying, in whole or in part, or use or dissemination in any
> >other way of this e-mail and attachments by anyone other than the
> >intended person(s) is prohibited.
> >
> >
> >
> >
> >
>
>


[ANNOUNCE] Apache Solr 6.0.1 released

2016-05-28 Thread Steve Rowe
28 May 2016, Apache Solr™ 6.0.1 available 

The Lucene PMC is pleased to announce the release of Apache Solr 6.0.1 

Solr is the popular, blazing fast, open source NoSQL search platform 
from the Apache Lucene project. Its major features include powerful 
full-text search, hit highlighting, faceted search, dynamic 
clustering, database integration, rich document (e.g., Word, PDF) 
handling, and geospatial search. Solr is highly scalable, providing 
fault tolerant distributed search and indexing, and powers the search 
and navigation features of many of the world's largest internet sites. 

This release includes 31 bug fixes, documentation updates, etc., 
since the 6.0.0 release. 

The release is available for immediate download at: 

http://www.apache.org/dyn/closer.lua/lucene/solr/6.0.1 

Please read CHANGES.txt for a detailed list of changes: 

https://lucene.apache.org/solr/6_0_1/changes/Changes.html 

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html) 

Note: The Apache Software Foundation uses an extensive mirroring 
network for distributing releases. It is possible that the mirror you 
are using may not have replicated the release yet. If that is the 
case, please try another mirror. This also goes for Maven access.

Re: Can a DocTransformer access the whole results tree?

2016-05-28 Thread Stefan Matheis
Isn't that exactly what [explain] and [child] are doing? They locate
whatever data they're working on alongside the document it's related to.

What Upayavira asks for looks the very same to me, doesn't it?

-Stefan
On May 27, 2016 7:27 PM, "Erick Erickson"  wrote:

> Maybe you'd be better off using a custom search component
> instead of a doc transformer. The intent of a doc transformer
> is, as you've discovered, working on single docs at a time. You
> want to manipulate the whole response which seems to fit more
> naturally into a search component. Make sure to put it after
> the highlight component (i.e. last-components).
>
> Best,
> Erick
>
> On Fri, May 27, 2016 at 6:55 AM, Upayavira  wrote:
> > In a JSON response, we get this:
> >
> > {
> >   "responseHeader": {...},
> >   "response": { "docs": [...] },
> >   "highlighting": {...}
> >   ...
> > }
> >
> > I'm assuming that the getProcessedDocuments call would give me the docs:
> > {} element, whereas I'm after the whole response so I can retrieve the
> > "highlighting" element.
> >
> > Make sense?
> >
> > On Fri, 27 May 2016, at 02:45 PM, Mikhail Khludnev wrote:
> >> Upayavira,
> >>
> >> It's not clear what you mean by "results themselves"; perhaps you mean
> >> SolrDocuments?
> >>
> >> public abstract class ResultContext {
> >>  ..
> >>   public Iterator<SolrDocument> getProcessedDocuments() {
> >>     return new DocsStreamer(this);
> >>   }
> >>
> >> On Fri, May 27, 2016 at 4:15 PM, Upayavira  wrote:
> >>
> >> > Yes, I've seen that. I can see the getDocList() method will presumably
> >> > give me the results themselves, but I need the full response so I can
> >> > get the highlighting details, but I can't see them anywhere.
> >> >
> >> > On Thu, 26 May 2016, at 09:39 PM, Mikhail Khludnev wrote:
> >> > > public abstract class ResultContext {
> >> > >
> >> > >  /// here are all results
> >> > >   public abstract DocList getDocList();
> >> > >
> >> > >   public abstract ReturnFields getReturnFields();
> >> > >
> >> > >   public abstract SolrIndexSearcher getSearcher();
> >> > >
> >> > >   public abstract Query getQuery();
> >> > >
> >> > >   public abstract SolrQueryRequest getRequest();
> >> > >
> >> > > On Thu, May 26, 2016 at 11:25 PM, Upayavira  wrote:
> >> > >
> >> > > > Hi Mikhail,
> >> > > >
> >> > > > Is there really? If I look at ResultContext, I see it is an abstract
> >> > > > class, completed by BasicResultContext. I don't see any context
> >> > > > method there. I can see a getContext() on SolrQueryRequest which
> >> > > > just returns a hashmap. Will I find the response in there? Is that
> >> > > > what you are suggesting?
> >> > > >
> >> > > > Upayavira
> >> > > >
> >> > > > On Thu, 26 May 2016, at 06:28 PM, Mikhail Khludnev wrote:
> >> > > > > Hello,
> >> > > > >
> >> > > > > There is a protected ResultContext field named context.
> >> > > > >
> >> > > > > On Thu, May 26, 2016 at 5:31 PM, Upayavira wrote:
> >> > > > >
> >> > > > > > Looking at the code for a sample DocTransformer, it seems that a
> >> > > > > > DocTransformer only has access to the document itself, not to
> >> > > > > > the whole results. Because of this, it isn't possible to use a
> >> > > > > > DocTransformer to merge, for example, the highlighting results
> >> > > > > > into the main document.
> >> > > > > >
> >> > > > > > Am I missing something?
> >> > > > > >
> >> > > > > > Upayavira
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > Sincerely yours
> >> > > > > Mikhail Khludnev
> >> > > > > Principal Engineer,
> >> > > > > Grid Dynamics
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
>
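
The whole-response manipulation discussed in this thread, folding each document's entry from the "highlighting" section into the matching entry under "docs", can be modelled with plain collections. This is only a sketch of the merge step, not Solr code: the class HighlightMergeSketch and the method mergeHighlighting are hypothetical, and the nested maps mirror the shape of the JSON response quoted above, where "highlighting" is keyed by document id, then by field name. A real implementation would do the equivalent inside a custom search component that runs after the highlight component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HighlightMergeSketch {

    /**
     * Returns copies of the docs, each with its highlighting snippets (if any)
     * attached under a "highlighting" key. The inputs are left unmodified.
     */
    static List<Map<String, Object>> mergeHighlighting(
            List<Map<String, Object>> docs,
            Map<String, Map<String, List<String>>> highlighting,
            String idField) {
        List<Map<String, Object>> merged = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Map<String, Object> copy = new LinkedHashMap<>(doc);
            Object id = doc.get(idField);
            // The highlighting section is looked up by the document's id value.
            Map<String, List<String>> snippets =
                    id == null ? null : highlighting.get(String.valueOf(id));
            if (snippets != null && !snippets.isEmpty()) {
                copy.put("highlighting", snippets);
            }
            merged.add(copy);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("title", "Solr in action");
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc);

        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("title", Arrays.asList("<em>Solr</em> in action"));
        Map<String, Map<String, List<String>>> highlighting = new LinkedHashMap<>();
        highlighting.put("1", fields);

        System.out.println(mergeHighlighting(docs, highlighting, "id"));
    }
}
```

The merge needs both sections of the response at once, which is why it does not fit a transformer that sees one document at a time.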