Re: Caching Solr Grouping Results

2018-05-20 Thread Yasufumi Mizoguchi
Hi,

I don't know much about the grouping component, but I think this would be
very hard, because the query result cache has a {query and conditions} ->
{DocList} structure.
(
https://github.com/apache/lucene-solr/blob/e30264b31400a147507aabd121b1152020b8aa6d/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L120
)

But to cache grouping results, the query result cache would need a nested
{query and conditions} -> {grouped value, condition, etc.} -> {DocList}
structure, I think.
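As a rough Python sketch of the two cache shapes being described (purely illustrative; the query strings, field names, and doc ids are made up, and Solr's actual cache is the Java structure in SolrIndexSearcher.java):

```python
# Hypothetical illustration of the cache shapes discussed above,
# not Solr's actual implementation.

# Current queryResultCache: one flat key -> one ordered doc-id list (DocList).
query_result_cache = {
    ("q=price:[10 TO 100]", "sort=score desc"): [12, 7, 42],  # DocList
}

# A grouping-aware cache would need a nested structure:
# key -> {grouped value -> DocList for that group}.
grouped_result_cache = {
    ("q=price:[10 TO 100]", "group.field=brand"): {
        "acme": [12, 7],
        "globex": [42],
    },
}

flat = query_result_cache[("q=price:[10 TO 100]", "sort=score desc")]
nested = grouped_result_cache[("q=price:[10 TO 100]", "group.field=brand")]
print(flat)    # a single DocList for the whole result
print(nested)  # one DocList per group value
```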

Thanks,
Yasufumi

2018-05-18 (Fri) 23:41 rubi.hali :

> Hi All
>
> Can somebody please explain whether we can cache Solr grouping results in
> the query result cache? I don't see any inserts into the query result
> cache once we enabled grouping.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Index filename while indexing JSON file

2018-05-20 Thread Raymond Xie
Would you consider including the filename as another metadata field to be
indexed? I think your downstream Python code can do that easily.
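For example, a minimal sketch of that suggestion (the field name `filename_s` and the sample metadata are assumptions, not anything from Ash's schema):

```python
import json

def doc_with_filename(json_text: str, filename: str) -> dict:
    """Parse one metadata JSON file and add its own filename as a field."""
    doc = json.loads(json_text)
    doc["filename_s"] = filename  # assumed dynamic string field name
    return doc

# In real use you would iterate over the metadata directory, read each
# *.json file, enrich it like this, and POST the docs to Solr's /update
# handler (or write them to a new file for bin/post).
raw = '{"url": "http://example.com", "author": "Ash", "title": "Sample"}'
doc = doc_with_filename(raw, "sample.json")
print(doc["filename_s"])
```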


Sincerely yours,

Raymond

On Fri, May 18, 2018 at 3:47 PM, S.Ashwath  wrote:

> Hello,
>
> I have 2 directories: 1 with txt files and the other with corresponding
> JSON (metadata) files (around 9 of each). There is one JSON file for
> each txt file, and they share the same name (they don't share any other
> fields).
>
> The txt files just have plain text. I mapped each line to a field called
> 'sentence' and included the file name as a field using the data import
> handler. No problems here.
>
> The JSON file has metadata: 3 tags: a URL, author and title (for the
> content in the corresponding txt file).
> When I index the JSON file (I just used the _default schema and posted the
> fields to the schema, as explained in the official Solr tutorial), I don't
> know how to get the file name into the index as a field. As far as I know,
> there is no way to use the data import handler for JSON files. I've read
> that I can pass a literal through the bin/post tool, but again, as far as I
> understand, I can't pass in the file name dynamically as a literal.
>
> I NEED to get the file name, it is the only way in which I can associate
> the metadata with each sentence in the txt files in my downstream Python
> code.
>
> So if anybody has a suggestion about how I should index the JSON file name
> along with the JSON content (or even some workaround), I'd be eternally
> grateful.
>
> Regards,
>
> Ash
>


Re: about solr reduce shard nums

2018-05-20 Thread Erick Erickson
Simplest would be to host multiple shards on the same machine. Use
ADDREPLICA/DELETEREPLICA (Collections API calls) to move the replicas
hosted on the nodes you want to use for another purpose; when all the
replicas are moved, you can repurpose those machines.

Another option would be to create a _new_ collection with fewer shards on
the machines you'll have dedicated to Solr, re-index all the documents
into the new collection, use collection aliasing (Collections API
CREATEALIAS) to point to the new collection, then delete the old
collection.

There are significantly more complex options to truly merge the
shards, but I wouldn't consider them until the above was proven to be
unsatisfactory.

In any case, you need to be sure that the machines that remain are
powerful enough to host all the documents you've put on them.
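As a sketch, the Collections API calls mentioned above are plain HTTP requests; the helper below just builds the URLs (host, collection, shard, node, and replica names are all placeholders):

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # placeholder host

def collections_api_url(action: str, **params) -> str:
    """Build a SolrCloud Collections API request URL for the given action."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"

# Add a replica on a node you intend to keep, then delete the one on the
# node being repurposed:
add = collections_api_url("ADDREPLICA", collection="mycoll",
                          shard="shard1", node="keeper:8983_solr")
delete = collections_api_url("DELETEREPLICA", collection="mycoll",
                             shard="shard1", replica="core_node5")
# Point clients at a freshly built, smaller collection:
alias = collections_api_url("CREATEALIAS", name="mycoll",
                            collections="mycoll_v2")
print(add)
```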

Best,
Erick

On Sat, May 19, 2018 at 11:58 PM, 苗海泉  wrote:
> Hello everyone, I have encountered a shard reduction problem with Solr. My
> current Solr cluster is deployed in SolrCloud mode. Now I need to use
> several of the Solr machines for other purposes. The Solr version I use is
> 6.0. How should I do this? Thank you for your help.
> --
> ==
> 联创科技
> Knowledge and action as one
> ==


How to do parallel indexing on files (not on HDFS)

2018-05-20 Thread Raymond Xie
I know how to index from the file system, e.g. a single file or a folder,
but how do I do that in parallel? The data I need to index is of huge
volume and can't be put on HDFS.

Thank you

Sincerely yours,

Raymond