Re: Caching Solr Grouping Results
Hi,

I know little about the grouping component, but I think this would be very hard, because the query result cache has a {query and conditions} -> {DocList} structure ( https://github.com/apache/lucene-solr/blob/e30264b31400a147507aabd121b1152020b8aa6d/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L120 ). To cache grouping results, the query result cache would instead need a {query and conditions} -> {grouped value, condition, etc.} -> {DocList} structure, I think.

Thanks,
Yasufumi

On Fri, May 18, 2018 at 23:41, rubi.hali wrote:
> Hi All
>
> Can somebody please explain if we can cache Solr grouping results in the
> query result cache? I don't see any inserts in the query result cache once
> we enabled grouping.
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
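To make the structural difference concrete, here is a toy Python sketch (not Solr's actual Java implementation; the cache keys and doc ids are made up) contrasting the flat query-result-cache shape with the nested shape that grouped results would need:

```python
# Toy sketch (not Solr code): contrast the flat query-result-cache shape
# with the nested, per-group shape a grouped result would require.

# Flat cache: (query, filters, sort) -> ordered list of doc ids (a "DocList")
flat_cache = {}
flat_cache[("title:solr", "inStock:true", "score desc")] = [12, 7, 42]

# Grouped results need one doc list *per group value*:
# (query, filters, sort) -> {group value -> doc list}
grouped_cache = {}
grouped_cache[("title:solr", "inStock:true", "score desc")] = {
    "electronics": [12, 7],
    "books": [42],
}

def lookup(cache, key):
    """Return the cached entry for this key, or None on a cache miss."""
    return cache.get(key)
```

The extra level of nesting is why a grouped result does not fit the existing cache's value type without new plumbing.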
Re: Index filename while indexing JSON file
Would you consider including the filename as another metadata field to be indexed? I think your downstream Python can do that easily.

Sincerely yours,
Raymond

On Fri, May 18, 2018 at 3:47 PM, S.Ashwath wrote:
> Hello,
>
> I have 2 directories: one with txt files and the other with corresponding
> JSON (metadata) files (around 9 of each). There is one JSON file for
> each txt file, and they share the same name (they don't share any other
> fields).
>
> The txt files just have plain text. I mapped each line to a field called
> 'sentence' and included the file name as a field using the Data Import
> Handler. No problems here.
>
> The JSON file has metadata: 3 tags: a URL, author and title (for the
> content in the corresponding txt file). When I index the JSON file (I just
> used the _default schema and posted the fields to the schema, as explained
> in the official Solr tutorial), I don't know how to get the file name into
> the index as a field. As far as I know, there's no way to use the Data
> Import Handler for JSON files. I've read that I can pass a literal through
> the bin/post tool, but again, as far as I understand, I can't pass the
> file name in dynamically as a literal.
>
> I NEED to get the file name; it is the only way I can associate the
> metadata with each sentence in the txt files in my downstream Python code.
>
> So if anybody has a suggestion about how I should index the JSON file name
> along with the JSON content (or even some workaround), I'd be eternally
> grateful.
>
> Regards,
>
> Ash
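As a sketch of that suggestion: assuming the metadata JSON files sit in one directory, a small preprocessing step in Python could inject the source filename into each document before posting it to Solr. The field name `filename_s` is a hypothetical dynamic string field, not something from the thread:

```python
import json
from pathlib import Path

def add_filename_field(json_dir):
    """Yield each metadata document with its source filename added as an
    extra field, so the docs can then be posted to Solr with the filename
    preserved for downstream joining against the txt-file sentences."""
    for path in sorted(Path(json_dir).glob("*.json")):
        doc = json.loads(path.read_text())
        doc["filename_s"] = path.stem  # hypothetical field name
        yield doc
```

The enriched documents can then be sent through bin/post or any Solr client, sidestepping the question of passing the filename as a literal.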
Re: about solr reduce shard nums
Simplest would be to host multiple shards on the same machine. Use ADDREPLICA/DELETEREPLICA (Collections API calls) to move the replicas hosted on the nodes you want to use for another purpose; when all replicas have been moved, you can repurpose those machines.

Another option would be to create a _new_ collection with fewer shards on the machines you'll keep dedicated to Solr, re-index all the documents into the new collection, use collection aliasing (Collections API CREATEALIAS) to point to the new collection, and then delete the old collection.

There are significantly more complex options to truly merge the shards, but I wouldn't consider them until the above has proven unsatisfactory.

In any case, you need to be sure that the machines that remain are powerful enough to host all the documents you've put on them.

Best,
Erick

On Sat, May 19, 2018 at 11:58 PM, 苗海泉 wrote:
> Hello everyone, I encountered a shard reduction problem with Solr. My
> current Solr cluster is deployed in SolrCloud mode. Now I need to use
> several of the Solr machines for other purposes. The Solr version I use
> is Solr 6.0. What should I do? Thank you for your help.
> --
> ==
> 联创科技 (Linkage Technology)
> 知行如一 (Unity of knowledge and action)
> ==
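The two options above come down to a handful of Collections API requests. Here is a minimal Python sketch that only builds the request URLs (the host, collection, shard, replica, and node names are placeholders, not values from the thread):

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed Solr base URL

def collections_api(action, **params):
    """Build a Collections API request URL for the given action."""
    return f"{SOLR}/admin/collections?" + urlencode({"action": action, **params})

# Option 1: add a replica on a node you are keeping, then delete the
# replica on the node you want to repurpose.
add_url = collections_api("ADDREPLICA", collection="logs", shard="shard1",
                          node="keeper-node:8983_solr")
del_url = collections_api("DELETEREPLICA", collection="logs", shard="shard1",
                          replica="core_node5")

# Option 2: after re-indexing into a smaller collection, flip an alias so
# clients keep querying the same name.
alias_url = collections_api("CREATEALIAS", name="logs", collections="logs_v2")
```

Issuing these with any HTTP client against a live cluster (and waiting for each operation to finish before the next) follows Erick's ordering: move or rebuild first, delete last.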
How to do parallel indexing on files (not on HDFS)
I know how to index files on a file system, such as a single file or a folder, but how do I do that in parallel? The data I need to index is of huge volume and can't be put on HDFS.

Thank you

Sincerely yours,
Raymond
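One common pattern, independent of HDFS, is to split the file list into batches and index the batches concurrently from the client side. A minimal Python sketch with `concurrent.futures` follows; the `index_batch` body is a placeholder, and a real version would POST each batch to Solr's /update handler:

```python
from concurrent.futures import ThreadPoolExecutor

def index_batch(paths):
    """Placeholder for the real indexing call (e.g. an HTTP POST of the
    batch to Solr's /update handler). Here it just returns the doc count
    so the sketch is runnable without a server."""
    return len(paths)

def parallel_index(files, workers=4, batch_size=100):
    """Split the file list into batches and index them concurrently,
    returning the total number of documents submitted."""
    batches = [files[i:i + batch_size]
               for i in range(0, len(files), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(index_batch, batches))
```

Batching keeps the number of update requests manageable, and the worker count can be tuned to what the Solr cluster can absorb.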