Re: Solr with encrypted HDFS
Great. Thanks so much, Hendrik, for sharing your experience! We might have high volumes to deal with, but probably not high commit rates.

On Thu, Sep 12, 2019 at 1:45 AM Hendrik Haddorp wrote:
> Hi,
>
> we have some setups that use an encryption zone in HDFS. Once you have
> the HDFS config set up, the rest is transparent to the client, and Solr
> works just fine like that. That said, we have some general issues with
> Solr and HDFS. The main problem seems to be around the transaction log
> files. We have a quite high commit rate, and these short-lived files
> don't seem to play well with HDFS and triple replication of the blocks
> in HDFS. But encryption did not add any issues for us.
>
> regards,
> Hendrik
Solr with encrypted HDFS
Hi,

I am interested in encrypting/protecting my Solr indices, and I am wondering if Solr can work with an encrypted HDFS. I see that these instructions (https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/configuring-hdfs-encryption/content/configuring_and_using_hdfs_data_at_rest_encryption.html) explain that:

"After permissions are set, Java API clients and HDFS applications with sufficient HDFS and Ranger KMS access privileges can write and read to/from files in the encryption zone"

So I am wondering if the Solr/Java API that uses HDFS would work with this as well. Also, has anyone had experience running this, either good or bad?

Thanks,
John
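For reference, pointing Solr at HDFS is done through the directory factory in solrconfig.xml; once the index home sits inside an encryption zone and the HDFS client config/KMS privileges are in place, encryption is transparent to Solr. A minimal sketch (the hostname, port, and paths below are placeholders, not values from this thread):

```xml
<!-- Sketch: Solr index stored under an HDFS encryption zone.
     "namenode:8020" and the paths are example values. -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- Index home inside the encryption zone -->
  <str name="solr.hdfs.home">hdfs://namenode:8020/encrypted-zone/solr</str>
  <!-- Directory holding the HDFS client configuration (core-site.xml etc.) -->
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
</directoryFactory>
```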
Re: is SearchComponent the correct way?
So my understanding is that the DelegatingCollector.collect() method has access to a single doc, and at that point I must choose either to call super.collect() or not. So this is the point at which I have to check Redis for security data for a single doc and determine whether this doc should be allowed into the result set. It seems that I have to check my Redis cache one doc at a time, since I am only given one doc in the collect() method and must decide there whether to call super.collect().

I would like to find an option where I can get all the docs in the post filter, run a single query to Redis with all of the docs at once, get a single answer back from Redis, and then determine, based on the Redis response, which of the docs should be allowed to pass through my post filter.

On Fri, Nov 16, 2018 at 4:30 PM Mikhail Khludnev wrote:
> On Tue, Nov 13, 2018 at 6:36 AM John Thorhauer wrote:
> > Mikhail,
> >
> > Where do I implement the buffering? I can not do it in the collect()
> > method.
>
> Please clarify why exactly? Notice my statement about one segment only.
>
> > I can not see how I can get access to what I need in the finish()
> > method.
> >
> > Thanks,
> > John
>
> --
> Sincerely yours
> Mikhail Khludnev
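The buffer-then-batch pattern Mikhail describes can be sketched in plain Java. This is a simplified standalone illustration, not Solr's API: a real implementation would extend Solr's DelegatingCollector and forward surviving docs to super.collect(); the class name, the lookup function, and the method shapes here are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Sketch of the segment-wise buffering pattern: no network I/O in
// collect(); one batch lookup (e.g. a single Redis MGET) in finish().
class BatchingSecurityFilter {
    private final List<Integer> buffered = new ArrayList<>();
    private final List<Integer> passed = new ArrayList<>();
    // Stand-in for the batch Redis call over all buffered doc IDs.
    private final Function<List<Integer>, Set<Integer>> batchAllowedLookup;

    BatchingSecurityFilter(Function<List<Integer>, Set<Integer>> lookup) {
        this.batchAllowedLookup = lookup;
    }

    // Called once per matching document; just buffer the ID.
    void collect(int docId) {
        buffered.add(docId);
    }

    // Called when the segment's docs are exhausted: one batch query,
    // then forward only the allowed docs downstream.
    void finish() {
        Set<Integer> allowed = batchAllowedLookup.apply(buffered);
        for (int docId : buffered) {
            if (allowed.contains(docId)) {
                passed.add(docId);   // real code: super.collect(docId)
            }
        }
        buffered.clear();
    }

    List<Integer> passedDocs() {
        return passed;
    }
}
```

In an actual Solr PostFilter, getFilterCollector() would return a DelegatingCollector built along these lines, with the batch Redis call happening per segment (or in finish()) rather than per document.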
Re: is SearchComponent the correct way?
Mikhail,

Where do I implement the buffering? I can not do it in the collect() method, and I can not see how I can get access to what I need in the finish() method.

Thanks,
John

On Tue, Nov 6, 2018 at 12:44 PM Mikhail Khludnev wrote:
> Not really. It expects to work segment by segment. So it can buffer all
> docs from one segment, hit Redis, and push all results into the
> delegating collector.
>
> --
> Sincerely yours
> Mikhail Khludnev

--
John Thorhauer
Vice President, Software Development
Yakabod, Inc.
Cell: 240-818-9050
Office: 301-662-4554 x2105
Re: is SearchComponent the correct way?
Mikhail,

Thanks for the suggestion. After looking over the PostFilter interface and the DelegatingCollector, it appears that this would require me to query my outside datastore (Redis) for security information once for each document, which would be a big performance issue. I would like to be able to iterate through the documents, gathering all the critical IDs, then send a single query to Redis to get back my security-related data, and then iterate through the documents again, pulling out the ones that the user should not see.

Is this possible?

Thanks again for your help!
John

--
John Thorhauer
Vice President, Software Development
Yakabod, Inc.
Cell: 240-818-9050
Office: 301-662-4554 x2105
is SearchComponent the correct way?
We have a need to check the results of a search against a set of security lists that are maintained in a Redis cache. I need to be able to take each document that is returned for a search and check the Redis cache to see whether the document should be displayed or not.

I am attempting to do this by creating a SearchComponent. I am able to iterate through the results and identify the items I want to remove from the results, but I am not sure how to proceed in removing them.

Is SearchComponent the best way to do this? If so, any thoughts on how to proceed?

Thanks,
John Thorhauer
Re: dynamic field assignments
Chris,

Thanks so much for the suggestion. I will look into this approach. It looks very promising!

John

On Mon, May 5, 2014 at 9:50 PM, Chris Hostetter wrote:
>
> : My understanding is that DynamicField can do something like
> : FOO_BAR_TEXT_* but what I really need is *_TEXT_* as I might have
> : FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2. Both of those
> : field names need to map to a field type of 'fullText'.
>
> I'm pretty sure you can get what you are after with the new Managed
> Schema functionality...
>
> https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
> https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig
>
> Assuming you have managed schema enabled in solrconfig.xml, and you
> define both of your fieldTypes using names like "text" and "select",
> then something like this should work in your processor chain...
>
> <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
>   <str name="fieldRegex">.*_TEXT_.*</str>
>   <str name="defaultFieldType">text</str>
> </processor>
> <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
>   <str name="fieldRegex">.*_SELECT_.*</str>
>   <str name="defaultFieldType">select</str>
> </processor>
>
> (Normally that processor is used once with multiple value->type mappings
> -- but in your case you don't care about the run-time value, just the
> run-time field name regex, which should also be configurable according
> to the various FieldNameSelector rules...)
>
> https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html
> https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
>
> -Hoss
> http://www.lucidworks.com/

--
John Thorhauer
Director/Remote Interfaces
Yakabod, Inc.
301-662-4554 x2105
Re: dynamic field assignments
Jack,

Thanks for your help.

> Reading your last paragraph, how is that any different than exactly what
> DynamicField actually does?

My understanding is that DynamicField can do something like FOO_BAR_TEXT_*, but what I really need is *_TEXT_*, as I might have FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2. Both of those field names need to map to a field type of 'fullText'.

> You say you want to change fields at "run time" - what is "run time"? When
> exactly do your field names change?

What I mean is when the document is fed to Solr, i.e. during the update process when the document is being indexed. The document being indexed may have fields that are unknown until indexing time. However, some of those fields will follow a predictable naming pattern, as mentioned above.

> You can always write an update request processor to do any manipulation of
> field values at index time.

I see that Solr has this capability. However, I don't think I need to manipulate field values; I need to map a field/value to a particular fieldType for indexing.
dynamic field assignments
I have a scenario where I would like to dynamically assign incoming document fields to two different Solr schema fieldTypes. One fieldType will be an exact-match fieldType, while the other will be a full-text fieldType.

I know that I can use dynamicField to assign fields using the asterisk in a naming pattern. However, I have a set of incoming data in which the field names can change at run time. The fields will follow a predictable pattern, but the pattern can not be recognized using the dynamicField type. Here is an example of the new types of field names that I need to be able to process:

FOO_BAR_TEXT_1
FOO_BAR_TEXT_2
FOO_BAR_TEXT_3
FOO_BAR_TEXT_4
FOO_BAR_SELECT_1
FOO_BAR_SELECT_2
FOO_BAR_SELECT_3

The above fields will not be defined in advance. I need to map all fields with the name FOO_BAR_SELECT_* to a fieldType of 'exactMatch', and all fields with the name FOO_BAR_TEXT_* to a fieldType of 'fullText'. I was hoping there might be a way of doing this dynamically. Does anyone have any ideas how to approach this?

Thanks,
John Thorhauer
Re: filter query parsing problem
Ahmet,

Thanks so much for the help. I will give it a shot.

John

On Mon, Jan 18, 2010 at 4:40 PM, Ahmet Arslan wrote:
>> I am submitting a query and it seems to be parsing incorrectly. Here
>> is the query with the debug output. Any ideas what the problem is:
>>
>> ((VLog:814124 || VLog:12342) && (PublisherType:U || PublisherType:A))
>>
>> +(VLog:814124 VLog:12342) +PublisherType:u
>>
>> I would have thought that the parsed filter would have looked like this:
>> +(VLog:814124 VLog:12342) +(PublisherType:u PublisherType:a)
>
> It seems that StopFilterFactory is eating "A", which is a stop word. You
> can remove StopFilterFactory from the analyzer chain of the type of
> PublisherType. Or you can remove the entry "a" from stopwords.txt.
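Ahmet's first suggestion amounts to defining the field's type with an analyzer chain that has no StopFilterFactory, so code-like values such as "A" survive analysis. A minimal sketch (the type name "publisherCode" is made up for this example; exact schema syntax varies by Solr version):

```xml
<!-- Illustrative fieldType for code values like PublisherType:
     lowercases the whole value as one token, applies no stop-word
     filtering, so "A" is not eaten during analysis. -->
<fieldType name="publisherCode" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```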
filter query parsing problem
I am submitting a query and it seems to be parsing incorrectly. Here is the query with the debug output. Any ideas what the problem is?

The filter query I submit:

((VLog:814124 || VLog:12342) && (PublisherType:U || PublisherType:A))

The parsed filter from the debug output:

+(VLog:814124 VLog:12342) +PublisherType:u

I would have thought that the parsed filter would have looked like this:

+(VLog:814124 VLog:12342) +(PublisherType:u PublisherType:a)

Thanks for the help,
John Thorhauer