Re: Solr with encrypted HDFS

2019-09-12 Thread John Thorhauer
Great.  Thanks so much, Hendrik, for sharing your experience!  We might
have high volume levels to deal with, but probably not high commit rates.

On Thu, Sep 12, 2019 at 1:45 AM Hendrik Haddorp 
wrote:

> Hi,
>
> we have some setups that use an encryption zone in HDFS. Once you have
> the HDFS config set up, the rest is transparent to the client, and thus
> Solr works just fine like that. That said, we have some general issues
> with Solr and HDFS. The main problem seems to be around the transaction
> log files. We have quite a high commit rate, and these short-lived files
> don't seem to play well with HDFS and the triple replication of blocks
> in HDFS. But encryption did not add any issues for us.
>
> regards,
> Hendrik
>
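For context, the encryption zone Hendrik mentions is created on the HDFS side before Solr is pointed at it. A rough sketch of the usual steps, where the key name `solrKey` and path `/solr` are placeholders and a KMS key provider is assumed to be configured already:

```shell
# Sketch only: key name "solrKey" and path "/solr" are placeholders.
# Assumes hadoop.security.key.provider.path points at a running KMS.
hadoop key create solrKey

# An encryption zone can only be created on an empty directory.
hdfs dfs -mkdir /solr
hdfs crypto -createZone -keyName solrKey -path /solr

# Verify (requires HDFS superuser):
hdfs crypto -listZones
```

Once the zone exists, clients with the right HDFS and KMS privileges read and write it transparently, which is why no Solr-side change is needed.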


Solr with encrypted HDFS

2019-09-11 Thread John Thorhauer
Hi,

I am interested in encrypting/protecting my Solr indices.  I am wondering
if Solr can work with an encrypted HDFS.  I see that these instructions (
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/configuring-hdfs-encryption/content/configuring_and_using_hdfs_data_at_rest_encryption.html)
explain that:

"After permissions are set, Java API clients and HDFS applications with
sufficient HDFS and Ranger KMS access privileges can write and read to/from
files in the encryption zone"


So I am wondering if the Solr/Java API that uses HDFS would work with this
as well.  Also, has anyone had experience running this, either good
or bad?

Thanks,
John


Re: is SearchComponent the correct way?

2018-11-29 Thread John Thorhauer
So my understanding is that the DelegatingCollector.collect() method has
access to a single doc, and at that point I must choose whether or not to
call super.collect().  So this is the point at which I have to check redis
for security data for that single doc and determine whether it should be
allowed into the result set.  It seems I have to check my redis cache one
doc at a time, since I am only given one doc in the collect() method.

I would like to find an option where I can get all the docs in the
postfilter and run a single query to redis with all of the docs at once to
get a single answer back from redis and then determine, based on the redis
response, which of the docs should be allowed to pass thru my postfilter.




On Fri, Nov 16, 2018 at 4:30 PM Mikhail Khludnev  wrote:

> On Tue, Nov 13, 2018 at 6:36 AM John Thorhauer 
> wrote:
>
> > Mikhail,
> >
> > Where do I implement the buffering?  I can not do it in the collect()
> > method.
>
> Please clarify why exactly. Note my statement about buffering one segment only.
>
>
> > I can not see how I can get access to what I need in the finish()
> > method.
> >
> > Thanks,
> > John
> >
> > --
> > John Thorhauer
> > Vice President, Software Development
> > Yakabod, Inc.
> > Cell: 240-818-9050
> > Office: 301-662-4554 x2105
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>

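The segment-by-segment buffering Mikhail suggests (collect doc IDs, make one batched redis call per segment, then forward only the allowed docs) could be sketched roughly as below. This is an untested sketch, not code from the thread: the class name and the `fetchAllowedDocs` helper are assumptions, while `DelegatingCollector` and its `collect`/`doSetNextReader`/`finish` hooks are the real Solr PostFilter machinery.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.solr.search.DelegatingCollector;

public class SecurityFilterCollector extends DelegatingCollector {
  private final List<Integer> buffered = new ArrayList<>();
  private LeafReaderContext currentContext;

  @Override
  public void collect(int doc) throws IOException {
    buffered.add(doc); // buffer instead of forwarding immediately
  }

  @Override
  public void doSetNextReader(LeafReaderContext context) throws IOException {
    flushBuffered(); // resolve the previous segment in one batch first
    super.doSetNextReader(context);
    currentContext = context;
  }

  @Override
  public void finish() throws IOException {
    flushBuffered(); // last segment
    super.finish();
  }

  private void flushBuffered() throws IOException {
    if (buffered.isEmpty()) return;
    Set<Integer> allowed = fetchAllowedDocs(currentContext, buffered);
    for (int doc : buffered) {
      if (allowed.contains(doc)) {
        super.collect(doc); // forward only docs the user may see
      }
    }
    buffered.clear();
  }

  private Set<Integer> fetchAllowedDocs(LeafReaderContext ctx, List<Integer> docs) {
    // Assumed helper: read each buffered doc's key field from ctx.reader(),
    // issue a single batched redis lookup (e.g. MGET or a pipeline), and map
    // the answers back to segment-local doc ids. Details depend on the schema.
    throw new UnsupportedOperationException("sketch only");
  }
}
```

The key point is that buffered docs are flushed before `super.doSetNextReader` advances the delegate to the next segment, so `super.collect` still refers to the segment the doc IDs belong to.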

Re: is SearchComponent the correct way?

2018-11-12 Thread John Thorhauer
Mikhail,

Where do I implement the buffering?  I cannot do it in the collect()
method, and I cannot see how to get access to what I need in the
finish() method.

Thanks,
John

On Tue, Nov 6, 2018 at 12:44 PM Mikhail Khludnev  wrote:

> Not really. It is expected to work segment by segment, so it can buffer all
> docs from one segment, hit redis, and push all results into the delegating
> collector.
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
John Thorhauer
Vice President, Software Development
Yakabod, Inc.
Cell: 240-818-9050
Office: 301-662-4554 x2105


Re: is SearchComponent the correct way?

2018-11-06 Thread John Thorhauer
Mikhail,

Thanks for the suggestion.  After looking over the PostFilter interface and
the DelegatingCollector, it appears that this would require me to query my
outside datastore (redis) for security information once for each document.
This would be a big performance issue.  I would like to be able to iterate
through the documents, gathering all the critical IDs, and then send a
single query to redis, getting back my security-related data, and then
iterate through the documents, pulling out the ones that the user should
not see.

Is this possible?

Thanks again for your help!
John



-- 
John Thorhauer
Vice President, Software Development
Yakabod, Inc.
Cell: 240-818-9050
Office: 301-662-4554 x2105


is SearchComponent the correct way?

2018-11-06 Thread John Thorhauer
We have a need to check the results of a search against a set of security
lists that are maintained in a redis cache.  I need to be able to take each
document that is returned for a search and check the redis cache to see if
the document should be displayed or not.

I am attempting to do this by creating a SearchComponent.  I am able to
iterate through the results and identify the items I want to remove from
the results, but I am not sure how to proceed in removing them.

Is SearchComponent the best way to do this?  If so, any thoughts on how to
proceed?


Thanks,
John Thorhauer


Re: dynamic field assignments

2014-05-15 Thread John Thorhauer
Chris,

Thanks so much for the suggestion.  I will look into this approach.  It
looks very promising!

John


On Mon, May 5, 2014 at 9:50 PM, Chris Hostetter wrote:

>
> : My understanding is that DynamicField can do something like
> : FOO_BAR_TEXT_* but what I really need is *_TEXT_* as I might have
> : FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2.  Both of those
> : field names need to map to a field type of 'fullText'.
>
> I'm pretty sure you can get what you are after with the new Managed Schema
> functionality...
>
> https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
>
> https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig
>
> Assuming you have managed schema enabled in solrconfig.xml, and you define
> both of your fieldTypes using names like "text" and "select" then
> something like this should work in your processor chain...
>
>  
>.*_TEXT_.*
>text
>  
>  
>.*_SELECT_.*
>select
>  
>
>
> (Normally that processor is used once with multiple value->type mappings --
> but in your case you don't care about the run-time value, just the run-time
> field name regex, which should also be configurable according to the
> various FieldNameSelector rules.)
>
>
> https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html
>
> https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
>
>
> -Hoss
> http://www.lucidworks.com/
>

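The XML in Hoss's example above was stripped of its tags by the list archive; only the regexes and type names survived. A sketch of what such a chain might look like, where the element names are assumed from the AddSchemaFieldsUpdateProcessorFactory and FieldMutatingUpdateProcessorFactory selector syntax he links, not recovered from the original mail:

```xml
<!-- Sketch only: element names are assumptions based on the linked javadocs,
     reconstructing the archive-stripped XML. -->
<updateRequestProcessorChain name="add-schema-fields">
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="fieldRegex">.*_TEXT_.*</str>
    <str name="defaultFieldType">text</str>
  </processor>
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="fieldRegex">.*_SELECT_.*</str>
    <str name="defaultFieldType">select</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With managed schema enabled, any incoming field matching `.*_TEXT_.*` would be created with the `text` fieldType and any field matching `.*_SELECT_.*` with the `select` fieldType, which is the mapping John asked for.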


-- 
John Thorhauer
Director/Remote Interfaces
Yakabod, Inc.
301-662-4554 x2105


Re: dynamic field assignments

2014-04-25 Thread John Thorhauer
Jack,

Thanks for your help.

> Reading your last paragraph, how is that any different than exactly what
> DynamicField actually does?

My understanding is that DynamicField can do something like
FOO_BAR_TEXT_* but what I really need is *_TEXT_* as I might have
FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2.  Both of those
field names need to map to a field type of 'fullText'.

> You say you want to change fields at "run time" - what is "run time"? When
> exactly do your field names change?

What I mean is when the document is fed to Solr, i.e. during the
update process when the document is being indexed.  The document that
is being indexed may have fields that are unknown until the time of
indexing.  However, some of those fields will follow a predictable
naming pattern, as mentioned above.

> You can always write an update request processor to do any manipulation of
> field values at index time.

I see that Solr has this capability.  However, I don't think I need to
manipulate field values.  I need to map a field/value to a particular
fieldType for indexing.


dynamic field assignments

2014-04-25 Thread John Thorhauer
I have a scenario where I would like to dynamically assign incoming
document fields to two different solr schema fieldTypes.  One
fieldType will be an exact match fieldType while the other will be a
full text fieldType.  I know that I can use the dynamicField to assign
fields using the asterisk in a naming pattern.  However, I have a set
of incoming data in which the field names can change at run time.  The
fields will follow a predictable pattern but the pattern can not be
recognized using the dynamicField type.

So here is an example of the new types of field names that I need to
be able to process:

FOO_BAR_TEXT_1
FOO_BAR_TEXT_2
FOO_BAR_TEXT_3
FOO_BAR_TEXT_4

FOO_BAR_SELECT_1
FOO_BAR_SELECT_2
FOO_BAR_SELECT_3

So the above fields will not be defined in advance.  I need to map all
fields with the name FOO_BAR_SELECT_* to a fieldType of 'exactMatch',
and I need to map all of the fields with name FOO_BAR_TEXT_* to a
fieldType of 'fullText'.  I was hoping there might be a way of doing
this dynamically.  Does anyone have any ideas how to approach this?

Thanks,
John Thorhauer


Re: filter query parsing problem

2010-01-19 Thread John Thorhauer
Ahmet,

Thanks so much for the help.  I will give it a shot.

John

On Mon, Jan 18, 2010 at 4:40 PM, Ahmet Arslan  wrote:
>> I am submitting a query and it seems
>> to be parsing incorrectly.  Here
>> is the query with the debug output.  Any ideas what
>> the problem is:
>>
>> <arr name="filter_queries">
>>   <str>((VLog:814124 || VLog:12342) && (PublisherType:U || PublisherType:A))</str>
>> </arr>
>> <arr name="parsed_filter_queries">
>>   <str>+(VLog:814124 VLog:12342) +PublisherType:u</str>
>> </arr>
>>
>> I would have thought that the parsed filter would have
>> looked like this:
>>         +(VLog:814124
>> VLog:12342) +(PublisherType:u PublisherType:a)
>
> It seems that StopFilterFactory is eating 'A', which is a stop word. You can
> remove StopFilterFactory from the analyzer chain of PublisherType's field
> type, or you can remove the entry 'a' from stopwords.txt.
>
>
>
>

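Following Ahmet's advice, a fieldType for code-like values such as PublisherType would typically skip the stop filter entirely. A sketch, where the type name is an assumption and the lowercase filter matches the `U` -> `u` seen in the parsed output:

```xml
<!-- Sketch only: "publisherCode" is an assumed name, not from the thread.
     No StopFilterFactory, so single-letter values like "A" survive. -->
<fieldType name="publisherCode" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="PublisherType" type="publisherCode" indexed="true" stored="true"/>
```

With this type, the filter `(PublisherType:U || PublisherType:A)` would parse to `(PublisherType:u PublisherType:a)` instead of losing the second clause.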

filter query parsing problem

2010-01-18 Thread John Thorhauer
I am submitting a query and it seems to be parsing incorrectly.  Here
is the query with the debug output.  Any ideas what the problem is:

<arr name="filter_queries">
  <str>((VLog:814124 || VLog:12342) && (PublisherType:U || PublisherType:A))</str>
</arr>
<arr name="parsed_filter_queries">
  <str>+(VLog:814124 VLog:12342) +PublisherType:u</str>
</arr>
I would have thought that the parsed filter would have looked like this:
+(VLog:814124 VLog:12342) +(PublisherType:u PublisherType:a)

Thanks for the help,
John Thorhauer