Re: Encryption At Rest - Using CustomAnalyzer
Hi, sorry to say it, but your encryption is not secure at all; it is actually very weak. Since you encrypt tokens only (and apply padding), it is very easy, based on the examples above, to reverse engineer your text. If somebody understands the domain, has the text distribution, and can build a so-called word2vec model, he/she can easily use it to build a reverse dictionary of your tokens. On the other hand, this means it should not be so difficult to build wildcard queries (at least with the asterisk at the end of the word, not at the beginning). Check how FuzzyQuery works right now - it is quite easy to understand and straightforward when reading the source code. I built my own version of FuzzyQuery some time ago based on the MultiTermQuery class.

MW

*Michael Wilkowski*
Chief Technology Officer, Silent Eight Pte Ltd
+48 600 995 603 | m...@silenteight.com
www.silenteight.com

On Tue, Feb 6, 2018 at 3:42 AM, aravinth thangasami < aravinththangas...@gmail.com> wrote:
> Kindly post your suggestions.
>
> On Mon, Dec 4, 2017 at 11:27 PM, aravinth thangasami < aravinththangas...@gmail.com> wrote:
> > Hi all,
> >
> > To support Encryption at Rest, we have written a custom analyzer that
> > encrypts every token in the input string and proceeds to the default
> > indexing chain.
> >
> > We are using AES/CTR/NoPadding with a unique key per user.
> > This ensures that for input strings with a common prefix, the encrypted
> > strings will also share a common prefix,
> > so that we can perform a Prefix Query as well.
> >
> > For example,
> >
> > run x5X7
> > runs x5X7tg==
> > running x5X7q/nE5g==
> >
> > During searching, we will preprocess the query for the encrypted field
> > before searching.
> > We can't do WildCard & Fuzzy Query.
> >
> > Did anyone try this approach?
> > Please post your suggestions and your tried approaches.
> >
> > Thanks
> > Aravinth
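For the archive, here is a minimal, self-contained sketch (plain JDK crypto; the all-zero key and IV and the class name are purely illustrative) of why the scheme above preserves prefixes - and why that is exactly the weakness Michael describes: AES/CTR with a reused key and IV is a fixed keystream, so equal plaintext prefixes always produce equal ciphertext prefixes, which leaks token structure to anyone who can observe the index.

```java
import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CtrPrefixDemo {

    // AES/CTR with a FIXED key and IV, as the token-encryption scheme requires
    // so that equal plaintexts always map to equal ciphertexts.
    static byte[] encrypt(byte[] key, byte[] iv, String token) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c.doFinal(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // all-zero key/IV purely for illustration
        byte[] iv = new byte[16];

        byte[] a = encrypt(key, iv, "run");
        byte[] b = encrypt(key, iv, "running");

        // CTR is a stream cipher: with a reused keystream, the first 3 bytes of
        // both ciphertexts are identical. This is what makes PrefixQuery work -
        // and also what makes frequency/prefix analysis of the index easy.
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) throw new AssertionError("prefix differs at " + i);
        }
        System.out.println("common 3-byte ciphertext prefix confirmed");
    }
}
```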
MultiTermQuery vs multiple TermQueries - is there a performance gain?
Hi, I am building an app that will create multiple term queries joined with OR (>100 primitive TermQuery instances). Is there a real performance gain in implementing a custom MultiTermQuery instead of simply joining multiple TermQuery clauses with OR? Regards, MW
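For the archive: the OR baseline is just a BooleanQuery of SHOULD clauses. A rough sketch (Lucene 5.3+ builder API; the field name "id" and the method are illustrative, and note the default 1024-clause limit on BooleanQuery):

```java
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// One SHOULD clause per term: a document matches if any of the terms match.
static Query orOfTerms(List<String> values) {
    BooleanQuery.Builder b = new BooleanQuery.Builder();
    for (String v : values) {
        b.add(new TermQuery(new Term("id", v)), BooleanClause.Occur.SHOULD);
    }
    return b.build();
}
```

For hundreds of exact terms, TermInSetQuery (Lucene 6.4+; TermsQuery from the queries module before that) is usually the better fit: it walks the terms dictionary once in sorted order instead of maintaining one scorer per clause.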
Re: Heavy usage of final in Lucene classes
Perfect! Thanks, that is what I was looking for :-). MW On Thu, Jan 12, 2017 at 12:02 PM, Alan Woodward wrote: > Hi Michael, > > You want to set the positionIncrementGap - either wrap your analyzer with > an AnalyzerWrapper that overrides getPositionIncrementGap(), or use a > CustomAnalyzer builder and set it there. > > Alan Woodward > www.flax.co.uk > > > > On 12 Jan 2017, at 10:57, Michael Wilkowski wrote: > > > > Hi, > > I wanted to subclass StandardTokenizer to manipulate a little with > > PositionAttribute. I wanted to increase steps between adjacent fields of > > the same, so if there is a multi-value TextField: > > > > fieldX: "name1 name2", > > fieldX:"name3 name4" > > > > then PhraseQuery like this fieldX:"name2 name3" would not return a > result. > > I was forced to create "empty" values like this: > > > > fieldX: "name1 name2", > > fieldX: "EMPTY_VALUE", > > fieldX:"name3 name4" > > > > to achieve it. > > > > Regards, > > MW > > > > > > > > On Thu, Jan 12, 2017 at 1:10 AM, Michael McCandless < > > luc...@mikemccandless.com> wrote: > > > >> I don't think it's about efficiency but rather about not exposing > >> possibly trappy APIs / usage ... > >> > >> Do you have a particular class/method that you'd want to remove final > from? > >> > >> Mike McCandless > >> > >> http://blog.mikemccandless.com > >> > >> > >> On Wed, Jan 11, 2017 at 4:15 PM, Michael Wilkowski > >> wrote: > >>> Hi, > >>> I sometimes wonder what is the purpose of so heavy "final" methods and > >>> classes usage in Lucene. It makes it my life much harder to override > >>> standard classes with some custom implementation. > >>> > >>> What comes first to my mind is runtime efficiency (compiler "knows" > that > >>> this class/method will not be overridden and may create more efficient > >> code > >>> without jump lookup tables and with method inlining). Is my assumption > >>> correct or there are other benefits that were behind this decision? > >>> > >>> Regards, > >>> Michael W. > >> > >
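For the archive, Alan's second suggestion can be sketched roughly like this (CustomAnalyzer is in the Lucene 5.x+ analyzers-common module; the gap of 10 is an arbitrary illustration - any value larger than your phrase slop works):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

// The gap is added between adjacent values of a multi-valued field, so a
// PhraseQuery can no longer match across the value boundary.
Analyzer analyzer = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .withPositionIncrementGap(10)
    .build();
```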
Re: Heavy usage of final in Lucene classes
Hi, I wanted to subclass StandardTokenizer to manipulate the position attribute a little. I wanted to increase the position gap between adjacent values of the same field, so if there is a multi-value TextField: fieldX: "name1 name2", fieldX: "name3 name4" then a PhraseQuery like fieldX:"name2 name3" would not return a result. I was forced to create "empty" values like this: fieldX: "name1 name2", fieldX: "EMPTY_VALUE", fieldX: "name3 name4" to achieve it. Regards, MW On Thu, Jan 12, 2017 at 1:10 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > I don't think it's about efficiency but rather about not exposing > possibly trappy APIs / usage ... > > Do you have a particular class/method that you'd want to remove final from? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Wed, Jan 11, 2017 at 4:15 PM, Michael Wilkowski > wrote: > > Hi, > > I sometimes wonder what is the purpose of so heavy "final" methods and > > classes usage in Lucene. It makes it my life much harder to override > > standard classes with some custom implementation. > > > > What comes first to my mind is runtime efficiency (compiler "knows" that > > this class/method will not be overridden and may create more efficient > code > > without jump lookup tables and with method inlining). Is my assumption > > correct or there are other benefits that were behind this decision? > > > > Regards, > > Michael W. >
Heavy usage of final in Lucene classes
Hi, I sometimes wonder what the purpose is of such heavy use of "final" methods and classes in Lucene. It makes my life much harder when overriding standard classes with a custom implementation. What comes to mind first is runtime efficiency (the compiler "knows" that the class/method will not be overridden and can generate more efficient code, without jump lookup tables and with method inlining). Is my assumption correct, or were there other benefits behind this decision? Regards, Michael W.
Re: Lucene performance benchmark | search throughput
My guess: more conditions = fewer documents to score, sort, and return. On Mon, Jan 2, 2017 at 7:23 PM, Rajnish kamboj wrote: > Hi > > Is there any Lucene performance benchmark against certain set of data? > [i.e Is there any stats for search throughput which Lucene can provide for > a certain data?] > > Search throughput Example: > Max. 200 TPS for 50K data on Lucene 5.3.1 on RHEL version x (with SSD) > Max. 150 TPS for 100K data on Lucene 5.3.1 on RHEL version x (with SSD) > Max. 300 TPS for 50K data on Lucene 6.0.0 on RHEL version x (with SSD) > etc. > > Also, does the index size matter for search throughput? > > Our observation: > When we increase the data size (hence index size) the search throughput > decreases. > When we add more AND conditions, the search throughput increases. Why? > Ideally if we add more conditions then Lucene should have more work to > do (including merging) and the throughput should decrease, but the > throughput increases? > > > Regards > Rajnish >
FuzzyQuery on entire set of terms
Hi, I need to implement a function that performs a fuzzy search over multiple terms such that a total edit distance of 2, summed across ALL terms, is allowed. For example, the query: Lucene Apache Group with a maximum total distance of 2 should match: Luceni Apachi Group Lucen Apache Group Luce Apache Group but not: Lucen Apach Grou I know that I can achieve this with multiple FuzzyQueries nested in BooleanQueries, but with more terms (>5) and a distance of 2 there could be a great many combinations, and I am worried about performance. Perhaps there is a better solution that someone can recommend? Regards, Michael
Re: Handling multiple locale
Hi, in my opinion your system locales have nothing to do with the analyzers you want to apply. I would not rely on system locales, as that makes the application very unportable. As for any other way - there is none. You may apply a regex query and create custom queries, but you cannot dynamically refer to fields; you can, however, easily do it by hand (as you said, by brute force - you are only limited by the maximum number of clauses in a BooleanQuery). MW On Mon, Sep 26, 2016 at 4:10 AM, lukes wrote: > 1 more question :). Are numbers analyzed ? Like IntField, LongField, etc. ? > > Regards. > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Handling-multiple-locale-tp4297805p4297949.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >
Re: Handling multiple locale
Hi, just to make sure I understand correctly: do you want to search your query across all possible locales? If yes, then my personal pattern in such a case would be to create multiple BooleanClauses (with Occur.SHOULD, one clause per locale) and add them to one BooleanQuery. MW On Sun, Sep 25, 2016 at 8:51 PM, lukes wrote: > Hi all, > > Any suggestions from the experts ? I assume this problem is not coming up > for the first time. > > Regards. > > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Handling-multiple-locale-tp4297805p4297927.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > >
Re: Using Lucene to model ownership of documents
Definitely b). I would also suggest groups and expanding user groups at user sign in time. MW On Thu, Jun 16, 2016 at 12:36 PM, Ian Lea wrote: > I'd definitely go for b). The index will of course be larger for every > extra bit of data you store but it doesn't sound like this would make much > difference. Likewise for speed of indexing. > > > -- > Ian. > > > On Wed, Jun 15, 2016 at 2:25 PM, Geebee Coder wrote: > > > Hi there, > > I would like to use Lucene to solve the following problem: > > > > 1.We have about 100k customers and we have 25 millions of documents. > > > > 2.When a customer performs a text search on the document space, we want > to > > return only documents that the customer has access to. > > > > 3.The # of documents a customer owns varies a lot. some have close to 23 > > million, some have close to 10k and some own a third of the documents > etc. > > > > What is an efficient way to use Lucene in this scenario in terms of > > performance and indexing? > > We have tried a number of solutions such as > > > > a)100k boolean fields per document that indicates whether a customer has > > access to the document. > > b)A single text field that has a list of customers who owns the document > > e.g. (customers field : "abc abd cfx...") > > c) the above option with shards by customers > > > > The search&index performance for a was bad. b,c performed better for > search > > but lengthened the time needed for indexing & index size. > > We are also thinking about using a custom filter but we are concerned > about > > the memory requirements. > > > > Any ideas/suggestions would be really appreciated. > > >
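Option b) from the original message can be sketched like this (the "owners" field and its values are made up for illustration; Occur.FILTER, available since Lucene 5.1, applies the clause as a mandatory filter without letting it influence scoring):

```java
// Indexing: one un-analyzed, un-stored value per customer or group with access.
Document doc = new Document();
doc.add(new StringField("owners", "cust_abc", Field.Store.NO));
doc.add(new StringField("owners", "grp_finance", Field.Store.NO));

// Searching: the user's text query AND-ed with an access filter.
Query withAccess = new BooleanQuery.Builder()
    .add(userTextQuery, BooleanClause.Occur.MUST)
    .add(new TermQuery(new Term("owners", "cust_abc")), BooleanClause.Occur.FILTER)
    .build();
```

Expanding a user's groups at sign-in time keeps the per-document value count small (one value per group rather than per user), which is what makes this approach scale to 100k customers.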
Re: Cache Lucene based index.
I recommend reading http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html?m=1 Just use MMapDirectory and let the operating system do the rest. MW Sent from Mi phone On 21 May 2016 12:42, "Prateek Singhal" wrote: > You can consider that I want to store the Lucene index in some sort of > temporary memory or a HashMap so that I do not need to index the documents > every time, as it is a costly operation. I can directly return the Lucene > index from that HashMap and use it to answer my queries. > > Just want to know if I can access the Lucene index object which Lucene has > created so that I can cache it. > > > > On Sat, May 21, 2016 at 3:46 PM, Uwe Schindler wrote: > > Hi, > > > > What do you mean with "cache"? > > > > Uwe > > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > -Original Message- > > > From: Prateek Singhal [mailto:prateek.b...@gmail.com] > > > Sent: Saturday, May 21, 2016 11:27 AM > > > To: java-user@lucene.apache.org > > > Subject: Cache Lucene based index. > > > > > > Hi Lucene lovers, > > > > > > I have a use-case where I want to *create a lucene based index* of > > multiple > > > documents and then *want to cache that index*. > > > > > > Can anyone suggest if this is possible ? > > > And which *type of cache* will be most efficient for this use case. > > > > > > Also if you can provide me with any *example *of the same then it will > be > > > really very helpful. > > > > > > Thanks. > > > > > -- > Regards, > Prateek Singhal > Software Development Engineer @ Amazon.com > > "Believe in yourself and you can do unbelievable things." >
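In code this amounts to little more than the following (the path is illustrative; on a 64-bit JVM, FSDirectory.open already picks MMapDirectory for you, so either line works):

```java
Directory dir = new MMapDirectory(Paths.get("/path/to/index"));
// or let Lucene choose the best implementation for the platform:
Directory dir2 = FSDirectory.open(Paths.get("/path/to/index"));
```

The operating system's page cache then keeps the hot parts of the index in RAM, which is exactly the "caching" asked about - no HashMap of index objects needed.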
Re: TermRangeQuery work not
You mixed lowerDate with upperDate. MW Sent from Mi phone On 25 Dec 2015 16:41, "kaog" wrote: > hi > I did the change of variable "ISBN, it was a mistake I did when I wrote in > the post. unfortunately still it does not work TermRangeQuery. :( > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/TermRangeQuery-work-not-tp4246519p4247358.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >
Re: Wildcard Terms and total word or phrase count
Hi Doug, your attachment is not available (likely security settings). Please put it in github or somewhere else and provide a link to download. MW On Mon, Nov 30, 2015 at 2:29 AM, Kunzman, Douglas * < douglas.kunz...@fda.hhs.gov> wrote: > > Jack - > > Thanks a lot for taking the time to try and answer my question. > > From using Solr I knew that it needed to be a TextField. > > I'm including the entire unit tester as an attachment. > > Thanks, > Doug > > -Original Message- > From: Jack Krupansky [mailto:jack.krupan...@gmail.com] > Sent: Sunday, November 29, 2015 12:18 PM > To: java-user@lucene.apache.org > Subject: Re: Wildcard Terms and total word or phrase count > > You didn't post your code that creates the index. Make sure you are using a > tokenized TextField rather than a single-token StringField. > > -- Jack Krupansky > > On Fri, Nov 27, 2015 at 4:06 PM, Kunzman, Douglas * < > douglas.kunz...@fda.hhs.gov> wrote: > > > Hi - > > > > This is my first Lucene project, my other search projects have used Solr. > > I would like to find the total number of WildCard terms in a set of > > documents with 0-N matches per document. > > I would prefer not have to open each document where a match is found. I > > need to be able to support wildcards but my requirements are somewhat > > flexible in about phrase search support. > > Whatever is easier. > > > > This is what I have so far. 
> > > >public static void main(String args[]) throws IOException, > > ParseException { > > Directory idx = FSDirectory.open(path); > > index("C:\\Users\\Douglas.Kunzman\\Desktop\\test_index"); > > > > Term term = new Term("Doc", "quar*"); > > > > WildcardQuery wc = new WildcardQuery(term); > > > > SpanQuery spanTerm = new > > SpanMultiTermQueryWrapper(wc); > > IndexReader indexReader = DirectoryReader.open(idx); > > > > System.out.println("Term freq=" + > indexReader.totalTermFreq(term)); > > System.out.println("Term freq=" + > > indexReader.getSumTotalTermFreq("Doc")); > > > > IndexSearcher isearcher = new IndexSearcher(indexReader); > > > > IndexReaderContext indexReaderContext = > > isearcher.getTopReaderContext(); > > TermContext context = TermContext.build(indexReaderContext, > term); > > TermStatistics termStatistics = isearcher.termStatistics(term, > > context); > > System.out.println("termStatics=" + > > termStatistics.totalTermFreq()); > > } > > > > Does anyone have any suggestions? totalTermFreq is zero, but when search > > using quartz we find matches. > > I'm searching the Quartz user's guide as an example. > > > > Thanks, > > Doug > > > > > > > > > > > > > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org >
Re: Wildcard Terms and total word or phrase count
It is because your index does not contain the term quar* - this statistics function is not a query (you have to pass the exact form of a term that exists in the index). To count documents that meet the search criteria you may run a search query with a custom collector and count the results, or use a normal search query returning TopDocs and just check totalHits (the first option is faster, though, because no results are gathered and sorted). MW Sent from Mi phone On 27 Nov 2015 22:06, "Kunzman, Douglas *" wrote: > Hi - > > This is my first Lucene project, my other search projects have used Solr. > I would like to find the total number of WildCard terms in a set of > documents with 0-N matches per document. > I would prefer not have to open each document where a match is found. I > need to be able to support wildcards but my requirements are somewhat > flexible in about phrase search support. > Whatever is easier. > > This is what I have so far. > >    public static void main(String args[]) throws IOException, > ParseException { >        Directory idx = FSDirectory.open(path); >        index("C:\\Users\\Douglas.Kunzman\\Desktop\\test_index"); > >        Term term = new Term("Doc", "quar*"); > >        WildcardQuery wc = new WildcardQuery(term); > >        SpanQuery spanTerm = new > SpanMultiTermQueryWrapper(wc); >        IndexReader indexReader = DirectoryReader.open(idx); > >        System.out.println("Term freq=" + indexReader.totalTermFreq(term)); >        System.out.println("Term freq=" + > indexReader.getSumTotalTermFreq("Doc")); > >        IndexSearcher isearcher = new IndexSearcher(indexReader); > >        IndexReaderContext indexReaderContext = > isearcher.getTopReaderContext(); >        TermContext context = TermContext.build(indexReaderContext, term); >        TermStatistics termStatistics = isearcher.termStatistics(term, > context); >        System.out.println("termStatics=" + > termStatistics.totalTermFreq()); >    } > > Does anyone have any suggestions? totalTermFreq is zero, but when we search > using quartz we find matches. > I'm searching the Quartz user's guide as an example.
> > Thanks, > Doug > > > > > >
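The collector option from the reply above can be sketched like this (counts matching documents without gathering or sorting any hits; note this is a document count - to sum actual term occurrences you would instead enumerate the terms matching quar* with a TermsEnum and add up totalTermFreq per term):

```java
// isearcher is the IndexSearcher from the code above.
TotalHitCountCollector collector = new TotalHitCountCollector();
isearcher.search(new WildcardQuery(new Term("Doc", "quar*")), collector);
int matchingDocs = collector.getTotalHits();
```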
Re: Determine whether a MatchAllQuery or a Query with atleast one Term
Instanceof? MW Sent from Mi phone On 28 Nov 2015 06:57, "Sandeep Khanzode" wrote: > Hi, > I have a question. > In my program, I need to check whether the input query is a MatchAll Query > that contains no terms, or a Query (any variant) that has at least one > term. For typical Term queries, this seems reasonable to be done with > Query.extractTerms(Set<> terms) which gives the list of terms. > However, when there is a NumericRangeQuery, this method throws an > UnsupportedOperationException. > How can I determine that a NumericRangeQuery or any non-Term query exists > in the Input Query and differentiate it from the MatchAllQuery? -- SRK
Re: Lucene auto suggest
Try some examples from stackoverflow: http://stackoverflow.com/questions/24968697/how-to-implements-auto-suggest-using-lucenes-new-analyzinginfixsuggester-api On Wed, Nov 25, 2015 at 4:18 AM, Bhaskar wrote: > Could you please some help here? > > On Mon, Nov 23, 2015 at 10:50 PM, Bhaskar wrote: > > > Hi, > > I have one column in the data base and it is having below data( it can > > have 5000 to 3 rows) > > > > Fenway Antenna Dipole Top CH00 with Coaxial Cable Length 140mm > > Fenway Antenna Dipole Side CH01 with Coaxial Cable Length 220mm > > Fenway Antenna Slot Front CH02 with Coaxial Cable Length 220mm > > ANTENNA,C1318-510009-A,GP,SE0810 > > ANTENNA,C1318-510010-A,GP,SE0810 > > ANTENNA,C1318-510011-A,GP,SE0810 > > ANTENNA,MAF94108,GP,SN0905A,WLAN > > ANTENNA,MAF94119,GP,SN0905A,WLAN > > ANTENNA,MAF94362,GP,SN0905A,WLAN > > ANTENNA,MAF94159,GP,SN0906A,WLAN > > ANTENNA,MAF94195,GP,SN0906A,WLAN > > ANTENNA,MAF94196,GP,SN0906A,WLAN > > ANTENNA, STAMPED METAL, BOOST, CHAIN-0 > > ANTENNA, STAMPED METAL, BOOST, CHAIN-2 > > ANVIL DIPOLE ANT0 > > ANVIL DIPOLE ANT1 > > LIMELIGHT ANTENNA-A CABLE > > LIMELIGHT ANTENNA-B CABLE > > LIMELIGHT ANTENNA-D CABLE > > > > > > I think I want fuzzy suggestions only... for example... > > when user types *Fenway *then the words starting with *Fenway *should > > come.. i.e. > > > > Fenway Antenna Dipole Top CH00 with Coaxial Cable Length 140mm > > Fenway Antenna Dipole Side CH01 with Coaxial Cable Length 220mm > > Fenway Antenna Slot Front CH02 with Coaxial Cable Length 220mm > > > > Based on the user input the result should change. if user typed *Fenway > > Antenna Dipole *then > > > > Fenway Antenna Dipole Top CH00 with Coaxial Cable Length 140mm > > Fenway Antenna Dipole Side CH01 with Coaxial Cable Length 220mm > > > > like this based on the typed data then the result should change. > > > > > > Could you please suggest what is the best way to achieve this( may be > some > > samples for the same). 
> > Please let me know if I miss any info you required. > > > > Thank you very much. > > > > Regards, > > Bhaskar > > > > > > > > On Mon, Nov 23, 2015 at 10:24 PM, Alessandro Benedetti < > > abenede...@apache.org> wrote: > > > >> Can you list us your requirements ? > >> > >> Is analysis needed in the suggester ? > >> Do you want infix suggestions ? > >> Do you want fuzzy suggestions ? > >> Suggestions of the whole content of a field or only few tokens ? > >> > >> Starting from that you can take a look to the suggester component and > all > >> the different implementations. > >> There are a lot of Lookup strategy, very specific depending on the use > >> case. > >> > >> Cheers > >> > >> On 23 November 2015 at 12:39, Bhaskar wrote: > >> > >> > Hi, > >> > > >> > I am new Lucene Auto suggest. > >> > Could you please some share the lucene auto suggest sample > >> > applications/code.. > >> > > >> > My use case is: > >> > I have the data in the database. I would like write some auto suggest > on > >> > the data base data. > >> > > >> > i.e. we have some text box in UI. when user is trying to enter some > >> thing > >> > we have to auto suggest based on the user input. > >> > > >> > Thanks in advance for help. > >> > > >> > -- > >> > Keep Smiling > >> > Thanks & Regards > >> > Bhaskar. > >> > Mobile:9866724142 > >> > > >> > >> > >> > >> -- > >> -- > >> > >> Benedetti Alessandro > >> Visiting card : http://about.me/alessandro_benedetti > >> > >> "Tyger, tyger burning bright > >> In the forests of the night, > >> What immortal hand or eye > >> Could frame thy fearful symmetry?" > >> > >> William Blake - Songs of Experience -1794 England > >> > > > > > > > > -- > > Keep Smiling > > Thanks & Regards > > Bhaskar. > > Mobile:9866724142 > > > > > > -- > Keep Smiling > Thanks & Regards > Bhaskar. > Mobile:9866724142 >
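A rough sketch of the AnalyzingInfixSuggester approach from that StackOverflow link (constructor and DocumentDictionary signatures vary between Lucene versions, and the "part" field name and weight handling here are assumptions, not checked against a specific release):

```java
AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
    FSDirectory.open(Paths.get("/tmp/suggest-index")), new StandardAnalyzer());

// Build the suggester from the stored field holding the part descriptions;
// a weight field could be used to rank suggestions, omitted in this sketch.
suggester.build(new DocumentDictionary(indexReader, "part", /* weightField */ null));

// As the user types, look up the current prefix; matches mid-string too.
List<Lookup.LookupResult> results = suggester.lookup("Fenway Antenna Dipole", false, 5);
```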
Re: does field cache support multivalue?
Yes - according to the Lucene in Action book, you cannot use the field cache in such situations. MW On Fri, Nov 20, 2015 at 8:41 AM, Yonghui Zhao wrote: > If I index one field more than once, it seems I can't get all the values > from the Lucene field cache? > > Right? >
Re: one large index vs many small indexes
Hi, many small indexes seem more reasonable and much more efficient than one common large index for all customers. I recommend a very good book, Lucene in Action - just reading the first few chapters (indexing & searching) will give you a very good idea of Lucene internals, the index structure, and why separate indexes will be much more efficient. Regards, Michael Wilkowski On Wed, Nov 11, 2015 at 9:40 AM, Sascha Janz wrote: > hello, > > we must make a design decision for our system. we have many customers which > all should use the same server. now we are thinking about making a > separate lucene index for each customer, or making one large index and using > a filter for each customer. > > any suggestions, comments or experiences on that? > > greetings > Sascha > >