[jira] [Comment Edited] (SOLR-12884) Admin UI, admin/luke and *Point fields
[ https://issues.apache.org/jira/browse/SOLR-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657963#comment-16657963 ]

Christopher Ball edited comment on SOLR-12884 at 10/22/18 11:14 PM:

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions - for example the following expression provides a frequency table for a numeric field:

{code:java}
let (a=search(MyCollection, q="*:*", fl="myWordCount_l", fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"),
     b=col(a, myWordCount_l),
     c=freqTable(b)){code}

With the addition of a filter function (either an exponential function or just a list of step points), it would be on par with the data being provided from Luke.

@[~joel.bernstein] - thoughts?

was (Author: christopherball):

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions - for example the following expression provides a frequency table for a numeric field:

{code:java}
let (a=search(MyCollection, q="*:*", fl="myWordCount_l", fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"),
     b=col(a, myWordCount_l),
     c=freqTable(b)){code}

With the addition of a filter function (either an exponential function or just a list of step points), it would be on par with the data being provided from Luke.

@[~joel.bernstein] - thoughts?

> Admin UI, admin/luke and *Point fields
> --------------------------------------
>
>                 Key: SOLR-12884
>                 URL: https://issues.apache.org/jira/browse/SOLR-12884
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>    Affects Versions: master (8.0)
>            Reporter: Erick Erickson
>            Priority: Major
>
> One of the conference attendees noted that you go to the schema browser and
> click on, say, a pint field, then click "load term info", nothing is shown.
> admin/luke similarly doesn't show much interesting, here's the response for a
> pint vs. a tint field:
> "popularity":{ "type":"pint", "schema":"I-SD-OF--"},
> "popularityt":{ "type":"tint", "schema":"I-S--OF--",
> "index":"-TS--", "docs":15},
>
> What, if anything, should we do in these two cases? Since the points-based
> numerics don't have terms like Trie* fields, I don't think we _can_ show much
> more so the above makes sense, it's just jarring to end users and looks like
> a bug.
> WDYT about putting in some useful information though. Say for the Admin UI
> for points-based "terms cannot be shown for points-based fields" or some such?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
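The idea in the comment above, a frequency table that is then reduced by a list of step points, can be sketched outside Solr. The following is a plain-Python illustration only, not Solr, Luke, or Streaming Expressions code; the function names `freq_table` and `bucketed_freq_table` and the sample values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def freq_table(values):
    """Count how often each distinct value occurs, sorted by value."""
    return sorted(Counter(values).items())


def bucketed_freq_table(values, step_points):
    """Collapse a frequency table into buckets bounded by step points.

    A value v is counted in the bucket labelled with the largest
    step point <= v. Values below the first step point are ignored.
    """
    steps = sorted(step_points)
    buckets = {s: 0 for s in steps}
    for value, count in freq_table(values):
        idx = bisect_right(steps, value) - 1
        if idx >= 0:
            buckets[steps[idx]] += count
    return buckets


if __name__ == "__main__":
    # Hypothetical field values, e.g. word counts per document.
    word_counts = [0, 1, 1, 2, 3, 5, 8, 8, 13, 40, 100]
    # Exponentially spaced step points, one of the two options the comment mentions.
    print(bucketed_freq_table(word_counts, [0, 1, 2, 4, 8, 16, 32, 64]))
```

Exponentially spaced step points keep the number of buckets small even when the field's values span several orders of magnitude, which is presumably why the comment proposes them as a filter.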
[jira] [Comment Edited] (SOLR-12884) Admin UI, admin/luke and *Point fields
[ https://issues.apache.org/jira/browse/SOLR-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657963#comment-16657963 ]

Christopher Ball edited comment on SOLR-12884 at 10/22/18 11:13 PM:

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions - for example the following expression provides a frequency table for a numeric field:

{code:java}
let (a=search(MyCollection, q="*:*", fl="myWordCount_l", fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"),
     b=col(a, myWordCount_l),
     c=freqTable(b)){code}

With the addition of a filter function (either an exponential function or just a list of step points), it would be on par with the data being provided from Luke.

@[~joel.bernstein] - thoughts?

was (Author: christopherball):

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions - for example the following expression provides a frequency table for a numeric field:

let (a=search(MyCollection, q="*:*", fl="myWordCount_l", fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"), b=col(a, myWordCount_l), c=freqTable(b))

With the addition of a filter function (either an exponential function or just a list of step points), it would be on par with the data being provided from Luke.

@[~joel.bernstein] - thoughts?
[jira] [Comment Edited] (SOLR-12884) Admin UI, admin/luke and *Point fields
[ https://issues.apache.org/jira/browse/SOLR-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657963#comment-16657963 ]

Christopher Ball edited comment on SOLR-12884 at 10/22/18 3:22 PM:

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions - for example the following expression provides a frequency table for a numeric field:

let (a=search(MyCollection, q="*:*", fl="myWordCount_l", fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"), b=col(a, myWordCount_l), c=freqTable(b))

With the addition of a filter function (either an exponential function or just a list of step points), it would be on par with the data being provided from Luke.

@[~joel.bernstein] - thoughts?

was (Author: christopherball):

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions . . . For example the following expression provides a frequency table for a numeric field:

let (a=search(MyCollection, q="*:*", fl="myWordCount_l", fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"), b=col(a, myWordCount_l), c=freqTable(b))

With the addition of a filter function (either an exponential function or just a list of step points), it would be on par with the data being provided from Luke.
[jira] [Comment Edited] (SOLR-12884) Admin UI, admin/luke and *Point fields
[ https://issues.apache.org/jira/browse/SOLR-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657963#comment-16657963 ]

Christopher Ball edited comment on SOLR-12884 at 10/20/18 9:58 PM:

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions . . . For example the following expression provides a frequency table for a numeric field:

let (a=search(MyCollection, q="*:*", fl="myWordCount_l", fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"), b=col(a, myWordCount_l), c=freqTable(b))

With the addition of a filter function (either an exponential function or just a list of step points), it would be on par with the data being provided from Luke.

was (Author: christopherball):

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions . . . For example the following expression would provide a frequency table for a numeric field:

let (a=search(MyCollection, q="*:*", fl="myWordCount_l", fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"), b=col(a, myWordCount_l), c=freqTable(b))

With the addition of an exponential function as a filter, it would be on par with the data being provided from Luke.
[jira] [Commented] (SOLR-12884) Admin UI, admin/luke and *Point fields
[ https://issues.apache.org/jira/browse/SOLR-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657963#comment-16657963 ]

Christopher Ball commented on SOLR-12884:

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions . . . For example the following expression would provide a frequency table for a numeric field:

let (a=search(MyCollection, q="*:*", fl="myWordCount_l", fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"), b=col(a, myWordCount_l), c=freqTable(b))

With the addition of an exponential function as a filter, it would be on par with the data being provided from Luke.
[jira] [Commented] (SOLR-2272) Join
[ https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442481#comment-13442481 ]

Christopher Ball commented on SOLR-2272:

This does not appear to support use with a delete query . . .

For example, the following does not work:

http://localhost:8984/solr/myMusic/update?stream.body=<delete><query>{!join from=artist_name to=artist_name fromIndex=MusicBrainz}*:*</query></delete>&commit=true

Join

                Key: SOLR-2272
                URL: https://issues.apache.org/jira/browse/SOLR-2272
            Project: Solr
         Issue Type: New Feature
         Components: search
           Reporter: Yonik Seeley
            Fix For: 4.0-ALPHA
        Attachments: SOLR-2272.patch, SOLR-2272.patch, SOLR-2272.patch

Limited join functionality for Solr, mapping one set of IDs matching a query to another set of IDs, based on the indexed tokens of the fields.

Example:
fq={!join from=parent_ptr to=parent_id}child_doc:query

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
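The request in the comment above packs an XML delete-by-query into the `stream.body` parameter. As a minimal sketch of how such a URL could be assembled with proper escaping, assuming the host, core, and field names from the comment (the helper `build_delete_by_query_url` is hypothetical): this only shows the encoding, and says nothing about whether Solr accepts a join inside a delete query, which is exactly what the comment reports as not working.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from xml.sax.saxutils import escape


def build_delete_by_query_url(update_url, query, commit=True):
    """Build an update URL whose stream.body carries an XML delete-by-query."""
    # Escape &, < and > so the query text cannot break the XML envelope.
    body = "<delete><query>{}</query></delete>".format(escape(query))
    params = {"stream.body": body}
    if commit:
        params["commit"] = "true"
    # urlencode percent-encodes braces, spaces, '*' and the XML brackets.
    return update_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_delete_by_query_url(
        "http://localhost:8984/solr/myMusic/update",
        "{!join from=artist_name to=artist_name fromIndex=MusicBrainz}*:*",
    )
    print(url)
    # Decode again to check that the body survives the round trip.
    print(parse_qs(urlsplit(url).query)["stream.body"][0])
```

Encoding the body this way rules out URL mangling as the cause when a request like the one in the comment fails.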
[jira] [Commented] (LUCENE-3435) Create a Size Estimator model for Lucene and Solr
[ https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113611#comment-13113611 ]

Christopher Ball commented on LUCENE-3435:

Grant - Great start =)

Below is some initial feedback (happy to help further if you want to chat in real-time)

*Quickly Grokking* - To make it easier to quickly comprehend, the cells that are to be updated in the spreadsheet should be color coded (as opposed to those that are calculated)

*Bytes or Entries* - You list Max Size for filterCache, queryResultCache, and documentCache as 512 which implies the size is based on bytes when in fact the units of the cache are entries - I would clarify this in the spreadsheet as I have seen numerous blogs and emails confuse this.

*Approach to Cache Sizing* - Given memory requirements are heavily contingent on caching I would suggest including at least one approach for how to determine cache size

* Query Result Cache
** Estimation: should be greater than 'number of commonly reoccurring unique queries' x 'number of sort parameters' x 'number of possible sort orders'
* Document Cache
** Estimation: should be greater than 'maximum number of documents per query' x 'maximum number of concurrent queries'
* Filter Cache
** Estimation: should be number of unique filter queries (should clarify what constitutes 'unique')
* Field Value Cache
** Estimation: should be ?
* Custom Caches
** Estimation: should be ? - A common use case?

*Faceting* - Surprised there is no reference to use of faceting, which is increasingly common default query functionality and would further increase memory requirements for effective use

*Obscure Metrics* - To really give this spreadsheet some teeth, there really should be pointers for at least one approach on how to estimate each input metric (could be on another tab).
* Some are fairly easy:
** Number of Unique Terms / field
** Number of documents
** Number of indexed fields (no norms)
** Number of fields w/ norms
** Number of non-String Sort Fields other than score
** Number of String Sort Fields
** Number of deleted docs on avg
** Avg. number of terms per query
* Some are quite obscure (and guidance on how to estimate is essential):
** Number of RAM-based Column Stride Fields (DocValues)
** ramBufferSizeMB
** Transient Factor (MB)
** fieldValueCache Max Size
** Custom Cache Size (MB)
** Avg. Number of Bytes per Term
** Bytes/Term
** Field Cache bits/term
** Cache Key Avg. Size (Bytes)
** Avg QueryResultKey size (in bytes)

Create a Size Estimator model for Lucene and Solr
-------------------------------------------------

                Key: LUCENE-3435
                URL: https://issues.apache.org/jira/browse/LUCENE-3435
            Project: Lucene - Java
         Issue Type: Task
         Components: core/other
   Affects Versions: 4.0
           Reporter: Grant Ingersoll
           Assignee: Grant Ingersoll
           Priority: Minor

It is often handy to be able to estimate the amount of memory and disk space that both Lucene and Solr use, given certain assumptions. I intend to check in an Excel spreadsheet that allows people to estimate memory and disk usage for trunk. I propose to put it under dev-tools, as I don't think it should be official documentation just yet and like the IDE stuff, we'll see how well it gets maintained.
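The cache-sizing rules of thumb from the comment above can be written down as simple arithmetic. This is a minimal sketch of those rules only, not part of the proposed spreadsheet; the function names and the workload numbers are hypothetical, and every result is a number of cache entries, not bytes.

```python
def query_result_cache_min_entries(unique_queries, sort_params, sort_orders):
    """Lower bound: recurring unique queries x sort parameters x sort orders."""
    return unique_queries * sort_params * sort_orders


def document_cache_min_entries(max_docs_per_query, max_concurrent_queries):
    """Lower bound: max documents per query x max concurrent queries."""
    return max_docs_per_query * max_concurrent_queries


def filter_cache_entries(unique_filter_queries):
    """Estimate: one entry per unique filter query."""
    return unique_filter_queries


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    print(query_result_cache_min_entries(200, 3, 2))  # 1200 entries
    print(document_cache_min_entries(50, 20))         # 1000 entries
    print(filter_cache_entries(75))                   # 75 entries
```

The comment gives the first two products as values the configured size "should be greater than", so they are floors, not targets. The Field Value Cache and custom caches are left out because the comment itself leaves their estimation open.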
[jira] [Issue Comment Edited] (LUCENE-3435) Create a Size Estimator model for Lucene and Solr
[ https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113611#comment-13113611 ]

Christopher Ball edited comment on LUCENE-3435 at 9/24/11 2:15 AM:

Grant - Great start =)

Below is some initial feedback (happy to help further if you want to chat in real-time)

*Quickly Grokking* - To make it easier to quickly comprehend, the cells that are to be updated in the spreadsheet should be color coded (as opposed to those that are calculated)

*Bytes or Entries* - You list Max Size for filterCache, queryResultCache, and documentCache as 512 which subtly implies the size is based on bytes when the units of the cache are actually the number of entries. I would clarify the unit of measure (I've seen numerous blogs and emails confuse this).

*Approach to Cache Sizing* - Given memory requirements are heavily contingent on caching I would suggest including at least one approach for how to determine cache size

* Query Result Cache
** Estimation: should be greater than 'number of commonly reoccurring unique queries' x 'number of sort parameters' x 'number of possible sort orders'
* Document Cache
** Estimation: should be greater than 'maximum number of documents per query' x 'maximum number of concurrent queries'
* Filter Cache
** Estimation: should be number of unique filter queries (should clarify what constitutes 'unique')
* Field Value Cache
** Estimation: should be ?
* Custom Caches
** Estimation: should be ? - A common use case?

*Faceting* - Surprised there is no reference to use of faceting which is both increasingly common default query functionality and further increases memory requirements for effective use

*Obscure Metrics* - To really give this spreadsheet some teeth, there really should be pointers for at least one approach on how to estimate each input metric (could be on another tab).
* Some are fairly easy:
** Number of Unique Terms / field
** Number of documents
** Number of indexed fields (no norms)
** Number of fields w/ norms
** Number of non-String Sort Fields other than score
** Number of String Sort Fields
** Number of deleted docs on avg
** Avg. number of terms per query
* Some are quite obscure (and guidance on how to estimate is essential):
** Number of RAM-based Column Stride Fields (DocValues)
** ramBufferSizeMB
** Transient Factor (MB)
** fieldValueCache Max Size
** Custom Cache Size (MB)
** Avg. Number of Bytes per Term
** Bytes/Term
** Field Cache bits/term
** Cache Key Avg. Size (Bytes)
** Avg QueryResultKey size (in bytes)

was (Author: christopherball):

Grant - Great start =)

Below is some initial feedback (happy to help further if you want to chat in real-time)

*Quickly Grokking* - To make it easier to quickly comprehend, the cells that are to be updated in the spreadsheet should be color coded (as opposed to those that are calculated)

*Bytes or Entries* - You list Max Size for filterCache, queryResultCache, and documentCache as 512 which implies the size is based on bytes when in fact the units of the cache are entries - I would clarify this in the spreadsheet as I have seen numerous blogs and emails confuse this.

*Approach to Cache Sizing* - Given memory requirements are heavily contingent on caching I would suggest including at least one approach for how to determine cache size

* Query Result Cache
** Estimation: should be greater than 'number of commonly reoccurring unique queries' x 'number of sort parameters' x 'number of possible sort orders'
* Document Cache
** Estimation: should be greater than 'maximum number of documents per query' x 'maximum number of concurrent queries'
* Filter Cache
** Estimation: should be number of unique filter queries (should clarify what constitutes 'unique')
* Field Value Cache
** Estimation: should be ?
* Custom Caches
** Estimation: should be ? - A common use case?
*Faceting* - Surprised there is no reference to use of faceting, which is increasingly common default query functionality and would further increase memory requirements for effective use

*Obscure Metrics* - To really give this spreadsheet some teeth, there really should be pointers for at least one approach on how to estimate each input metric (could be on another tab).

* Some are fairly easy:
** Number of Unique Terms / field
** Number of documents
** Number of indexed fields (no norms)
** Number of fields w/ norms
** Number of non-String Sort Fields other than score
** Number of String Sort Fields
** Number of deleted docs on avg
** Avg. number of terms per query
* Some are quite obscure (and guidance on how to estimate is essential):
** Number of RAM-based Column Stride Fields (DocValues)
** ramBufferSizeMB
** Transient Factor (MB)
** fieldValueCache Max Size
** Custom Cache Size (MB)
** Avg. Number of Bytes per Term
** Bytes/Term
** Field Cache bits/term
** Cache Key