[jira] [Comment Edited] (SOLR-12884) Admin UI, admin/luke and *Point fields

2018-10-22 Thread Christopher Ball (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657963#comment-16657963
 ] 

Christopher Ball edited comment on SOLR-12884 at 10/22/18 11:14 PM:


Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions? For example, the following expression 
provides a frequency table for a numeric field:
{code:java}
let (a=search(MyCollection,
              q="*:*",
              fl="myWordCount_l",
              fq="myWordCount_l:[0 TO *]",
              rows=1000,
              sort="myWordCount_l asc"),
     b=col(a, myWordCount_l),
     c=freqTable(b))
{code}
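
For reference, a minimal sketch of how such an expression could be sent to the 
/stream handler with curl (assuming a default local node on port 8983; 
MyCollection and myWordCount_l are just the placeholder names from the example 
above):
{code}
curl --data-urlencode 'expr=let(a=search(MyCollection, q="*:*", fl="myWordCount_l", fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"), b=col(a, myWordCount_l), c=freqTable(b))' \
  "http://localhost:8983/solr/MyCollection/stream"
{code}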
With the addition of a filter function (either an exponential function or just 
a list of step points), it would be on par with the data being provided from 
Luke. 
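
As an aside, unless I'm mistaken the existing hist() math function already does 
fixed-width binning over a numeric array, so something along these lines might 
approximate the "list of step points" case without new work (same placeholder 
collection and field as above; an exponential or custom-breakpoint variant would 
presumably still need a new function):
{code}
let (a=search(MyCollection, q="*:*", fl="myWordCount_l",
              fq="myWordCount_l:[0 TO *]", rows=1000, sort="myWordCount_l asc"),
     b=col(a, myWordCount_l),
     c=hist(b, 10))
{code}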

@[~joel.bernstein] - thoughts?



> Admin UI, admin/luke and *Point fields
> --
>
> Key: SOLR-12884
> URL: https://issues.apache.org/jira/browse/SOLR-12884
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (8.0)
>Reporter: Erick Erickson
>Priority: Major
>
> One of the conference attendees noted that if you go to the schema browser, 
> click on, say, a pint field, then click "load term info", nothing is shown.
> admin/luke similarly doesn't show much of interest; here's the response for a 
> pint vs. a tint field:
> "popularity":{ "type":"pint", "schema":"I-SD-OF--"},
> "popularityt":{ "type":"tint", "schema":"I-S--OF--",
>                 "index":"-TS--", "docs":15},
>
> What, if anything, should we do in these two cases? Since the points-based 
> numerics don't have terms the way Trie* fields do, I don't think we _can_ show 
> much more, so the above makes sense; it's just jarring to end users and looks 
> like a bug.
> WDYT about putting in some useful information, though? Say, in the Admin UI for 
> points-based fields, "terms cannot be shown for points-based fields" or some such?






[jira] [Comment Edited] (SOLR-12884) Admin UI, admin/luke and *Point fields

2018-10-22 Thread Christopher Ball (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657963#comment-16657963
 ] 

Christopher Ball edited comment on SOLR-12884 at 10/22/18 11:13 PM:


Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions? For example, the following expression 
provides a frequency table for a numeric field:
{code:java}
let (a=search(MyCollection,
              q="*:*",
              fl="myWordCount_l",
              fq="myWordCount_l:[0 TO *]",
              rows=1000,
              sort="myWordCount_l asc"),
     b=col(a, myWordCount_l),
     c=freqTable(b))
{code}
With the addition of a filter function (either an exponential function or just 
a list of step points), it would be on par with the data being provided from 
Luke. 

@[~joel.bernstein] - thoughts?






[jira] [Comment Edited] (SOLR-12884) Admin UI, admin/luke and *Point fields

2018-10-22 Thread Christopher Ball (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657963#comment-16657963
 ] 

Christopher Ball edited comment on SOLR-12884 at 10/22/18 3:22 PM:
---

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions - for example the following expression 
provides a frequency table for a numeric field:

let (a=search(MyCollection,
 q="*:*",
 fl="myWordCount_l",
 fq="myWordCount_l:[0 TO *]",
 rows=1000,
 sort="myWordCount_l asc"),
 b=col(a, myWordCount_l),
 c=freqTable(b))

With the addition of a filter function (either an exponential function or just 
a list of step points), it would be on par with the data being provided from 
Luke. 

@[~joel.bernstein] - thoughts?






[jira] [Comment Edited] (SOLR-12884) Admin UI, admin/luke and *Point fields

2018-10-20 Thread Christopher Ball (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657963#comment-16657963
 ] 

Christopher Ball edited comment on SOLR-12884 at 10/20/18 9:58 PM:
---

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions . . . For example the following 
expression provides a frequency table for a numeric field:

let (a=search(MyCollection,
 q="*:*",
 fl="myWordCount_l",
 fq="myWordCount_l:[0 TO *]",
 rows=1000,
 sort="myWordCount_l asc"),
 b=col(a, myWordCount_l),
 c=freqTable(b))

With the addition of a filter function (either an exponential function or just 
a list of step points), it would be on par with the data being provided from 
Luke.






[jira] [Commented] (SOLR-12884) Admin UI, admin/luke and *Point fields

2018-10-20 Thread Christopher Ball (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657963#comment-16657963
 ] 

Christopher Ball commented on SOLR-12884:
-

Could this be an opportunity . . . for Solr to eat its own dog food?

How about using Streaming Expressions . . . For example, the following 
expression would provide a frequency table for a numeric field:

let (a=search(MyCollection,
 q="*:*",
 fl="myWordCount_l",
 fq="myWordCount_l:[0 TO *]",
 rows=1000,
 sort="myWordCount_l asc"),
 b=col(a, myWordCount_l),
 c=freqTable(b))

With the addition of an exponential function as a filter, it would be on par 
with the data being provided from Luke.




[jira] [Commented] (SOLR-2272) Join

2012-08-27 Thread Christopher Ball (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442481#comment-13442481
 ] 

Christopher Ball commented on SOLR-2272:


This does not appear to support use with a delete query . . .

For example, the following does not work: 

http://localhost:8984/solr/myMusic/update?stream.body=<delete><query>{!join from=artist_name to=artist_name fromIndex=MusicBrainz}*:*</query></delete>&commit=true
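
For completeness, the same delete expressed as a POSTed XML body rather than via 
stream.body (a sketch only, assuming the same local URL and collection names as 
above; whether the {!join} query is honored by delete-by-query at all is exactly 
what is in question here):
{code}
curl "http://localhost:8984/solr/myMusic/update?commit=true" \
  -H "Content-Type: text/xml" \
  --data-binary '<delete><query>{!join from=artist_name to=artist_name fromIndex=MusicBrainz}*:*</query></delete>'
{code}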



 Join
 

 Key: SOLR-2272
 URL: https://issues.apache.org/jira/browse/SOLR-2272
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: 4.0-ALPHA

 Attachments: SOLR-2272.patch, SOLR-2272.patch, SOLR-2272.patch


 Limited join functionality for Solr, mapping one set of IDs matching a query 
 to another set of IDs, based on the indexed tokens of the fields.
 Example:
 fq={!join from=parent_ptr to=parent_id}child_doc:query




[jira] [Commented] (LUCENE-3435) Create a Size Estimator model for Lucene and Solr

2011-09-23 Thread Christopher Ball (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113611#comment-13113611
 ] 

Christopher Ball commented on LUCENE-3435:
--

Grant - Great start =)

Below is some initial feedback (happy to help further if you want to chat in 
real-time) 

*Quickly Grokking* - To make it easier to quickly comprehend, the cells that are 
to be updated in the spreadsheet should be color coded (as opposed to those 
that are calculated)  

*Bytes or Entries* - You list Max Size for filterCache, queryResultCache, and 
documentCache as 512, which implies the size is in bytes when in fact the cache 
is sized in entries. I would clarify this in the spreadsheet, as I have seen 
numerous blogs and emails confuse the two.

*Approach to Cache Sizing* - Given memory requirements are heavily contingent 
on caching, I would suggest including at least one approach for how to determine 
cache size (a rough worked example follows the list below):

* Query Result Cache
** Estimation: should be greater than 'number of commonly reoccurring unique 
queries' x 'number of sort parameters' x 'number of possible sort orders' 
* Document Cache
** Estimation: should be greater than 'maximum number of documents per query' x 
'maximum number of concurrent queries'
* Filter Cache
** Estimation: should be number of unique filter queries (should clarify what 
constitutes 'unique')
* Field Value Cache
** Estimation: should be ?
* Custom Caches
** Estimation: should be ? - A common use case?
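
To make that concrete, a rough worked example with purely illustrative numbers 
(not recommendations):
{code}
queryResultCache: 200 common unique queries x 3 sort parameters x 2 sort orders = 1,200 entries
documentCache:    20 max documents per query x 50 max concurrent queries        = 1,000 entries
{code}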

*Faceting* - Surprised there is no reference to use of faceting, which is both 
increasingly common default query functionality and further increases memory 
requirements for effective use

*Obscure Metrics* - To really give this spreadsheet some teeth, there really 
should be pointers for at least one approach on how to estimate each input 
metric (could be on another tab). 

* Some are fairly easy: 
** Number of Unique Terms / field
** Number of documents
** Number of indexed fields (no norms)
** Number of fields w/ norms
** Number of non-String Sort Fields other than score
** Number of String Sort Fields
** Number of deleted docs on avg
** Avg. number of terms per query

* Some are quite obscure (and guidance on how to estimate is essential):
** Number of RAM-based Column Stride Fields (DocValues)
** ramBufferSizeMB
** Transient Factor (MB)
** fieldValueCache Max Size
** Custom Cache Size (MB)
** Avg. Number of Bytes per Term
** Bytes/Term
** Field Cache bits/term
** Cache Key Avg. Size (Bytes)
** Avg QueryResultKey size (in bytes)

 Create a Size Estimator model for Lucene and Solr
 -

 Key: LUCENE-3435
 URL: https://issues.apache.org/jira/browse/LUCENE-3435
 Project: Lucene - Java
  Issue Type: Task
  Components: core/other
Affects Versions: 4.0
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor

 It is often handy to be able to estimate the amount of memory and disk space 
 that both Lucene and Solr use, given certain assumptions.  I intend to check 
 in an Excel spreadsheet that allows people to estimate memory and disk usage 
 for trunk.  I propose to put it under dev-tools, as I don't think it should 
 be official documentation just yet and like the IDE stuff, we'll see how well 
 it gets maintained.




[jira] [Issue Comment Edited] (LUCENE-3435) Create a Size Estimator model for Lucene and Solr

2011-09-23 Thread Christopher Ball (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113611#comment-13113611
 ] 

Christopher Ball edited comment on LUCENE-3435 at 9/24/11 2:15 AM:
---

Grant - Great start =)

Below is some initial feedback (happy to help further if you want to chat in 
real-time) 

*Quickly Grokking* - To make it easier to quickly comprehend, the cells that are 
to be updated in the spreadsheet should be color coded (as opposed to those 
that are calculated)  

*Bytes or Entries* - You list Max Size for filterCache, queryResultCache, and 
documentCache as 512, which subtly implies the size is based on bytes when the 
units of the cache are actually the number of entries. I would clarify the unit 
of measure (I've seen numerous blogs and emails confuse this).

*Approach to Cache Sizing* - Given memory requirements are heavily contingent 
on caching I would suggest including at least one approach for how to determine 
cache size

* Query Result Cache
** Estimation: should be greater than 'number of commonly reoccurring unique 
queries' x 'number of sort parameters' x 'number of possible sort orders' 
* Document Cache
** Estimation: should be greater than 'maximum number of documents per query' x 
'maximum number of concurrent queries'
* Filter Cache
** Estimation: should be number of unique filter queries (should clarify what 
constitutes 'unique')
* Field Value Cache
** Estimation: should be ?
* Custom Caches
** Estimation: should be ? - A common use case?

*Faceting* - Surprised there is no reference to use of faceting which is both 
increasingly common default query functionality and further increases memory 
requirements for effective use

*Obscure Metrics* - To really give this spreadsheet some teeth, there really 
should be pointers for at least one approach on how to estimate each input 
metric (could be on another tab). 

* Some are fairly easy: 
** Number of Unique Terms / field
** Number of documents
** Number of indexed fields (no norms)
** Number of fields w/ norms
** Number of non-String Sort Fields other than score
** Number of String Sort Fields
** Number of deleted docs on avg
** Avg. number of terms per query

* Some are quite obscure (and guidance on how to estimate is essential):
** Number of RAM-based Column Stride Fields (DocValues)
** ramBufferSizeMB
** Transient Factor (MB)
** fieldValueCache Max Size
** Custom Cache Size (MB)
** Avg. Number of Bytes per Term
** Bytes/Term
** Field Cache bits/term
** Cache Key Avg. Size (Bytes)
** Avg QueryResultKey size (in bytes)
