[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844722#action_12844722
 ] 

Yonik Seeley commented on LUCENE-2308:
--

Of course... given that Fieldable is an interface, one could create an 
implementation that just delegated all the calls like omitNorms to a shared 
instance, except for the value part.  Add a getAnalyzer() method to Fieldable, 
and it's the same thing in the end?
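
A minimal sketch of the delegation idea above. SimpleFieldable is a hypothetical, heavily simplified stand-in for the Fieldable interface (which has many more methods), and SharedSettings and DelegatingField are illustrative names, not Lucene classes.

```java
// Hypothetical, simplified stand-in for Fieldable; not the real interface.
interface SimpleFieldable {
    String name();
    String stringValue();
    boolean getOmitNorms();
    boolean isStored();
}

// Shared instance holding everything except the name and the value.
final class SharedSettings {
    final boolean omitNorms;
    final boolean stored;

    SharedSettings(boolean omitNorms, boolean stored) {
        this.omitNorms = omitNorms;
        this.stored = stored;
    }
}

// Each field keeps its own name and value and delegates the rest.
final class DelegatingField implements SimpleFieldable {
    private final String name;
    private final String value;
    private final SharedSettings shared;

    DelegatingField(String name, String value, SharedSettings shared) {
        this.name = name;
        this.value = value;
        this.shared = shared;
    }

    public String name() { return name; }
    public String stringValue() { return value; }
    public boolean getOmitNorms() { return shared.omitNorms; }
    public boolean isStored() { return shared.stored; }
}
```

Two DelegatingField instances built with the same SharedSettings report the same omitNorms and stored settings while keeping separate values.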

> Separately specify a field's type
> -
>
> Key: LUCENE-2308
> URL: https://issues.apache.org/jira/browse/LUCENE-2308
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> index or not, stored or not, analyzed or not, details like omitTfAP,
> omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844720#action_12844720
 ] 

Chris Male commented on LUCENE-2308:


{quote}
I will, if I can (provided the FieldType does not contain the field name). That 
shouldn't have anything to do with immutability though.
{quote}

Yeah the field name will stay inside the Field.  To me the reuse issue relates 
to immutability in that a change to a property in one FieldType after 
construction means the change affects all the Fields that use that type.  

But as you say, if we document that it's best to set everything at instantiation 
and that whatever happens after that is undefined, then I imagine it'll be fine.
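
The reuse concern can be shown with a small sketch. MutableFieldType and TypedField are hypothetical names invented for illustration; nothing here is the proposed Lucene API.

```java
// Hypothetical mutable type object shared by several fields.
final class MutableFieldType {
    private boolean omitNorms;

    void setOmitNorms(boolean value) { omitNorms = value; }
    boolean omitNorms() { return omitNorms; }
}

// The field keeps its own name and value and only references the type.
final class TypedField {
    final String name;
    final String value;
    final MutableFieldType type;

    TypedField(String name, String value, MutableFieldType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}
```

If two TypedField instances share one MutableFieldType, calling setOmitNorms(true) on the type after construction changes what both fields report, which is the effect described above.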

{quote}
new Field instances should be fine - it's not really my use case anyway. But 
we're designing for the 1000's of use cases that are out there and we should be 
careful about adding new constraints.
{quote}

Yeah I appreciate that this API will be used in lots of different ways.  Baby 
steps as Mike said :)  But to answer your question, yes the flexibility will 
remain.


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844716#action_12844716
 ] 

Yonik Seeley commented on LUCENE-2308:
--

bq. I'm really unsure about this if people are going to be using a FieldType 
instance with multiple Fields.

I will, if I can (provided the FieldType does not contain the field name).  
That shouldn't have anything to do with immutability though.

bq. Are you wanting to be able to reuse the same Field instance in both 
documents while defining separate FieldTypes? Or is creating new Field 
instances okay?

new Field instances should be fine - it's not really my use case anyway.  But 
we're designing for the 1000's of use cases that are out there and we should be 
careful about adding new constraints.


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844710#action_12844710
 ] 

Chris Male commented on LUCENE-2308:


{quote}
I'm not sure if strict immutability is necessary - there's everything in 
between too.
One can simply say that all changes should be made before first use, and after 
that point it's undefined.
{quote}

I'm really unsure about this if people are going to be using a FieldType 
instance with multiple Fields.  Perhaps this really is just an edge case.

{quote}
Unrelated question: I assume that this would retain the same flexibility as we 
have today... the ability to change FieldType for field "foo" from one document 
to the next?
{quote}

Are you wanting to be able to reuse the same Field instance in both documents 
while defining separate FieldTypes? Or is creating new Field instances okay?


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844707#action_12844707
 ] 

Yonik Seeley commented on LUCENE-2308:
--

I'm not sure if strict immutability is necessary - there's everything in 
between too.
One can simply say that all changes should be made before first use, and after 
that point it's undefined.

Unrelated question: I assume that this would retain the same flexibility as we 
have today... the ability to change FieldType for field "foo" from one document 
to the next?


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844702#action_12844702
 ] 

Chris Male commented on LUCENE-2308:


{quote}
It would be nice if we could do something similar to IndexWriterConfig
(LUCENE-2294), where you use incremental ctor/setters to set up the
configuration but then once it's used ("bound" to a Field), it's
immutable.
{quote}

Yeah we could use something like a FieldTypeBuilder which could provide a fluent 
interface for specifying each property, which then gets built into an immutable 
FieldType at the end.
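
A rough sketch of what such a builder might look like. ImmutableFieldType, its Builder and the three properties are assumptions chosen for illustration; the issue does not define an actual API.

```java
// Hypothetical immutable type: only getters, all state set at build time.
final class ImmutableFieldType {
    private final boolean indexed;
    private final boolean stored;
    private final boolean omitNorms;

    private ImmutableFieldType(Builder b) {
        this.indexed = b.indexed;
        this.stored = b.stored;
        this.omitNorms = b.omitNorms;
    }

    boolean isIndexed() { return indexed; }
    boolean isStored() { return stored; }
    boolean omitNorms() { return omitNorms; }

    static Builder builder() { return new Builder(); }

    // Fluent builder: each setter returns the builder so calls can be chained.
    static final class Builder {
        private boolean indexed;
        private boolean stored;
        private boolean omitNorms;

        Builder indexed(boolean v) { indexed = v; return this; }
        Builder stored(boolean v) { stored = v; return this; }
        Builder omitNorms(boolean v) { omitNorms = v; return this; }

        // Copies the current settings into a new immutable instance.
        ImmutableFieldType build() { return new ImmutableFieldType(this); }
    }
}
```

Usage would read like ImmutableFieldType.builder().indexed(true).stored(true).build(). Because build() copies the values, later changes to the builder do not affect instances that were already built.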


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844700#action_12844700
 ] 

Yonik Seeley commented on LUCENE-2308:
--

For the non-expert user, it's just a label and won't have much meaning 
regardless of what it's called, and they will need to consult the docs.  Of 
course, if one starts to dig deeper, "norms" actually does have a physical 
meaning in the index, so preferring a label with "norms" in it seems completely 
reasonable.

There's also history to consider - when you change the name of something, you 
cut the link to the past in search engines, and in the memories of many 
developers.

As it relates to Solr - I don't care so much since it makes sense for the Solr 
schema to isolate these changes and stick with "omitNorms" regardless.



[jira] Commented: (LUCENE-2308) Separately specify a field's type

2010-03-12 Thread Earwin Burrfoot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844690#action_12844690
 ] 

Earwin Burrfoot commented on LUCENE-2308:
-

I'm strongly against names like 'matchOnly'. They are perfectly fine in some 
'schema' layer over Lucene, but here, in the low-level guts, I'd prefer names 
that clearly state what the hell they do, without forcing me to consult 
javadocs/code.


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844688#action_12844688
 ] 

Marvin Humphrey commented on LUCENE-2308:
-

> Also creating a FieldType with args like
> new FieldType(true, false, false) isn't really readable. 

Agreed.  Another option would be a "flags" integer and bitwise constants:

{code}
FieldType type = new FieldType(analyzer, FieldType.INDEXED | FieldType.STORED);
{code}

> It would be nice if we could do something similar to IndexWriterConfig
> (LUCENE-2294), where you use incremental ctor/setters to set up the
> configuration but then once it's used ("bound" to a Field), it's
> immutable.

I bet that'll be more popular than flags, but I thought it was worth
bringing it up anyway. :)
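
For reference, the flags variant could be sketched as below. FlagFieldType, the ANALYZED constant and the has() helper are hypothetical, and the analyzer argument from the snippet above is left out to keep the sketch self-contained.

```java
// Hypothetical flags-based type: each property is one bit in an int.
final class FlagFieldType {
    static final int INDEXED  = 1;       // bit 0
    static final int STORED   = 1 << 1;  // bit 1
    static final int ANALYZED = 1 << 2;  // bit 2

    private final int flags;

    FlagFieldType(int flags) {
        this.flags = flags;
    }

    // True if the given flag bit is set.
    boolean has(int flag) {
        return (flags & flag) != 0;
    }
}
```

A call such as new FlagFieldType(FlagFieldType.INDEXED | FlagFieldType.STORED) names each enabled property at the call site, which addresses the readability problem of positional booleans.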


[jira] Commented: (LUCENE-2308) Separately specify a field's type

2010-03-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844684#action_12844684
 ] 

Michael McCandless commented on LUCENE-2308:


Hmm one challenge with making FieldType immutable is we don't want
a zillion ctors over time.  Also creating a FieldType with args like
new FieldType(true, false, false) isn't really readable.

It would be nice if we could do something similar to IndexWriterConfig
(LUCENE-2294), where you use incremental ctor/setters to set up the
configuration but then once it's used ("bound" to a Field), it's
immutable.
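
One way the "configure, then immutable once bound" idea might look, as a sketch only. FreezableFieldType, BoundField and freeze() are hypothetical names for illustration, not a description of what was committed.

```java
// Hypothetical type with chained setters that stop working once frozen.
final class FreezableFieldType {
    private boolean stored;
    private boolean omitNorms;
    private boolean frozen;

    FreezableFieldType setStored(boolean v) { checkNotFrozen(); stored = v; return this; }
    FreezableFieldType setOmitNorms(boolean v) { checkNotFrozen(); omitNorms = v; return this; }

    boolean isStored() { return stored; }
    boolean omitNorms() { return omitNorms; }

    // Called when the type is bound to a field; no changes allowed afterwards.
    void freeze() { frozen = true; }

    private void checkNotFrozen() {
        if (frozen) {
            throw new IllegalStateException("FieldType is already bound to a Field");
        }
    }
}

// Binding a type to a field freezes the type.
final class BoundField {
    final String name;
    final String value;
    final FreezableFieldType type;

    BoundField(String name, String value, FreezableFieldType type) {
        type.freeze();
        this.name = name;
        this.value = value;
        this.type = type;
    }
}
```

This avoids a growing list of constructors while still rejecting changes after first use, at the cost of a runtime check instead of a compile-time guarantee.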

I'm torn on naming: yes, search-oriented names like "matchOnly" are
tempting, but then we really should tease apart termFreq and positions
(they are stuck together now with omitTFAP).  And the two are not
fully independent as Marvin noted -- so maybe we use a cryptic enum
(DOCS, DOCS_TERM_FREQ, DOCS_TERM_FREQ_POSITIONS)?  If we can only find
better names...

I'm not sure we can/should find better index-time names.  What is
stored in the index is relatively independent from how/whether
searches make use of it.  EG if you store termFreq (but not positions)
you can still do match only searching, or, you can do full scoring of
the query.  You can't use positional queries.
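
The enum idea, together with the point about what each level still allows at search time, could be sketched like this. PostingDetail and its helper methods are hypothetical; treating DOCS as unable to do full scoring is a simplifying assumption read from the paragraph above.

```java
// Hypothetical enum of what is written to the postings for a field.
// Each level includes everything from the previous one.
enum PostingDetail {
    DOCS,
    DOCS_TERM_FREQ,
    DOCS_TERM_FREQ_POSITIONS;

    boolean hasTermFreq() { return this != DOCS; }
    boolean hasPositions() { return this == DOCS_TERM_FREQ_POSITIONS; }

    // Match-only searching needs nothing beyond the doc list.
    boolean supportsMatchOnly() { return true; }

    // Assumption: full scoring needs term frequencies.
    boolean supportsFullScoring() { return hasTermFreq(); }

    // Positional queries (phrases, spans) need positions.
    boolean supportsPositionalQueries() { return hasPositions(); }
}
```

Because the values are nested levels rather than independent booleans, a combination such as "positions without term freq" simply cannot be expressed.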



Re: [jira] Commented: (LUCENE-2308) Separately specify a field's type

2010-03-12 Thread Marvin Humphrey

On Fri, Mar 12, 2010 at 03:01:27PM -0500, Mark Miller wrote:
> Committers are competent in different areas of the code.  Even Mike  
> wasn't big into the search side until per segment.  Committers are  
> trusted to mess with the pieces they know.

Absolutely.  I wouldn't expect every committer to understand the gory details
of posting formats, and I've been a little caught off guard by the blowback
from what I thought was an innocuous observation.

But by the same token, I wouldn't expect our users to have sufficient
expertise to understand all the variants of omit*() either.  This stuff
oughtta be implementation details.

> I don't see anyone even remotely suggesting that users should have to  
> understand all of the implications of posting format modifications.

That's what omitTFAP() and omitNorms() do, though.  And as Mike pointed out in
the "baby steps" thread, omitTFAP() is often misunderstood.

Marvin Humphrey


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844661#action_12844661
 ] 

Chris Male commented on LUCENE-2308:


What I covered with Mike earlier was whether FieldType would be immutable 
or not.  

If they are, which seems a good idea, then everything will be enabled/disabled 
in the construction of the FieldType so we would only need to support property 
getter methods.


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844659#action_12844659
 ] 

Marvin Humphrey commented on LUCENE-2308:
-

I'm simply suggesting that the proposed API is too hard to understand.  

Most users know whether their fields can be "match-only" but have no idea what
TFAP is.  And even advanced users will have difficulty understanding all the
implications for matching and scoring when they selectively disable portions
of the posting format.

I'm not a fan of omitTFAP, omitTF, omitNorms, omitPositions, or omit(flags).
Something that ordinary users can grok would be used more often and more
effectively.


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844653#action_12844653
 ] 

Robert Muir commented on LUCENE-2308:
-

{quote}
If you disable term freq, you also have to disable positions. The "freq"
tells you how many positions there are. 
{quote}

Marvin: as stated, we would have to actually implement this.
There's an issue open for it too: LUCENE-2048.
I was just discussing this with someone the other day.
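
The dependency between freq and positions in the quote above can be illustrated with a toy layout. This is an invented flat int stream for illustration only, not Lucene's actual postings format, and ToyPostings is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyPostings {
    // Toy layout, repeated per document: docId, freq, then `freq` positions.
    // Returns the decoded positions for each document in stream order.
    static List<int[]> decodePositions(int[] stream) {
        List<int[]> perDoc = new ArrayList<>();
        int i = 0;
        while (i < stream.length) {
            int docId = stream[i++];  // read only to advance past it
            int freq = stream[i++];   // tells the reader how many positions follow
            int[] positions = new int[freq];
            for (int p = 0; p < freq; p++) {
                positions[p] = stream[i++];
            }
            perDoc.add(positions);
        }
        return perDoc;
    }
}
```

In this layout the reader cannot tell where one document's positions end without freq, so dropping freq while keeping positions would need a different encoding.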

{quote}
I think it's asking an awful lot of our users to require that they understand
all the implications of posting format modifications when committers
have difficulty mastering all the subtleties.
{quote}

I don't know what I did to piss you off, but I just thought it would be nice
for completeness, to mention that this feature is still open and it's
something we should think about.



Re: [jira] Commented: (LUCENE-2308) Separately specify a field's type

2010-03-12 Thread Mark Miller

Committers are competent in different areas of the code. Even Mike
wasn't big into the search side until per segment. Committers are
trusted to mess with the pieces they know.

I don't see anyone even remotely suggesting that users should have to
understand all of the implications of posting format modifications.

Just sounds like a nasty jab to me.

- Mark

http://www.lucidimagination.com

On Mar 12, 2010, at 2:43 PM, "Marvin Humphrey (JIRA)"
wrote:

[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844637#action_12844637
]

Marvin Humphrey commented on LUCENE-2308:
-

If you disable term freq, you also have to disable positions. The "freq"
tells you how many positions there are.

I think it's asking an awful lot of our users to require that they understand
all the implications of posting format modifications when committers
have difficulty mastering all the subtleties.


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844637#action_12844637
 ] 

Marvin Humphrey commented on LUCENE-2308:
-

If you disable term freq, you also have to disable positions.  The "freq" 
tells you how many positions there are.

I think it's asking an awful lot of our users to require that they understand
all the implications of posting format modifications when committers 
have difficulty mastering all the subtleties.
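
The dependency described above can be sketched in code. This is only an illustration, assuming a hypothetical PostingOptions holder with hypothetical setTermFreq/setPositions setters; none of these names are an existing Lucene API. It encodes the stated constraint that positions cannot be kept once term freq is dropped.

```java
/** Hypothetical holder for posting options; illustrates the freq/positions dependency only. */
final class PostingOptions {
    private boolean termFreq = true;
    private boolean positions = true;

    /** Turning term freq off also turns positions off, since freq is the count of positions. */
    PostingOptions setTermFreq(boolean on) {
        termFreq = on;
        if (!on) {
            positions = false;
        }
        return this;
    }

    /** Positions may only be turned on while term freq is on. */
    PostingOptions setPositions(boolean on) {
        if (on && !termFreq) {
            throw new IllegalStateException("positions require term freq");
        }
        positions = on;
        return this;
    }

    boolean hasTermFreq() { return termFreq; }

    boolean hasPositions() { return positions; }
}
```

With a guard like this, a caller who asks for an unsupported combination gets an immediate error instead of having to know the posting format details.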


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844630#action_12844630
 ] 

Robert Muir commented on LUCENE-2308:
-

Just also to mention (probably too much for this one issue)!

I think it would be nice if OmitTF was separately selectable 
from OmitPositions, as Shai implied. We would have to
actually implement this though, I think!



[jira] Commented: (LUCENE-2308) Separately specify a field's type

2010-03-12 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844629#action_12844629
 ] 

Shai Erera commented on LUCENE-2308:


How about enable(TYPE/FEATURE) and corresponding disable? So Type/Feature will 
have NORMS, TF, POSITIONS and calls would look like:
f.enable(Type.NORMS), f.disable(Type.TF)?
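
A minimal sketch of what such an API could look like, assuming a hypothetical Feature enum and a hypothetical FieldOptions holder (neither exists in Lucene; only the constant names NORMS, TF and POSITIONS come from the suggestion above). The choice that every feature starts out enabled is also an assumption made for the example.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical feature flags; the constant names follow the suggestion above. */
enum Feature { NORMS, TF, POSITIONS }

/** Hypothetical per-field options holder; not an existing Lucene class. */
final class FieldOptions {
    // Assumption for this sketch: every feature starts out enabled.
    private final Set<Feature> enabled = EnumSet.allOf(Feature.class);

    /** Turns a feature on and returns this object so calls can be chained. */
    FieldOptions enable(Feature feature) {
        enabled.add(feature);
        return this;
    }

    /** Turns a feature off and returns this object so calls can be chained. */
    FieldOptions disable(Feature feature) {
        enabled.remove(feature);
        return this;
    }

    boolean isEnabled(Feature feature) {
        return enabled.contains(feature);
    }
}
```

Calls would then read f.enable(Feature.NORMS).disable(Feature.TF), which avoids the double negative of passing false to an "omit" setter.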


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844626#action_12844626
 ] 

Marvin Humphrey commented on LUCENE-2308:
-

I think we might consider matchOnly() instead of omitNorms().  If a field is
"match only", we don't need boost bytes a.k.a. "norms" because they are only
used as a scoring multiplier.

Haven't got a good synonym for "omitTFAP", but I'd sure like one.


Re: [jira] Commented: (LUCENE-2308) Separately specify a field's type

2010-03-12 Thread Erick Erickson

Congrats Chris!

I vote for thinkAboutNotIncludingNormsMaybe(true|false) .

Seriously, double negatives are ugly IMO; +1 for changing.

Erick

On Fri, Mar 12, 2010 at 12:56 PM, Chris Male (JIRA)  wrote:

>
>[
> https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844587#action_12844587]
>
> Chris Male commented on LUCENE-2308:
> 
>
> I agree entirely.  This is definitely the moment to remove any ambiguity or
> confusion in this API.  I'll make sure to incorporate this idea.

[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844587#action_12844587
 ] 

Chris Male commented on LUCENE-2308:


I agree entirely.  This is definitely the moment to remove any ambiguity or 
confusion in this API.  I'll make sure to incorporate this idea.


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844585#action_12844585
 ] 

Robert Muir commented on LUCENE-2308:
-

bq. So you are thinking more along the lines of indexNorms(true|false)? 

Or whatever you come up with that doesn't create double negatives!
But yeah, I think something like that is a little easier... no big deal,
I just figured I would bring it up if this stuff was getting refactored anyway.


[jira] Commented: (LUCENE-2308) Separately specify a field's type


[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844579#action_12844579
 ] 

Chris Male commented on LUCENE-2308:


So you are thinking more along the lines of indexNorms(true|false)?

[jira] Commented: (LUCENE-2308) Separately specify a field's type