[jira] [Updated] (USERGRID-536) Change our index structure for static mapping and cleanup api

Todd Nine (JIRA) Thu, 02 Apr 2015 16:53:00 -0700

     [ 
https://issues.apache.org/jira/browse/USERGRID-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Todd Nine updated USERGRID-536:
-------------------------------
    Description: 
Currently, our dynamic mapping causes several issues with Elasticsearch.  We 
should change our mapping to use a static structure to resolve this 
operational pain.

We need to make the following changes.

h2. Modify our IndexScope

This should more closely resemble the elements of an edge since this represents 
an edge. It will simplify the use of our query module and make development 
clearer.  This scope should be refactored into the following objects.  

* IndexEdge - Id, name, timestamp, edgeType (source or target)
* SearchEdge - Id, name, edgeType

Note: edgeType is the type of the Id within the edge.  Does this Id represent a 
source Id, or does it represent a targetId?  The entity to be indexed will 
implicitly be the opposite of the type specified.  I.e., if it's a source edge, 
the document is the target.  If it's a target edge, the document is the source.

These values should also be stored within our document, so that we can index 
our documents.  Note that we perform bidirectional indexing in some cases, such 
as users, groups, etc.  When we do this, we need to ensure that we mark the 
direction of the edge appropriately.
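
As a rough sketch of the two objects and the edgeType rule described above (written in Python for brevity; the class layout, the inheritance between the two objects, and the helper names document_side and bidirectional_edges are assumptions for illustration, not the actual Usergrid code):

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class EdgeType(Enum):
    SOURCE = "source"
    TARGET = "target"


@dataclass(frozen=True)
class SearchEdge:
    """Id, name, edgeType: enough to locate documents on an edge."""
    node_id: str
    name: str
    edge_type: EdgeType


@dataclass(frozen=True)
class IndexEdge(SearchEdge):
    """SearchEdge plus the edge timestamp (inheritance is an assumption)."""
    timestamp: int


def document_side(edge: SearchEdge) -> EdgeType:
    """The indexed document is the opposite end of the edge from the Id."""
    if edge.edge_type is EdgeType.SOURCE:
        return EdgeType.TARGET
    return EdgeType.SOURCE


def bidirectional_edges(source_id: str, target_id: str,
                        name: str, timestamp: int) -> List[IndexEdge]:
    """Hypothetical helper: one IndexEdge per direction, each marked explicitly."""
    return [
        # Id is the source, so the document indexed under it is the target.
        IndexEdge(source_id, name, EdgeType.SOURCE, timestamp),
        # Id is the target, so the document indexed under it is the source.
        IndexEdge(target_id, name, EdgeType.TARGET, timestamp),
    ]
```

Keeping the direction as an explicit value on each edge means the bidirectional case is simply two edges that differ in Id and edgeType.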

h2. Change default sort ordering

When sorting is unspecified, we should order by timestamp descending from our 
index edge.  This ensures that we retain the correct edge time semantics, and 
will properly order collections and connections.
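
A minimal sketch of that fallback, assuming the sort is expressed as an Elasticsearch sort clause on the edgeTimestamp field from the static mapping (the function name build_sort is hypothetical):

```python
from typing import Any, Dict, List, Optional


def build_sort(requested: Optional[List[Dict[str, Any]]] = None) -> List[Dict[str, Any]]:
    """Return the caller's sort if one was given, else edge timestamp descending."""
    if requested:
        return requested
    # Default: newest edges first, preserving edge time semantics.
    return [{"edgeTimestamp": {"order": "desc"}}]
```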

h2. Remove the legacy query class

We don't need the Query class; it has far too many functions to be a 
well-encapsulated object.  Instead, we should simply take the string QL, the 
SearchEdge and the limit to return our candidates.  From there, we should parse 
and visit the query internally to the query logic, NOT externally.

h2. Create a static mapping

The mapping should contain the following static fields.

* entityId - The entity id
* entityType - The entity type (from the id)
* entityVersion - The entity version
* edgeId - The edge Id
* edgeName - The edge name
* edgeTimestamp - The edge timestamp
* edgeType - source | target
* edgeSearch - edgeId + edgeName + edgeType
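
To illustrate, the static part of a document could be assembled roughly as follows. This is a sketch only: the function name static_fields is hypothetical, and the "_" delimiter used to build edgeSearch is an assumption, since the description only says the three values are combined.

```python
from typing import Any, Dict


def static_fields(entity_id: str, entity_type: str, entity_version: str,
                  edge_id: str, edge_name: str, edge_timestamp: int,
                  edge_type: str) -> Dict[str, Any]:
    """Build the static portion of an index document."""
    if edge_type not in ("source", "target"):
        raise ValueError("edgeType must be 'source' or 'target'")
    return {
        "entityId": entity_id,
        "entityType": entity_type,
        "entityVersion": entity_version,
        "edgeId": edge_id,
        "edgeName": edge_name,
        "edgeTimestamp": edge_timestamp,
        "edgeType": edge_type,
        # Assumed delimiter; only the combination of the three is specified.
        "edgeSearch": "_".join([edge_id, edge_name, edge_type]),
    }
```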


It will then contain an array of "fields".  Each of these fields will have the 
following format.

{code}
{ "name":"[entity field name as a path]", "[field type]":[field value] }
{code}



We will define a field type for each type of field.  Note that each field tuple 
will always contain a single field and a single value.  Possible field types 
are the following.

* string - This will be mapped into 2 mappings using multi-field mapping: an 
unanalyzed string and an analyzed string.  The 2 fields will then be 
"string_u" and "string_a".  The Query visitor will need to update the field 
name appropriately.
* long - An unanalyzed long
* double - An unanalyzed double
* boolean - An unanalyzed boolean
* location - A geolocation field

The entity path will be a flattened path from the root JSON element to the 
deepest JSON element.  It can be thought of as a path through the tree of JSON 
elements.  We will use a dot '.' to delimit the fields, e.g. X.Y.Z for nested 
objects.  Primitive arrays will contain a field object for each element in the 
array.

h2. Indexing

  When indexing entities, we will no longer modify or prefix field names.  They 
will be inserted into the value exactly as their path appears after 
lowercasing.
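
The flattening and lowercasing could look roughly like the sketch below. Several details are assumptions not settled by this description: the function name flatten, treating an object with exactly "latitude" and "longitude" keys as a location, recursing into arrays of objects under the same path, and skipping null values.

```python
from typing import Any, Dict, Iterator


def flatten(value: Any, path: str = "") -> Iterator[Dict[str, Any]]:
    """Yield one {"name": path, "<field type>": value} tuple per leaf value."""
    if isinstance(value, dict):
        # Assumption: a {latitude, longitude} object is a geolocation.
        if path and set(value) == {"latitude", "longitude"}:
            yield {"name": path,
                   "location": {"lat": value["latitude"], "lon": value["longitude"]}}
            return
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            # Paths are lowercased; no prefixing or other renaming.
            yield from flatten(child, child_path.lower())
    elif isinstance(value, list):
        # One field object per array element, all sharing the same path.
        for element in value:
            yield from flatten(element, path)
    elif isinstance(value, bool):  # check bool before int: bool is an int subclass
        yield {"name": path, "boolean": value}
    elif isinstance(value, int):
        yield {"name": path, "long": value}
    elif isinstance(value, float):
        yield {"name": path, "double": value}
    elif isinstance(value, str):
        yield {"name": path, "string": value}
    # None and unsupported types are skipped in this sketch.
```

For example, an entity such as {"Address": {"City": "Austin"}} would produce a single field tuple with the name "address.city" and the type "string".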

h2. Querying

  When querying, the "contains" operation for a string will need to use the 
"string_a" data type.  When using =, we will need to use the "string_u" data 
type.  Each criterion will need to use nested object querying, to ensure the 
property name and property value are both part of the same field tuple.
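
A sketch of how one string criterion might translate into an Elasticsearch nested query. The nested path "fields", the full field names "fields.name", "fields.string_a" and "fields.string_u", and the choice of match versus term queries are assumptions; the real names depend on how the mapping is finally defined. The function name nested_criterion is hypothetical.

```python
from typing import Any, Dict


def nested_criterion(field_name: str, operator: str, value: str) -> Dict[str, Any]:
    """Build a nested query that ties a field name and value to one field tuple."""
    if operator == "contains":
        # Analyzed variant for full-text matching.
        value_clause = {"match": {"fields.string_a": value}}
    elif operator == "=":
        # Unanalyzed variant for exact matching.
        value_clause = {"term": {"fields.string_u": value}}
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return {
        "nested": {
            "path": "fields",
            "query": {
                "bool": {
                    "must": [
                        # Field paths are stored lowercased.
                        {"term": {"fields.name": field_name.lower()}},
                        value_clause,
                    ]
                }
            },
        }
    }
```

Because both clauses sit inside the same nested query, a document only matches when the name and the value belong to the same field tuple, rather than to two different tuples of the same document.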


h3. References

Multi Field Mapping: 
http://www.elastic.co/guide/en/elasticsearch/reference/current/_multi_fields.html
Nested Objects: 
http://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html





  was:
Currently, our dynamic mapping causes several issues with Elasticsearch.  We 
should change our mapping to use a static structure to resolve this 
operational pain.

We need to make the following changes.

h2. Modify our IndexScope

This should more closely resemble the elements of an edge since this represents 
an edge. It will simplify the use of our query module and make development 
clearer.  This scope should be refactored into the following objects.  

* IndexEdge - Id, name, timestamp, edgeType (source or target)
* SearchEdge - Id, name, edgeType

Note: edgeType is the type of the Id within the edge.  Does this Id represent a 
source Id, or does it represent a targetId?  The entity to be indexed will 
implicitly be the opposite of the type specified.  I.e., if it's a source edge, 
the document is the target.  If it's a target edge, the document is the source.

These values should also be stored within our document, so that we can index 
our documents.  Note that we perform bidirectional indexing in some cases, such 
as users, groups, etc.  When we do this, we need to ensure that we mark the 
direction of the edge appropriately.

h2. Change default sort ordering

When sorting is unspecified, we should order by timestamp descending from our 
index edge.  This ensures that we retain the correct edge time semantics, and 
will properly order collections and connections.

h2. Remove the legacy query class

We don't need the Query class; it has far too many functions to be a 
well-encapsulated object.  Instead, we should simply take the string QL, the 
SearchEdge and the limit to return our candidates.  From there, we should parse 
and visit the query internally to the query logic, NOT externally.

h2. Create a static mapping

The mapping should contain the following static fields.

* entityId - The entity id
* entityType - The entity type (from the id)
* entityVersion - The entity version
* edgeId - The edge Id
* edgeName - The edge name
* edgeTimestamp - The edge timestamp
* edgeType - source | target
* edgeSearch - edgeId + edgeName + edgeType


It will then contain an array of "fields".  Each of these fields will have the 
following format.

{code}
{ "name":"[entity field name as a path]", "[field type]":[field value] }
{code}



We will define a field type for each type of field.  Note that each field tuple 
will always contain a single field and a single value.  Possible field types 
are the following.

* string - This will be mapped into 2 mappings using multi-field mapping: an 
unanalyzed string and an analyzed string.  The 2 fields will then be 
"string_u" and "string_a".  The Query visitor will need to update the field 
name appropriately.
* long - An unanalyzed long
* double - An unanalyzed double
* boolean - An unanalyzed boolean

The entity path will be a flattened path from the root JSON element to the 
deepest JSON element.  It can be thought of as a path through the tree of JSON 
elements.  We will use a dot '.' to delimit the fields, e.g. X.Y.Z for nested 
objects.  Primitive arrays will contain a field object for each element in the 
array.

h2. Indexing

  When indexing entities, we will no longer modify or prefix field names.  They 
will be inserted into the value exactly as their path appears after 
lowercasing.

h2. Querying

  When querying, the "contains" operation for a string will need to use the 
"string_a" data type.  When using =, we will need to use the "string_u" data 
type.  Each criterion will need to use nested object querying, to ensure the 
property name and property value are both part of the same field tuple.


h3. References

Multi Field Mapping: 
http://www.elastic.co/guide/en/elasticsearch/reference/current/_multi_fields.html
Nested Objects: 
http://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html






> Change our index structure for static mapping and cleanup api
> -------------------------------------------------------------
>
>                 Key: USERGRID-536
>                 URL: https://issues.apache.org/jira/browse/USERGRID-536
>             Project: Usergrid
>          Issue Type: Story
>          Components: Stack
>            Reporter: Todd Nine
>            Assignee: Todd Nine
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
