[ https://issues.apache.org/jira/browse/USERGRID-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Nine updated USERGRID-536:
-------------------------------
Description:

Currently, our dynamic mapping causes several issues with Elasticsearch. We should change our mapping to use a static structure and resolve this operational pain.

We need to make the following changes.

h2. Modify our IndexScope

This scope should more closely resemble the elements of an edge, since it represents an edge. This will simplify the use of our query module and make development clearer. The scope should be refactored into the following objects.

* IndexEdge - Id, name, timestamp, edgeType (source or target)
* SearchEdge - Id, name, edgeType

Note: edgeType is the type of the Id within the edge. Does this Id represent a source Id, or a target Id? The entity to be indexed is implicitly on the opposite side. For example, if it's a source edge, the document is the target; if it's a target edge, the document is the source.

These values should also be stored within our document, so that we can index our documents. Note that we perform bidirectional indexing in some cases, such as users, groups, etc. When we do this, we need to ensure that we mark the direction of the edge appropriately.

h2. Change default sort ordering

When sorting is unspecified, we should order by the index edge's timestamp, descending. This ensures that we retain the correct edge time semantics and properly order collections and connections.

h2. Remove the legacy query class

We don't need the Query class; it has far too many functions to be a well-encapsulated object. Instead, we should simply take the string QL, the SearchEdge and the limit to return our candidates. From there, we should parse and visit the query internally within the query logic, NOT externally.

h2. Create a static mapping

The mapping should contain the following static fields.
* entityId - The entity id
* entityType - The entity type (from the id)
* entityVersion - The entity version
* edgeId - The edge Id
* edgeName - The edge name
* edgeTimestamp - The edge timestamp
* edgeType - source | target
* edgeSearch - edgeId + edgeName + edgeType

It will then contain an array of "fields". Each of these fields will have the following form.

{code}
{ "name":"[entity field name as a path]", "[field type]":[field value] }
{code}

We will define a field type for each type of field. Note that each field tuple will always contain a single field name and a single value. Possible field types are the following.

* string - This will be mapped as two fields using a multi-field mapping: an unanalyzed string and an analyzed string. The two fields will be "string_u" and "string_a". The query visitor will need to update the field name appropriately.
* long - An unanalyzed long
* double - An unanalyzed double
* boolean - An unanalyzed boolean
* location - A geolocation field

The entity path will be a flattened path from the root JSON element to the leaf JSON element. It can be thought of as a path through the tree of JSON elements. We will use a dot '.' to delimit the fields, e.g. X.Y.Z for nested objects. Primitive arrays will contain a field object for each element in the array.

h2. Indexing

When indexing entities, we will no longer modify or prefix field names. Each field name will be stored exactly as its path appears, after lowercasing.

h2. Querying

When querying, the "contains" operation for a string will need to use the "string_a" field. When using =, we will need to use the "string_u" field. Each criterion will need to use nested object querying, to ensure the property name and property value are both part of the same field tuple.

h3. References

Multi Field Mapping: http://www.elastic.co/guide/en/elasticsearch/reference/current/_multi_fields.html
Nested Objects: http://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html

> Change our index structure for static mapping and cleanup api
> -------------------------------------------------------------
>
>                 Key: USERGRID-536
>                 URL: https://issues.apache.org/jira/browse/USERGRID-536
>             Project: Usergrid
>          Issue Type: Story
>          Components: Stack
>            Reporter: Todd Nine
>            Assignee: Todd Nine
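
A minimal Python sketch of the IndexEdge/SearchEdge split from "Modify our IndexScope", together with the static edge fields each document would carry. Ids are plain strings here, and joining edgeSearch with a "|" delimiter is an assumption; the ticket only says edgeSearch is edgeId + edgeName + edgeType.

{code:python}
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Union


class EdgeType(Enum):
    """Which side of the edge the stored Id is on."""
    SOURCE = "source"
    TARGET = "target"

    def document_side(self) -> "EdgeType":
        # The indexed document is implicitly on the opposite side of the edge.
        return EdgeType.TARGET if self is EdgeType.SOURCE else EdgeType.SOURCE


@dataclass(frozen=True)
class SearchEdge:
    node_id: str  # hypothetical: Ids rendered as plain strings for brevity
    name: str
    edge_type: EdgeType

    def edge_search(self) -> str:
        # edgeSearch = edgeId + edgeName + edgeType; the "|" delimiter is an assumption.
        return "|".join((self.node_id, self.name, self.edge_type.value))


@dataclass(frozen=True)
class IndexEdge:
    node_id: str
    name: str
    timestamp: int
    edge_type: EdgeType

    def to_search_edge(self) -> SearchEdge:
        # A SearchEdge is an IndexEdge without the timestamp.
        return SearchEdge(self.node_id, self.name, self.edge_type)


def static_fields(entity_id: str, entity_type: str, entity_version: str,
                  edge: IndexEdge) -> Dict[str, Union[str, int]]:
    """The fixed, non-entity fields every indexed document carries."""
    return {
        "entityId": entity_id,
        "entityType": entity_type,
        "entityVersion": entity_version,
        "edgeId": edge.node_id,
        "edgeName": edge.name,
        "edgeTimestamp": edge.timestamp,
        "edgeType": edge.edge_type.value,
        "edgeSearch": edge.to_search_edge().edge_search(),
    }
{code}

Because edgeType is part of edgeSearch, a search for the outgoing "likes" edges of an entity cannot also match its incoming "likes" edges. This is what lets bidirectionally indexed documents (users, groups) mark their direction.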
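
A sketch of the flattening described under "Create a static mapping" and "Indexing". Each leaf value becomes one field tuple, named by its lowercased, dot-delimited path. Several details are assumptions rather than anything the ticket specifies:
* the rule for recognizing a location (an object whose only keys are latitude and longitude)
* skipping null values
* flattening objects inside arrays under the array's path

{code:python}
from typing import Any, Dict, Iterator


def _is_location(value: Any) -> bool:
    # Hypothetical rule: an object whose only keys are latitude/longitude is a location.
    return isinstance(value, dict) and set(value) == {"latitude", "longitude"}


def _tuple_for(path: str, value: Any) -> Dict[str, Any]:
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"name": path, "boolean": value}
    if isinstance(value, int):
        return {"name": path, "long": value}
    if isinstance(value, float):
        return {"name": path, "double": value}
    if isinstance(value, str):
        return {"name": path, "string": value}
    raise TypeError(f"unsupported value at {path!r}: {value!r}")


def _flatten_value(path: str, value: Any) -> Iterator[Dict[str, Any]]:
    if value is None:
        return  # assumption: nulls produce no field tuple
    if _is_location(value):
        yield {"name": path, "location": {"lat": value["latitude"], "lon": value["longitude"]}}
    elif isinstance(value, dict):
        yield from flatten(value, path)
    elif isinstance(value, list):
        # One tuple per array element, all sharing the array's path.
        for element in value:
            yield from _flatten_value(path, element)
    else:
        yield _tuple_for(path, value)


def flatten(entity: Dict[str, Any], prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Yield one {"name": path, "<type>": value} tuple per leaf value."""
    for key, value in entity.items():
        # Paths are dot-delimited and lowercased; values keep their case.
        path = (f"{prefix}.{key}" if prefix else key).lower()
        yield from _flatten_value(path, value)
{code}

The resulting list would become the document's "fields" array, alongside the static edge fields. Since each tuple holds exactly one typed value, the mapping never needs to change when entities gain new properties.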
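
A sketch of how the query visitor might translate criteria, per "Querying" and "Change default sort ordering", into Elasticsearch query DSL built from Python dicts. The paths fields.string_u and fields.string_a, the use of a match query for "contains", and the criteria tuple format are assumptions. The ticket names the two string fields but not how they are addressed.

{code:python}
from typing import Any, Dict, List, Optional, Tuple

NESTED_PATH = "fields"  # hypothetical name of the nested "fields" array
RANGE_OPS = {"<": "lt", "<=": "lte", ">": "gt", ">=": "gte"}


def _value_clause(op: str, value: Any) -> Dict[str, Any]:
    if isinstance(value, str):
        if op == "contains":
            # Full-text match against the analyzed copy of the string.
            return {"match": {f"{NESTED_PATH}.string_a": value}}
        if op == "=":
            # Exact match against the unanalyzed copy.
            return {"term": {f"{NESTED_PATH}.string_u": value}}
        raise ValueError(f"unsupported string operator {op!r}")
    if isinstance(value, bool):
        field = f"{NESTED_PATH}.boolean"
    elif isinstance(value, int):
        field = f"{NESTED_PATH}.long"
    elif isinstance(value, float):
        field = f"{NESTED_PATH}.double"
    else:
        raise TypeError(f"unsupported value {value!r}")
    if op == "=":
        return {"term": {field: value}}
    if op in RANGE_OPS and not isinstance(value, bool):
        return {"range": {field: {RANGE_OPS[op]: value}}}
    raise ValueError(f"unsupported operator {op!r} for {value!r}")


def criterion(name: str, op: str, value: Any) -> Dict[str, Any]:
    """One nested query, so name and value must match within the same tuple."""
    return {
        "nested": {
            "path": NESTED_PATH,
            "query": {"bool": {"must": [
                # Names were lowercased at index time, so lowercase here too.
                {"term": {f"{NESTED_PATH}.name": name.lower()}},
                _value_clause(op, value),
            ]}},
        }
    }


def search_request(edge_search: str, criteria: List[Tuple[str, str, Any]],
                   limit: int, sort: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    must: List[Dict[str, Any]] = [{"term": {"edgeSearch": edge_search}}]
    must.extend(criterion(n, op, v) for n, op, v in criteria)
    return {
        "size": limit,
        "query": {"bool": {"must": must}},
        # When no sort is given, order by edge timestamp, newest first.
        "sort": sort or [{"edgeTimestamp": {"order": "desc"}}],
    }
{code}

Each criterion gets its own nested query because Elasticsearch otherwise flattens arrays of objects. Without nesting, name = "age" from one tuple could pair with the value of a different tuple in the same document.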