[jira] [Commented] (ATLAS-872) Add Multitenancy support to Atlas

2016-06-07 Thread CASSIO DOS SANTOS (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319173#comment-15319173
 ] 

CASSIO DOS SANTOS commented on ATLAS-872:
-

We have implemented multi-tenancy on top of Atlas but, if nothing else, it's 
not efficient, and we need to implement a different design where we store a 
separate graph per tenant.


I've discussed a "non-disruptive" design for this with Neeru. It would be 
based on a ThreadLocal variable set by intercepting calls to the Atlas API 
(from a configured servlet filter, maybe) to capture the tenant ID that the 
calling application passes in the HTTP request header; the variable would then 
be checked and applied when submitting requests to the underlying graph 
storage layer (via the new AAG layer or direct access to Titan or HBase, 
etc.). This may also involve changes to the type cache provider, depending on 
how it loads data from the storage.
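A minimal sketch of the ThreadLocal part of this idea (all class, method, and header names below are hypothetical illustrations, not existing Atlas code):

```java
// Hypothetical sketch of the per-request tenant context described above.
// A servlet filter would set it from an HTTP header on the way in and clear
// it on the way out; the storage layer reads it when building graph requests.
final class TenantContext {
    private static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    static void set(String tenantId) { TENANT.set(tenantId); }
    static String get() { return TENANT.get(); }
    static void clear() { TENANT.remove(); }
}

// Wiring from a servlet filter (pseudocode; the servlet API is omitted and
// the "X-Tenant-Id" header name is an assumption):
//
//   public void doFilter(req, res, chain) {
//       TenantContext.set(((HttpServletRequest) req).getHeader("X-Tenant-Id"));
//       try { chain.doFilter(req, res); }   // graph layer calls TenantContext.get()
//       finally { TenantContext.clear(); }  // don't leak across pooled threads
//   }
```

Clearing in a finally block matters because servlet containers reuse pooled threads, so a stale tenant ID could otherwise bleed into an unrelated request.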

If anyone thinks this approach may not work or could turn out to be 
problematic to implement for some reason, or has other ideas or needs more 
details, please share your thoughts here.

> Add Multitenancy support to Atlas
> -
>
> Key: ATLAS-872
> URL: https://issues.apache.org/jira/browse/ATLAS-872
> Project: Atlas
>  Issue Type: New Feature
>Affects Versions: 0.7-incubating
>Reporter: Neeru Gupta
>Assignee: Neeru Gupta
> Fix For: trunk
>
>
> Atlas currently does not support multi-tenancy. As part of this feature, we 
> will add support to honor requests coming from multiple tenants. Each 
> tenant's data should remain isolated from the others. 
> All unique constraints should be applied per tenant rather than globally. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ATLAS-819) All user defined types should have a set of common attributes

2016-05-24 Thread CASSIO DOS SANTOS (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299053#comment-15299053
 ] 

CASSIO DOS SANTOS commented on ATLAS-819:
-

Can you confirm that this is not going to be the default for "all user defined 
types" as stated in the description, but instead an optional base class, or 
rather a couple of optional base classes: one with the read-only system 
attributes and one with 'name' and 'description', as per Dave's comments, with 
which I fully agree? 

We have such classes in our application and other applications are likely to 
have something similar, so making it optional would give more flexibility to 
application developers. 

Based on my experience, it's not uncommon to have classes for which even those 
read-only system attributes are not required or desired (unnecessary overhead), 
as they represent objects that may need to be referenced from multiple objects 
but are to be otherwise handled as "lightweight" sub-objects of another 
root/parent object. 

> All user defined types should have a set of common attributes
> -
>
> Key: ATLAS-819
> URL: https://issues.apache.org/jira/browse/ATLAS-819
> Project: Atlas
>  Issue Type: Bug
>Reporter: Hemanth Yamijala
>
> It would be very convenient if all user defined types have a conventional set 
> of common attributes including:
> * name
> * description
> * owner
> * created at
> * modified at





[jira] [Created] (ATLAS-737) Ability to retrieve the size of the result of a DSL search without having to retrieve the result

2016-05-02 Thread CASSIO DOS SANTOS (JIRA)
CASSIO DOS SANTOS created ATLAS-737:
---

 Summary: Ability to retrieve the size of the result of a DSL 
search without having to retrieve the result
 Key: ATLAS-737
 URL: https://issues.apache.org/jira/browse/ATLAS-737
 Project: Atlas
  Issue Type: Sub-task
Reporter: CASSIO DOS SANTOS


This can be implemented analogously to a SELECT COUNT(*) in SQL, and it is all 
the more relevant given the added ability to paginate a search result.
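As a toy illustration of the intent (the class and method names are hypothetical, not the actual Atlas API), a count variant would return only the total, so clients can size their pagination without fetching any rows:

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy illustration only (hypothetical names, not the actual Atlas API):
// a count operation alongside a paginated search, analogous to SQL's
// SELECT COUNT(*) -- the client learns the result size without fetching rows.
class DslSearchSketch {
    private final List<String> entities;

    DslSearchSketch(List<String> entities) { this.entities = entities; }

    // Paginated search: materializes only one page of the result.
    List<String> search(String namePrefix, int offset, int limit) {
        return entities.stream()
                .filter(e -> e.startsWith(namePrefix))
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    // Count variant: total number of matches, no rows materialized.
    long count(String namePrefix) {
        return entities.stream().filter(e -> e.startsWith(namePrefix)).count();
    }
}
```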





[jira] [Commented] (ATLAS-541) Soft deletes

2016-04-05 Thread CASSIO DOS SANTOS (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227027#comment-15227027
 ] 

CASSIO DOS SANTOS commented on ATLAS-541:
-

I think that enabling/disabling soft delete at the type level would give a 
better level of flexibility.
A similar approach could be adopted when versioning support is added.
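A rough sketch of what a per-type toggle could look like (all names here are illustrative assumptions, not existing Atlas code; the type is encoded in the entity id prefix purely for brevity):

```java
import java.util.*;

// Hypothetical sketch of per-type soft-delete control: the type carries a
// flag, and the delete path either marks the entity DELETED or removes it
// outright. Names are illustrative, not existing Atlas code.
class EntityStore {
    enum Status { ACTIVE, DELETED }

    private final Map<String, Status> entities = new HashMap<>(); // id -> status
    private final Set<String> softDeleteTypes;                    // types with soft delete on

    EntityStore(Set<String> softDeleteTypes) { this.softDeleteTypes = softDeleteTypes; }

    void create(String id) { entities.put(id, Status.ACTIVE); }

    // Type is encoded in the id prefix for this toy example, e.g. "hive_table.t1".
    void delete(String id) {
        String type = id.substring(0, id.indexOf('.'));
        if (softDeleteTypes.contains(type)) {
            entities.put(id, Status.DELETED);   // soft delete: keep the record
        } else {
            entities.remove(id);                // hard delete
        }
    }

    Optional<Status> status(String id) { return Optional.ofNullable(entities.get(id)); }
}
```

Default search would then filter out DELETED entities, with an explicit option to include them, as the issue description proposes.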


> Soft deletes
> 
>
> Key: ATLAS-541
> URL: https://issues.apache.org/jira/browse/ATLAS-541
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Shwetha G S
>Assignee: Shwetha G S
>
> We don't have graph versioning currently and hard deletes are not acceptable 
> for data governance. This jira tracks the proposal for soft deletes which can 
> mark an entity as deleted and by default search should return only active 
> entities. However, there should be an option to retrieve deleted entities.





[jira] [Commented] (ATLAS-517) Upgrade titan to 1.x

2016-03-15 Thread CASSIO DOS SANTOS (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196005#comment-15196005
 ] 

CASSIO DOS SANTOS commented on ATLAS-517:
-

[~yhemanth] I more than understand the dependencies conundrum you have to deal 
with at the platform level, but am I right to assume that getting Atlas to run 
in multiple platforms or environments is a valid project goal?

I can file a JIRA for the HBase 1.2.x, 1.3 support as you suggested, but we're 
also evaluating other short term alternatives like Cassandra so it's not clear 
yet what priority we would want to assign to that work.

When we get to work on the changes to support Titan 1.x, could we try to 
further isolate/minimize the dependencies on Titan by adding an intermediate 
generic TinkerPop 3 based layer, so that support for different versions of 
different graph stores could be more easily plugged in in the future? If you 
have any ideas or plans along those lines we'd like to learn more about them,  
this is an area we'd be very interested in contributing to.

Releasing on Titan 1.x instead of 0.5.4 would also allow us to avoid having to 
deal with data migration.




> Upgrade titan to 1.x
> 
>
> Key: ATLAS-517
> URL: https://issues.apache.org/jira/browse/ATLAS-517
> Project: Atlas
>  Issue Type: Wish
>Affects Versions: trunk
>Reporter: Nigel Jones
>
> titan 0.5.4 currently ships with, and is supported by, Atlas.
> This itself officially supports
>  - Cassandra 1.2.z, 2.0.z
>  - HBase 0.94.z, 0.96.z, 0.98.z
>  - ElasticSearch 1.0.z, 1.1.z, 1.2.z
>  - Solr 4.8.1
>  - Tinkerpop 2.5.z
> source: http://s3.thinkaurelius.com/docs/titan/0.5.4/version-compat.html
> As of 24 Feb 2015 titan 1.0.0 is current and supports
>  - Cassandra 1.2.z, 2.0.z, 2.1.z (ADDS support for 2.1)
>  - HBase 0.94.z, 0.96.z, 0.98.z, 1.0.z (ADDS support for 1.0)
>  - ElasticSearch 1.0.z, 1.1.z, 1.2.z (DROPS these, ADDS 1.5)
>  - Solr 5.2.z (DROPS 4.8.1, ADDS 5.2.z)
>  - Tinkerpop 3.0.z (DROPS 2.5, ADDS 3.0)
> In addition in the titan community 1.1 is now being built, and there are 
> discussions around tinkerpop 3.1 support, as well as hadoop2
> source: 
> https://groups.google.com/forum/#!searchin/aureliusgraphs/1.1/aureliusgraphs/e5L5M6MQozY/QHXtx5hFAwAJ
> I would like to be able to use current versions of titan as my graph store in 
> order to be able to benefit from
>  - a platform on which to better integrate with tinkerpop (see separate issue 
> to be raised)
>  - improvements in indexing
>  - more recent HBase and Cassandra support for underlying storage
> Given titan 1.1 is imminent I would be inclined to aim for that as a target & 
> perhaps we should start experimenting with titan 1.0 since there have been 
> API changes 





[jira] [Commented] (ATLAS-487) Externalize tag in search method

2016-03-14 Thread CASSIO DOS SANTOS (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194226#comment-15194226
 ] 

CASSIO DOS SANTOS commented on ATLAS-487:
-

[~yhemanth] I think that 
[Atlas-491|https://issues.apache.org/jira/browse/ATLAS-491] is related but more 
generic and at a higher level of abstraction than the multi-tenancy considered 
in this thread, where hierarchies are not involved. I wonder if that could be 
implemented more efficiently (and more securely) via some "lower-level" 
partitioning of the data, possibly at the graph database layer? Ideally the 
storage layer under Atlas would provide that type of capability, so that we 
would not be required to do any query rewriting.

> Externalize tag in search method
> 
>
> Key: ATLAS-487
> URL: https://issues.apache.org/jira/browse/ATLAS-487
> Project: Atlas
>  Issue Type: Improvement
>Reporter: Prasad  S Madugundu
>Priority: Critical
>
> Tagging metadata (or adding traits to metadata) can be used for 
> classification of metadata and metadata partitioning for multi-tenancy 
> purpose or partition based on the organization hierarchy. In these use cases, 
> it would be ideal if I can pass the trait as a separate parameter to the 
> search method, instead of including the tag as a predicate in the query 
> string. 
> If I have a complex query that retrieves metadata from multiple types, then 
> the query becomes more complex if I need to add predicates for the tags for 
> all the types that are used in the query.
> Externalizing the tag from the search query would also lead to better 
> structure for the client code, because I can add the classification or 
> partition to the query without modifying the query.





[jira] [Commented] (ATLAS-517) Upgrade titan to 1.x

2016-03-14 Thread CASSIO DOS SANTOS (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193853#comment-15193853
 ] 

CASSIO DOS SANTOS commented on ATLAS-517:
-

[~yhemanth] I wanted to check if there has been any activity / discussion on 
that front on your side, and any updates to your plan. Our cloud platform 
supports more recent stable versions of some of the products Atlas depends on, 
like Titan 1.x and HBase 1.2.x, and having to deploy with older versions 
prevents us from leveraging many of the management services available on the 
newer versions, not to mention the fact that some of those older versions are 
no longer supported by their providers. 

I understand that in this particular case you have the additional challenge of 
moving to Java 8, but in the more general case, it seems that the ability to 
more quickly validate and support newer versions of the underlying data stores 
may become more critical in environments like the cloud. I've noticed that the 
Atlas code has some mechanisms in place to address that on top of what Titan 
provides. If getting Atlas supported on top of Titan 1.x is going to require 
more work and take longer, one temporary option that may work for us is to get 
Titan 0.5.4 to work on top of HBase 1.2.x, which could require some changes to 
the Atlas code that would likely be much more localized and less disruptive 
than a full port to the latest version of Titan.



> Upgrade titan to 1.x
> 
>
> Key: ATLAS-517
> URL: https://issues.apache.org/jira/browse/ATLAS-517
> Project: Atlas
>  Issue Type: Wish
>Affects Versions: trunk
>Reporter: Nigel Jones
>





[jira] [Commented] (ATLAS-511) Ability to run multiple instances of Atlas Server with automatic failover to one active server

2016-03-08 Thread CASSIO DOS SANTOS (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185159#comment-15185159
 ] 

CASSIO DOS SANTOS commented on ATLAS-511:
-

Hemanth Yamijala, you're right about what I meant by "on demand"; lazy loading 
is maybe a better way to refer to it.
Having said that, as Venkata Madugundu has considered in his comments, 
evaluating the performance impact of turning the cache off is something that 
we could maybe help with in the short term. I'd like to know your thoughts on 
other options, like the use of a distributed cache, or a cache with some 
"refresh-if-obsolete" policy that would check the type's timestamp in the 
backend store to decide if a cached type needs to be refreshed, or a "refresh 
type" broadcast to all instances. I agree we should use separate JIRAs from 
this one to cover those different alternatives, and maybe have some 
investigative or proof-of-concept work done in parallel on some of them.
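The "refresh-if-obsolete" idea can be sketched roughly as follows (a toy illustration under stated assumptions: the class names, the cheap per-type timestamp read, and the full-definition load are all hypothetical, not the actual Atlas type cache):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a "refresh-if-obsolete" type cache: before serving a
// cached type definition, compare its cached timestamp with the one recorded
// in the backing store and reload only when the store copy is newer.
class RefreshIfObsoleteCache {
    static final class TypeDef {
        final String name;
        final long modifiedAt;   // last-modified timestamp from the store
        TypeDef(String name, long modifiedAt) { this.name = name; this.modifiedAt = modifiedAt; }
    }

    private final Map<String, TypeDef> cache = new ConcurrentHashMap<>();
    private final Function<String, Long> storeTimestamp; // cheap metadata read
    private final Function<String, TypeDef> storeLoad;   // full definition read

    RefreshIfObsoleteCache(Function<String, Long> storeTimestamp,
                           Function<String, TypeDef> storeLoad) {
        this.storeTimestamp = storeTimestamp;
        this.storeLoad = storeLoad;
    }

    TypeDef get(String typeName) {
        TypeDef cached = cache.get(typeName);
        // Reload when absent or when the store holds a newer version.
        if (cached == null || storeTimestamp.apply(typeName) > cached.modifiedAt) {
            cached = storeLoad.apply(typeName);
            cache.put(typeName, cached);
        }
        return cached;
    }
}
```

The trade-off versus a broadcast is one extra (cheap) timestamp read per cache hit instead of cross-instance messaging.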

> Ability to run multiple instances of Atlas Server with automatic failover to 
> one active server
> --
>
> Key: ATLAS-511
> URL: https://issues.apache.org/jira/browse/ATLAS-511
> Project: Atlas
>  Issue Type: Sub-task
>Reporter: Hemanth Yamijala
>Assignee: Hemanth Yamijala
> Attachments: HADesign.pdf
>
>
> One of the most important components that only supports active-standby mode 
> currently is the Atlas server which hosts the API / UI for Atlas. As 
> described in the [HA 
> Documentation|http://atlas.incubator.apache.org/0.6.0-incubating/HighAvailability.html],
>  we currently are limited to running only one instance of the Atlas server 
> behind a proxy service. If the running instance goes down, a manual process 
> is required to bring up another instance.
> In this JIRA, we propose to have an ability to run multiple Atlas server 
> instances. However, as a first step, only one of them will be actively 
> processing requests. To have a consistent terminology, let us call that 
> server the *master*. Any requests sent to the other servers will be 
> redirected to the master.
> When the master suffers a partition, one of the other servers must 
> automatically become the master and start processing requests. What this mode 
> brings us over the current system is the ability to automatically fail over 
> the Atlas server instance without any manual intervention. Note that this 
> can arguably be called an [active/active 
> setup|https://en.wikipedia.org/wiki/High-availability_cluster].
> ATLAS-488 was raised to support multiple active Atlas server instances. While 
> that would be ideal, we have to learn more about the underlying system 
> behavior before we can get there, and hopefully we can take smaller steps to 
> improve the system systematically. The method proposed here is similar to 
> what is adopted in many other Hadoop components, including the HDFS NameNode, 
> HBase HMaster, etc.





[jira] [Commented] (ATLAS-511) Ability to run multiple instances of Atlas Server with automatic failover to one active server

2016-03-07 Thread CASSIO DOS SANTOS (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183156#comment-15183156
 ] 

CASSIO DOS SANTOS commented on ATLAS-511:
-

A couple of questions:

Should "TitanGraphProvider solr 5 index added" be marked in bold on page 2?

Will the failover time grow as the number of types that need to be loaded 
increases (and possibly due to other factors), in particular if Atlas is used 
in a multi-tenant environment? If so, should you consider things like 
on-demand type cache initialization or a distributed cache?


> Ability to run multiple instances of Atlas Server with automatic failover to 
> one active server
> --
>
> Key: ATLAS-511
> URL: https://issues.apache.org/jira/browse/ATLAS-511
> Project: Atlas
>  Issue Type: Sub-task
>Reporter: Hemanth Yamijala
>Assignee: Hemanth Yamijala
> Attachments: HADesign.pdf
>
>


