Re: An Architecture question on the use of virtualised clusters

2017-06-05 Thread Mich Talebzadeh
My main concern is that the choice of Isilon is not for one use case. It will be a strategic decision for the client, and if we decide to go that way we are effectively moving away from HDFS principles (3x replication etc.) as well. Granted, one can argue this may be OK, but of course we have to look

Re: An Architecture question on the use of virtualised clusters

2017-06-05 Thread John Leach
Mich, Yes, Isilon is in production... Isilon is a serious product and has been around for quite a while. For on-premise external storage, we see it quite a bit. Separating the compute from the storage actually helps. It is also a nice transition to the cloud providers. Have you looked

Re: An Architecture question on the use of virtualised clusters

2017-06-05 Thread Mich Talebzadeh
Hi John, Thanks. Did you end up in production or, in other words, besides the PoC did you use it in anger? The intention is to build Isilon on top of the whole HDFS cluster! If we go that way we need to adopt it for DR as well. Cheers Dr Mich Talebzadeh

Re: An Architecture question on the use of virtualised clusters

2017-06-05 Thread John Leach
Mich, We used Isilon for a POC of Splice Machine (Spark for analytics, HBase for real-time). We were concerned initially and the initial setup took a bit longer than expected, but it performed well on both low-latency and high-throughput use cases at scale (our POC was ~100 TB). Just a data

Re: An Architecture question on the use of virtualised clusters

2017-06-05 Thread Mich Talebzadeh
I am concerned about the use case of tools like Isilon or Panasas to create a layer on top of HDFS, essentially an HCFS on top of HDFS with the usual 3x replication gone into the tool itself. There is interest in pushing Isilon forward as the solution, but my caution is about scalability and future

Re: An Architecture question on the use of virtualised clusters

2017-06-02 Thread Gene Pang
As Vincent mentioned earlier, I think Alluxio can work for this. You can mount your (potentially remote) storage systems to Alluxio, and deploy Alluxio co-located with the compute cluster. The computation framework will

Re: An Architecture question on the use of virtualised clusters

2017-06-01 Thread Mich Talebzadeh
As a matter of interest, what is the best way of creating virtualised clusters all pointing to the same physical data? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: An Architecture question on the use of virtualised clusters

2017-06-01 Thread vincent gromakowski
If mandatory, you can use a local cache like Alluxio. On 1 June 2017 at 10:23 AM, "Mich Talebzadeh" wrote: > Thanks Vincent. I assume by physical data locality you mean you are going > through Isilon and HCFS and not through direct HDFS. > > Also I agree with you that

Re: An Architecture question on the use of virtualised clusters

2017-06-01 Thread Mich Talebzadeh
Thanks Vincent. I assume by physical data locality you mean you are going through Isilon and HCFS and not through direct HDFS. Also I agree with you that a shared network could be an issue as well. However, it allows you to reduce data redundancy (you do not need R3 in HDFS anymore) and also you

Re: An Architecture question on the use of virtualised clusters

2017-06-01 Thread vincent gromakowski
I don't recommend this kind of design because you lose physical data locality and you will be affected by "bad neighbours" that are also using the network storage... We have one similar design but restricted to small clusters (more for experiments than production) 2017-06-01 9:47 GMT+02:00 Mich

Re: An Architecture question on the use of virtualised clusters

2017-06-01 Thread Mich Talebzadeh
Thanks Jörn, This was a proposal made by someone, as the firm is already using this tool on other SAN-based storage and wants to extend it to Big Data. On paper it seems like a good idea; in practice it may be a Wandisco scenario again. Of course, as ever, one needs to ask EMC for reference calls and whether

Re: An Architecture question on the use of virtualised clusters

2017-06-01 Thread Jörn Franke
Hi, I have done this (not Isilon, but another storage system). It can be efficient for small clusters, depending on how you design the network. What I have also seen is the microservice approach with object stores (e.g. S3 in the cloud, Swift on premise), which is somewhat similar. If

An Architecture question on the use of virtualised clusters

2017-05-31 Thread Mich Talebzadeh
Hi, I realize this may not have direct relevance to Spark, but has anyone tried to create virtualized HDFS clusters using tools like Isilon or similar? The prime motive behind this approach is to minimize the propagation or copying of data, which has regulatory implications. In short you want your