someone who has experience with Caffe and Hadoop can share
their thoughts on this. Our project is at [0] in case more detail on the use
case is required.
[0] - https://github.com/USCDataScience/hadoop-pot/
Thanks
--
Madhav Sharan
- Code snippet -
https://github.com/USCDataScience/hadoop-pot/blob/master/hadoop-pot-core/src/main/java/org/pooledtimeseries/seqfile/TextVectorsToSequenceFile.java
[1] - Blog for handling small files -
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
Cheers!
--
Madhav Sharan
On Thu, N
SeqFile and my map jobs are faster.
--
Madhav Sharan
On Wed, Aug 17, 2016 at 11:07 PM, Daniel Haviv
wrote:
> Store them within a sequencefile
>
>
> On Thursday, 18 August 2016, Madhav Sharan wrote:
>
>> Hi, can someone please recommend a fast way in Hadoop to store and
/USCDataScience/hadoop-pot/blob/master/src/main/java/org/pooledtimeseries/PoT.java#L596
--
Madhav Sharan
so that I read the file only once and my mapper jobs
receive the contents of the file rather than the file path.
Can someone please share any technique they have used in the past that might
help?
Thanks
--
Madhav Sharan
Thanks Mahesh
So far I have not been able to run the whole job within a limited time period,
so I am looking for optimizations and better resource utilization. Maybe I can
try tweaking the input split size if it helps.
Thanks for your help. It explains the behaviour.
--
Madhav Sharan
On Tue, Aug 9, 2016 at 1:28
lived.
I was doing experiments, and if I split the input file into N files, where N =
number of cores, then my job starts running on all cores. So maybe I need
to look at the split size. Is there any trick to get number of splits = number of cores?
Otherwise I can try adjusting mapred.min.split.size manually.
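
A rough sketch of the arithmetic behind that idea, in case it is useful. It assumes the split-size rule used by the new-API FileInputFormat, max(minSize, min(maxSize, blockSize)), and the class and method names below are made up for illustration. The resulting value would be set as the minimum split size (mapred.min.split.size, or its newer name mapreduce.input.fileinputformat.split.minsize) when it is larger than the block size, or as the maximum split size when it is smaller. The real number of splits can differ slightly because the last split is allowed to be a bit larger than the rest.

```java
public class SplitSizeSketch {

    /** Bytes per split so that totalBytes is covered by roughly targetSplits splits (ceiling division). */
    public static long splitSizeFor(long totalBytes, int targetSplits) {
        if (totalBytes <= 0 || targetSplits <= 0) {
            throw new IllegalArgumentException("totalBytes and targetSplits must be positive");
        }
        return (totalBytes + targetSplits - 1) / targetSplits;
    }

    /** Assumed FileInputFormat rule: max(minSize, min(maxSize, blockSize)). */
    public static long effectiveSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long totalBytes = 1_000_000L;           // hypothetical input file size
        long blockSize = 128L * 1024 * 1024;    // hypothetical HDFS block size

        long wanted = splitSizeFor(totalBytes, cores);
        // The input is smaller than one block here, so capping the maximum split size is what takes effect.
        long effective = effectiveSplitSize(blockSize, 1L, wanted);

        System.out.println("cores=" + cores + " wantedSplitSize=" + wanted + " effective=" + effective);
    }
}
```
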
--
Mad
/
MeanChiSquareDistanceCalculation.java#L135
--
Madhav Sharan
/MeanChiSquareDistanceCalculation.java#L135
--
Madhav Sharan
ttp://blog.cloudera.com/blog/2009/02/the-small-files-problem/
--
Madhav Sharan
: 7336382464 (6.83 GB)
Non DFS Used: 60541867008 (56.38 GB)
DFS Remaining: 6155422720 (5.73 GB)
DFS Used%: 9.91%
DFS Remaining%: 8.31%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 847
--
Madhav Sharan
On Mon, Jul
Thanks a lot Ron. It helps
--
Madhav Sharan
On Sun, Jul 24, 2016 at 2:19 PM, Ron Gonzalez wrote:
> In a manner of speaking. I would imagine that you would like to take
> advantage of resource management that comes with yarn. If you're planning
> to make this a product that your
$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
--
Madhav Sharan
Hi Ron, thanks for replying.
Unfortunately I could not find a VideoCapture method accepting stream input.
I will look into the second option. Will it be similar to copying the file from
HDFS to a tmp directory and then using the tmp file?
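
A minimal sketch of the "copy to a tmp file first" approach, using only the JDK. It assumes the video is available as a plain java.io.InputStream (an opened HDFS file is one), and the class name, method name and file prefix are hypothetical. The returned path could then be handed to any API that only accepts a file name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamToTempFile {

    /** Copies the whole stream into a new temp file and returns its path. The caller deletes it. */
    public static Path copyToTemp(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("video-", suffix);
        try {
            // createTempFile already created an empty file, so the copy must be allowed to replace it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeVideo = {1, 2, 3, 4};   // stand-in for bytes read from HDFS
        try (InputStream in = new ByteArrayInputStream(fakeVideo)) {
            Path tmp = copyToTemp(in, ".mp4");
            try {
                System.out.println(tmp + " holds " + Files.size(tmp) + " bytes");
            } finally {
                Files.deleteIfExists(tmp);  // clean up once the file-based API is done with it
            }
        }
    }
}
```
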
--
Madhav Sharan
On Sun, Jul 24, 2016 at 12:08 PM, Ron's
s.java#L66
--
Madhav Sharan