hadoop data structures

2014-12-09 Thread steven

Hi,


I have this code which extracts timeframes from a logfile and does some 
calculations on them.

Input lines look like this:

1000,T,0,104,1000,1100,27147,80,80,80,80,81,81,98,98,98,101,137,137,139,177,177,177,173,166,149,134,130,124,119,111,104,92 

1000,T,1,743,300,300,4976,492,492,492,492,492,497,497,856,856,863,866,875,875,954,954,954,954,954,954,954,954,770,770,770,770,743 

1000,T,2,40,800,1000,11922,29,29,29,29,29,29,29,44,46,46,50,51,51,65,65,65,61,52,47,47,47,44,42,40,32,30 

2001,T,0,103,6700,7000,44658,80,80,80,80,80,81,96,98,98,101,134,137,139,220,192,176,168,162,156,149,144,132,122,112,104,95 


1002,U,


The first value is the time in ms,
"T" marks the lines I'm interested in,
0,1,2 is the product ID,
and 104,743,40,103 is the price I want.


Now I need to extract all prices for a specific timeframe, let's say 
3000 ms.
The code at the end works, but it has the problem that the variable 
"numberOfRuns" is counted up and used to calculate the time, and I guess 
this approach doesn't work in Hadoop.
So I need a way to extract the "timeframes" in the mapper. What data 
structure would you use?
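
For reference, each line can be parsed into a small record holding the fields described above; a minimal plain-Java sketch (the class, field, and method names are made up for illustration and are not part of the posted code):

public class PriceRecord {
    final int timeMs;    // first field: time in ms
    final String type;   // "T" marks the interesting lines
    final int productId; // e.g. 0, 1, 2
    final int price;     // e.g. 104, 743, 40, 103

    PriceRecord(int timeMs, String type, int productId, int price) {
        this.timeMs = timeMs;
        this.type = type;
        this.productId = productId;
        this.price = price;
    }

    // Parse one CSV line of the format shown above; returns null for
    // short or non-"T" lines such as "1002,U,".
    static PriceRecord parse(String line) {
        String[] p = line.split(",");
        if (p.length < 4 || !"T".equals(p[1])) {
            return null;
        }
        return new PriceRecord(Integer.parseInt(p[0]), p[1],
                Integer.parseInt(p[2]), Integer.parseInt(p[3]));
    }
}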







import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class Test {

    public List<ArrayList<String>> splitFileByTime(List<String> lines, int timeFrame) {
        List<ArrayList<String>> myTimes = new ArrayList<ArrayList<String>>();

        ArrayList<String> lines_new = new ArrayList<String>();

        int numberOfRuns = 1;

        for (String current : lines) {
            String[] parts = current.split(",");

            int time = Integer.parseInt(parts[0]);

            if (time < 0) {
                // times before the start of the simulation, not interesting
            } else {
                // only "T" lines carry the prices we care about
                if ("T".equals(parts[1])) {
                    lines_new.add(current);
                }
                if (time >= timeFrame * numberOfRuns) {
                    numberOfRuns++;
                    myTimes.add(lines_new);
                    lines_new = new ArrayList<String>();
                }
            }
        }
        // don't lose the last (possibly partial) timeframe
        if (!lines_new.isEmpty()) {
            myTimes.add(lines_new);
        }
        return myTimes;
    }

    public void getOpenAndClose(List<ArrayList<String>> lines) {

        int abschnitt = 1;
        for (ArrayList<String> x : lines) {
            System.out.println("Section: " + abschnitt);
            List<Integer> tmp = new ArrayList<Integer>();
            // start from the extremes so the first price initializes both
            int high = Integer.MIN_VALUE;
            int low = Integer.MAX_VALUE;
            for (String y : x) {
                String[] parts = y.split(",");
                // only product ID 0 (contains("0") would also match e.g. "10")
                if ("0".equals(parts[2])) {
                    int kurs = Integer.parseInt(parts[3]);
                    if (kurs > high) {
                        high = kurs;
                    }
                    if (kurs < low) {
                        low = kurs;
                    }
                    System.out.println("Product: " + parts[2] + " was traded at "
                            + parts[0] + " with price: " + kurs);
                    tmp.add(kurs);
                }
            }
            if (!tmp.isEmpty()) {
                System.out.println("open: " + tmp.get(0));
                System.out.println("close: " + tmp.get(tmp.size() - 1));
                System.out.println("high: " + high);
                System.out.println("low: " + low);
            }
            abschnitt++;
        }
    }

    public List<String> readFile(String filename) {

        List<String> lines = new ArrayList<String>();

        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(filename));
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        return lines;
    }

    public static void main(String[] args) {
        //String filename = "Standard-2014-04-29-12-04.csv";
        String filename = "Standard-small.txt";
        // time span per section in milliseconds
        int timeFrame = 3000;

        Test x = new Test();

        List<String> lines = x.readFile(filename);
        List<ArrayList<String>> lines_split = x.splitFileByTime(lines, timeFrame);

        x.getOpenAndClose(lines_split);
    }
}



Re: hadoop data structures

2014-12-09 Thread Shahab Yunus
Are you asking about the type for the numberOfRuns variable which you are
declaring as a Java primitive int?

If yes, then you can use the IntWritable class in Hadoop to define an integer
variable that will work with M/R.

Regards,
Shahab
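
To make that concrete, here is a minimal sketch of such a mapper, assuming the org.apache.hadoop.mapreduce API and a text input format that delivers one log line per map() call; the 3000 ms frame width and the field positions come from the question, while the class name and the rest are illustrative rather than a definitive solution:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TimeFrameMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {

    private static final int TIME_FRAME_MS = 3000; // width of one timeframe

    private final IntWritable frameKey = new IntWritable();
    private final IntWritable price = new IntWritable();

    @Override
    protected void map(LongWritable offset, Text value, Context context)
            throws IOException, InterruptedException {
        String[] parts = value.toString().split(",");
        // only complete "T" lines carry a price we care about
        if (parts.length < 4 || !"T".equals(parts[1])) {
            return;
        }
        int time = Integer.parseInt(parts[0]);
        if (time < 0) {
            return; // times before the start of the simulation
        }
        // derive the timeframe index from the timestamp itself,
        // so no running counter (numberOfRuns) is needed
        frameKey.set(time / TIME_FRAME_MS);
        price.set(Integer.parseInt(parts[3]));
        context.write(frameKey, price);
    }
}

A reducer that receives the (frame index, price) pairs grouped by key can then compute open/close/high/low per timeframe.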


Re: How to get hadoop issues data for research?

2014-12-09 Thread Akira AJISAKA

You can use the REST API. Example:
https://issues.apache.org/jira/rest/api/2/search?jql=project%20%3D%20HADOOP
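
As an illustration, here is a minimal Java sketch that fetches one page of issues from that endpoint, assuming plain HttpURLConnection; startAt and maxResults are the REST API's paging parameters, and the concrete values are just examples:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JiraIssueFetcher {

    public static void main(String[] args) throws IOException {
        String base = "https://issues.apache.org/jira/rest/api/2/search";
        String query = "?jql=project%20%3D%20HADOOP&startAt=0&maxResults=50";

        HttpURLConnection conn =
                (HttpURLConnection) new URL(base + query).openConnection();
        conn.setRequestProperty("Accept", "application/json");

        // print the JSON response; a real client would parse it and loop,
        // increasing startAt until all issues have been fetched
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}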

This general@ mailing list is for announcements and project management.
For end-user questions and discussions, please use the user@ mailing list.

Regards,
Akira

(12/9/14, 18:22), zfx wrote:

Hi, all


I am a graduate student at Peking University; our lab does research on open 
source projects.
This is our introduction:
https://passion-lab.org/


Now we need Hadoop issues data for our research. I found the issues list:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20HADOOP


I want to download the Hadoop issues data. Could anyone tell me how to download 
it? Or are there links or an API for downloading the data?


Many thanks!


Best regards,
Feixue, Zhang‍





Eclipse plugin for Hadoop2.5.2

2014-12-09 Thread Todd
Hi, hadoopers,

I am new to Hadoop. I am using Hadoop 2.5.2 with YARN as the MR framework. I would 
like to ask about the two ports, the M/R(v2) Master port and the DFS Master port, 
that are to be configured in the Eclipse Hadoop plugin view.

Which properties do these ports correspond to in the Hadoop configuration 
files, e.g., yarn-site.xml?

Thanks.


Re: Eclipse plugin for Hadoop2.5.2

2014-12-09 Thread Ted Yu
bq. M/R(v2) master port
Did you mean the port for the resourcemanager?

Take a look
at 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
where you can find:
yarn.resourcemanager.bind-host
yarn.nodemanager.bind-host

Cheers
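
One rough way to check, sketched below under the assumption that the cluster's core-site.xml and yarn-site.xml are on the client classpath, is to print the addresses the configuration resolves to; fs.defaultFS and yarn.resourcemanager.address are the standard Hadoop 2.x keys, though this is not a statement about which fields the Eclipse plugin actually reads:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ShowClusterAddresses {
    public static void main(String[] args) {
        // YarnConfiguration loads yarn-default.xml / yarn-site.xml on top of
        // the core configuration resources
        Configuration conf = new YarnConfiguration();
        // HDFS side: host:port from fs.defaultFS (core-site.xml)
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        // YARN side: resourcemanager client address (yarn-site.xml)
        System.out.println("yarn.resourcemanager.address = "
                + conf.get("yarn.resourcemanager.address"));
    }
}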



Re:Re: Eclipse plugin for Hadoop2.5.2

2014-12-09 Thread Todd


I pasted an image below to show what I mean: there are two ports there, 
M/R(v2) Master and DFS Master.
I would like to know where these two ports come from.






Re:Re:Re: Eclipse plugin for Hadoop2.5.2

2014-12-09 Thread Todd

I figured out that the default is 50020. Thanks, Ted.









Re: API to find current active namenode.

2014-12-09 Thread Magnus Runesson
I was more interested in a way to do it programmatically. I found out 
today that


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
String ns = conf.get("fs.defaultFS");
FileSystem fs = FileSystem.get(conf);

does what I need without having to care about which namenode is active.

/Magnus

On 2014-12-08 22:18, Andras POTOCZKY wrote:

hi

# sudo -u hdfs hdfs haadmin -getServiceState nn1
active
# sudo -u hdfs hdfs haadmin -getServiceState nn2
standby

Where nn1 and nn2 are the dfs.ha.namenodes.mycluster property values.

Is this what you need?

Andras


On 2014.12.08. 21:12, Magnus Runesson wrote:
I develop an application that will access HDFS. Is there a single API 
to get the current active namenode?


I want it to be independent of whether my cluster has an HA NameNode deployed or 
a single NameNode. The typical Hadoop client configuration files will 
be installed on the host.


/Magnus






Possible typo in the Hadoop "Latest Stable Release Page"

2014-12-09 Thread Corey Nolet
I'm looking at this page: http://hadoop.apache.org/docs/stable/

Is it a typo that Hadoop 2.6.0 is based on 2.4.1?

Thanks.


Question about container recovery

2014-12-09 Thread scwf

Hi, all
  Here is my question: is there a mechanism such that when one container exits 
abnormally, YARN will prefer to dispatch the container to another NM?

We have a cluster with 3 NMs (each NM has 135 GB of memory) and 1 RM, and we are 
running a job which starts 13 containers (= 1 AM + 12 executor containers).

Each NM has 4 executor containers, and the memory configured for each executor 
container is 30 GB. Here is an interesting test: when we killed the

4 containers on NM1, only 2 containers restarted on NM1; the other 2 containers 
were reserved on NM2 and NM3.

  Any ideas?

Fei.